Clustering under composite generative models

Tiexing Wang, Donald J. Bucci, Yingbin Liang, Biao Chen, Pramod Kumar Varshney

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

This paper studies clustering of data samples generated from composite distributions using the Kolmogorov-Smirnov (KS) based K-means algorithm. All data sequences are assumed to be generated from unknown continuous distributions. The maximum intra-cluster KS distance of each distribution cluster is assumed to be smaller than the minimum inter-cluster KS distance of different clusters. The analysis of convergence and upper bounds on the error probability are provided for both cases with known and unknown number of clusters. Furthermore, it is shown that the probability of error decays exponentially as the number of samples in each data sequence goes to infinity, and the error exponent is only a function of the difference of the inter-cluster and intra-cluster KS distances. The analysis is validated by simulation results.

Original languageEnglish (US)
Title of host publication2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1-6
Number of pages6
ISBN (Electronic)9781538605790
DOIs
StatePublished - May 21 2018
Event52nd Annual Conference on Information Sciences and Systems, CISS 2018 - Princeton, United States
Duration: Mar 21 2018Mar 23 2018

Other

Other52nd Annual Conference on Information Sciences and Systems, CISS 2018
CountryUnited States
CityPrinceton
Period3/21/183/23/18

Fingerprint

Composite materials
Error probability

Keywords

  • composite distributions
  • K-means algorithm
  • Kolmogorov-Smirnov distance
  • probability of error

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems

Cite this

Wang, T., Bucci, D. J., Liang, Y., Chen, B., & Varshney, P. K. (2018). Clustering under composite generative models. In 2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018 (pp. 1-6). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/CISS.2018.8362256

Clustering under composite generative models. / Wang, Tiexing; Bucci, Donald J.; Liang, Yingbin; Chen, Biao; Varshney, Pramod Kumar.

2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 1-6.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Wang, T, Bucci, DJ, Liang, Y, Chen, B & Varshney, PK 2018, Clustering under composite generative models. in 2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018. Institute of Electrical and Electronics Engineers Inc., pp. 1-6, 52nd Annual Conference on Information Sciences and Systems, CISS 2018, Princeton, United States, 3/21/18. https://doi.org/10.1109/CISS.2018.8362256
Wang T, Bucci DJ, Liang Y, Chen B, Varshney PK. Clustering under composite generative models. In 2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018. Institute of Electrical and Electronics Engineers Inc. 2018. p. 1-6 https://doi.org/10.1109/CISS.2018.8362256
Wang, Tiexing ; Bucci, Donald J. ; Liang, Yingbin ; Chen, Biao ; Varshney, Pramod Kumar. / Clustering under composite generative models. 2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 1-6
@inproceedings{2e72578efe834540ac1f33687a5e9e77,
title = "Clustering under composite generative models",
abstract = "This paper studies clustering of data samples generated from composite distributions using the Kolmogorov-Smirnov (KS) based K-means algorithm. All data sequences are assumed to be generated from unknown continuous distributions. The maximum intra-cluster KS distance of each distribution cluster is assumed to be smaller than the minimum inter-cluster KS distance of different clusters. The analysis of convergence and upper bounds on the error probability are provided for both cases with known and unknown number of clusters. Furthermore, it is shown that the probability of error decays exponentially as the number of samples in each data sequence goes to infinity, and the error exponent is only a function of the difference of the inter-cluster and intra-cluster KS distances. The analysis is validated by simulation results.",
keywords = "composite distributions, K-means algorithm, Kolmogorov-Smirnov distance, probability of error",
author = "Tiexing Wang and Bucci, {Donald J.} and Yingbin Liang and Biao Chen and Varshney, {Pramod Kumar}",
year = "2018",
month = "5",
day = "21",
doi = "10.1109/CISS.2018.8362256",
language = "English (US)",
pages = "1--6",
booktitle = "2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Clustering under composite generative models

AU - Wang, Tiexing

AU - Bucci, Donald J.

AU - Liang, Yingbin

AU - Chen, Biao

AU - Varshney, Pramod Kumar

PY - 2018/5/21

Y1 - 2018/5/21

N2 - This paper studies clustering of data samples generated from composite distributions using the Kolmogorov-Smirnov (KS) based K-means algorithm. All data sequences are assumed to be generated from unknown continuous distributions. The maximum intra-cluster KS distance of each distribution cluster is assumed to be smaller than the minimum inter-cluster KS distance of different clusters. The analysis of convergence and upper bounds on the error probability are provided for both cases with known and unknown number of clusters. Furthermore, it is shown that the probability of error decays exponentially as the number of samples in each data sequence goes to infinity, and the error exponent is only a function of the difference of the inter-cluster and intra-cluster KS distances. The analysis is validated by simulation results.

AB - This paper studies clustering of data samples generated from composite distributions using the Kolmogorov-Smirnov (KS) based K-means algorithm. All data sequences are assumed to be generated from unknown continuous distributions. The maximum intra-cluster KS distance of each distribution cluster is assumed to be smaller than the minimum inter-cluster KS distance of different clusters. The analysis of convergence and upper bounds on the error probability are provided for both cases with known and unknown number of clusters. Furthermore, it is shown that the probability of error decays exponentially as the number of samples in each data sequence goes to infinity, and the error exponent is only a function of the difference of the inter-cluster and intra-cluster KS distances. The analysis is validated by simulation results.

KW - composite distributions

KW - K-means algorithm

KW - Kolmogorov-Smirnov distance

KW - probability of error

UR - http://www.scopus.com/inward/record.url?scp=85048555402&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048555402&partnerID=8YFLogxK

U2 - 10.1109/CISS.2018.8362256

DO - 10.1109/CISS.2018.8362256

M3 - Conference contribution

AN - SCOPUS:85048555402

SP - 1

EP - 6

BT - 2018 52nd Annual Conference on Information Sciences and Systems, CISS 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -