TY - GEN
T1 - On Exponentially Consistency of Linkage-Based Hierarchical Clustering Algorithm Using Kolmogrov-Smirnov Distance
AU - Wang, Tiexing
AU - Liu, Yang
AU - Chen, Biao
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - This paper focuses on performance analysis of linkage-based hierarchical agglomerative clustering algorithms for sequence clustering using the Kolmogrov-Smirnov distance. Data sequences are assumed to be generated from unknown continuous distributions. The goal is to group the data sequences whose underlying generative distributions belong to one cluster without a priori knowledge of both the underlying distributions as well as the number of clusters. Upper bounds on the clustering error probability are derived. The upper bounds help establish the fact that the error probability decays exponentially fast as the sequence length goes to infinity and the obtained error exponent bound has a simple form. Tighter upper bounds on the error probability of single-linkage and complete-linkage algorithms are derived by taking advantage of the simplified metric updating for these two special cases. Simulation results are provided to validate the analysis.
AB - This paper focuses on performance analysis of linkage-based hierarchical agglomerative clustering algorithms for sequence clustering using the Kolmogrov-Smirnov distance. Data sequences are assumed to be generated from unknown continuous distributions. The goal is to group the data sequences whose underlying generative distributions belong to one cluster without a priori knowledge of both the underlying distributions as well as the number of clusters. Upper bounds on the clustering error probability are derived. The upper bounds help establish the fact that the error probability decays exponentially fast as the sequence length goes to infinity and the obtained error exponent bound has a simple form. Tighter upper bounds on the error probability of single-linkage and complete-linkage algorithms are derived by taking advantage of the simplified metric updating for these two special cases. Simulation results are provided to validate the analysis.
KW - Kolmogorov-Smirnov distance
KW - clustering
KW - exponential consistency
KW - hierarchical clustering algorithm.
KW - probability of error
UR - http://www.scopus.com/inward/record.url?scp=85089224617&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089224617&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053708
DO - 10.1109/ICASSP40776.2020.9053708
M3 - Conference contribution
AN - SCOPUS:85089224617
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3997
EP - 4001
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -