Abstract
In this paper, we propose methods for clustering groups of two-dimensional data whose mean functions are piecewise linear, so that the groups within a cluster share common characteristics such as the same slopes. To fit segmented line regression models with common features for each possible cluster, we use a restricted least squares method. In implementing the restricted least squares method, we estimate the maximum number of segments in each cluster by using both the permutation test method and the Bayes information criterion method, and we then propose using the Bayes information criterion to determine the number of clusters. For a more effective implementation of the clustering algorithm, we propose a measure of the minimum distance worth detecting and illustrate its use in two examples. We summarize simulation results that examine the properties of the proposed methods, and we prove the consistency of the cluster grouping estimated with a given number of clusters. The presentation and examples in this paper focus on the segmented line regression model with ordered values of the independent variable, which has been the model of interest in cancer trend analysis, but the proposed method can be applied to a general model whose design points are either ordered or unordered.
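The idea of fitting a common-slope segmented model to each candidate cluster and comparing groupings by the Bayes information criterion can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes continuous segments, joinpoints restricted to interior design points and found by grid search, group-specific intercepts, a generic Gaussian form of the criterion, and the same number of joinpoints `k` in every cluster. The permutation test and the minimum distance worth detecting are omitted, and all function names (`design`, `rss_common_slopes`, `best_fit`, `bic`, `partition_bic`) are hypothetical.

```python
import itertools
import numpy as np

def design(x, joinpoints):
    """Slope columns: x plus one hinge term (x - tau)_+ per joinpoint."""
    cols = [x] + [np.clip(x - tau, 0.0, None) for tau in joinpoints]
    return np.column_stack(cols)

def rss_common_slopes(x, ys, joinpoints):
    """Restricted least squares (assumed form): every group has its own
    intercept, but all groups share the same slopes and joinpoints."""
    g, n = len(ys), len(x)
    intercepts = np.kron(np.eye(g), np.ones((n, 1)))   # one column per group
    slopes = np.tile(design(x, joinpoints), (g, 1))    # shared across groups
    X = np.hstack([intercepts, slopes])
    y = np.concatenate(ys)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def best_fit(x, ys, k):
    """Grid search over k joinpoints placed at interior design points."""
    best_rss, best_taus = np.inf, ()
    for taus in itertools.combinations(x[2:-2], k):
        rss = rss_common_slopes(x, ys, taus)
        if rss < best_rss:
            best_rss, best_taus = rss, taus
    return best_rss, best_taus

def bic(rss, n_obs, n_params):
    """Generic Gaussian BIC; the paper's exact penalty may differ."""
    return n_obs * np.log(rss / n_obs) + n_params * np.log(n_obs)

def partition_bic(x, clusters, k):
    """BIC of a grouping; `clusters` is a list of lists of response vectors."""
    rss_total, n_params, n_obs = 0.0, 0, 0
    for ys in clusters:
        rss, _ = best_fit(x, ys, k)
        rss_total += rss
        n_params += len(ys) + 1 + 2 * k   # intercepts + slopes + joinpoints
        n_obs += len(x) * len(ys)
    return bic(rss_total, n_obs, n_params)

# Simulated data (not from the paper): groups a and b share slopes (+1, -1)
# with a joinpoint at x = 10; group c follows a single line with slope 0.2.
rng = np.random.default_rng(0)
x = np.arange(20.0)
kinked = x - 2.0 * np.clip(x - 10.0, 0.0, None)
a = 1.0 + kinked + rng.normal(0.0, 0.05, x.size)
b = 4.0 + kinked + rng.normal(0.0, 0.05, x.size)
c = 5.0 + 0.2 * x + rng.normal(0.0, 0.05, x.size)

one_cluster = partition_bic(x, [[a, b, c]], k=1)
two_clusters = partition_bic(x, [[a, b], [c]], k=1)
print("one cluster:", one_cluster, "two clusters:", two_clusters)
```

Under these assumptions the grouping with the smaller criterion value is preferred. A fuller version would also let the number of joinpoints vary by cluster and search over candidate groupings rather than comparing two fixed ones.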
Original language | English (US) |
---|---|
Pages (from-to) | 4087-4103 |
Number of pages | 17 |
Journal | Statistics in Medicine |
Volume | 33 |
Issue number | 23 |
State | Published - Oct 15 2014 |
Keywords
- Bayes information criterion
- clustering
- joinpoint regression
- minimum distance worth detecting
- permutation test
ASJC Scopus subject areas
- Epidemiology
- Statistics and Probability