A clustering-based discretization for supervised learning

Ankit Gupta, Kishan G. Mehrotra, Chilukuri Mohan

Research output: Contribution to journal › Article › peer-review

46 Scopus citations

Abstract

We address the problem of discretization of continuous variables for machine learning classification algorithms. Existing procedures do not use interdependence between the variables towards this goal. Our proposed method uses clustering to exploit such interdependence. Numerical results show that this improves the classification performance in almost all cases. Even if an existing algorithm can successfully operate with continuous variables, better performance is obtained if the variables are first discretized. An additional advantage of discretization is that it reduces the overall computation time.
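The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a plain k-means over all variables jointly, then takes each variable's cut points as the midpoints between adjacent cluster centers projected onto that variable. The function names (`kmeans`, `cut_points`, `discretize`), the deterministic initialization, and the toy data are all hypothetical, and the sketch ignores class labels entirely.

```python
from bisect import bisect_right


def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means; deterministic init from evenly spaced rows."""
    if len(rows) < k:
        raise ValueError("need at least k rows")
    step = len(rows) // k
    centers = [list(rows[i * step]) for i in range(k)]
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centers[c])))
            for row in rows
        ]
        # Move each center to the mean of the rows assigned to it.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def cut_points(centers, var):
    """Midpoints between adjacent distinct center coordinates on one variable."""
    vals = sorted({c[var] for c in centers})
    return [(a + b) / 2 for a, b in zip(vals, vals[1:])]


def discretize(rows, k):
    """Cluster on all variables jointly, then bin each variable separately."""
    centers = kmeans(rows, k)
    n_vars = len(rows[0])
    cuts = [cut_points(centers, j) for j in range(n_vars)]
    binned = [[bisect_right(cuts[j], row[j]) for j in range(n_vars)]
              for row in rows]
    return binned, cuts


# Toy data (assumed): two correlated variables forming two groups.
rows = [(1.0, 10.0), (1.5, 11.0), (0.5, 9.0),
        (5.0, 30.0), (5.5, 31.0), (4.5, 29.0)]
binned, cuts = discretize(rows, k=2)
print(cuts)    # [[3.0], [20.0]]
print(binned)  # [[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
```

Because the clusters are formed in the joint space, the cut points on one variable depend on how the other variables are distributed, which is the kind of interdependence that per-variable binning schemes such as equal-width binning cannot use.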

Original language: English (US)
Pages (from-to): 816-824
Number of pages: 9
Journal: Statistics and Probability Letters
Volume: 80
Issue number: 9-10
State: Published - 2010

Keywords

  • Binning
  • Clustering
  • Discretization
  • Supervised learning

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
