A hybrid multi-group approach for privacy-preserving data mining

Zhouxuan Teng, Wenliang Du

Research output: Contribution to journal › Article › peer-review

19 Scopus citations

Abstract

In this paper, we propose a hybrid multi-group approach for privacy-preserving data mining and make two contributions. First, we propose a hybrid approach. Previous work has used either the randomization approach or the secure multi-party computation (SMC) approach. These two approaches have complementary features: the randomization approach is much more efficient but less accurate, while the SMC approach is less efficient but more accurate. Our novel hybrid approach takes advantage of the strengths of both to balance the accuracy and efficiency constraints. Compared to the two existing approaches, it can achieve much better accuracy than the randomization approach and much lower computation cost than the SMC approach. Second, we propose a multi-group scheme that gives the data miner flexible control over the balance between data mining accuracy and privacy. This scheme is motivated by the fact that existing randomization schemes, which randomize data at the individual attribute level, can produce insufficient accuracy when the number of dimensions is high. We partition attributes into groups and develop a scheme for group-based randomization to achieve better data mining accuracy. To demonstrate the effectiveness of the proposed general schemes, we have implemented them for the ID3 decision tree algorithm and the association rule mining problem, and we present experimental results.
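
The abstract does not spell out the randomization mechanism, so the following is only a minimal sketch of what group-based randomization could look like. It assumes binary attributes and a Warner-style randomized response applied to each attribute group as a unit: with probability `theta` a group is reported truthfully, otherwise every bit in it is flipped. The function names, the `theta` parameter, and the estimator are illustrative assumptions, not the paper's actual scheme.

```python
import random


def randomize_record(record, groups, theta, rng):
    """Return a randomized copy of a binary record (hypothetical helper).

    Each group of attribute indices is kept as-is with probability theta;
    otherwise all bits in that group are flipped together.
    """
    out = list(record)
    for group in groups:
        if rng.random() >= theta:  # happens with probability 1 - theta
            for i in group:
                out[i] = 1 - out[i]
    return out


def estimate_all_ones(randomized, group, theta):
    """Estimate the true fraction of records whose bits in `group` are all 1.

    Let E be "all ones" and E_bar be "all zeros" (its flipped image). Then
        P*(E)     = theta * P(E)     + (1 - theta) * P(E_bar)
        P*(E_bar) = theta * P(E_bar) + (1 - theta) * P(E)
    and solving for P(E) gives the expression returned below.
    """
    if theta == 0.5:
        raise ValueError("theta = 0.5 makes the estimate undefined")
    n = len(randomized)
    p_star = sum(all(r[i] == 1 for i in group) for r in randomized) / n
    p_bar_star = sum(all(r[i] == 0 for i in group) for r in randomized) / n
    return (theta * p_star - (1 - theta) * p_bar_star) / (2 * theta - 1)
```

Under these assumptions, each data owner would call `randomize_record` before disclosing a record, and the miner would call `estimate_all_ones` on the collected records to recover an approximate joint frequency for one group. Values of `theta` closer to 0.5 hide more but make the estimate noisier.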

Original language: English (US)
Pages (from-to): 133-157
Number of pages: 25
Journal: Knowledge and Information Systems
Volume: 19
Issue number: 2
State: Published - May 2009

Keywords

  • Hybrid
  • Privacy
  • Randomization
  • SMC

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Hardware and Architecture
  • Artificial Intelligence
