Deriving private information from randomized data

Zhengli Huang, Wenliang Du, Biao Chen

Research output: Contribution to journal › Conference Article › peer-review

316 Scopus citations


Randomization has emerged as a useful technique for disguising data in privacy-preserving data mining. Its privacy properties have been studied in a number of papers. Kargupta et al. challenged randomization schemes, pointing out that randomization might not be able to preserve privacy. However, it is still unclear what factors cause such a security breach, how they affect the privacy-preserving property of randomization, and what kinds of data are at higher risk of disclosing their private contents even though they are randomized. We believe that the key factor is the correlations among attributes. We propose two data reconstruction methods that are based on data correlations. One method uses the Principal Component Analysis (PCA) technique, and the other uses the Bayes Estimate (BE) technique. We have conducted theoretical and experimental analyses of the relationship between data correlations and the amount of private information that can be disclosed using our proposed data reconstruction schemes. Our studies have shown that when the correlations are high, the original data can be reconstructed more accurately, i.e., more private information can be disclosed. To improve privacy, we propose a modified randomization scheme in which the correlations of the random noise are made "similar" to those of the original data. Our results have shown that the reconstruction accuracy of both the PCA-based and BE-based schemes becomes worse as the similarity increases.
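The role of correlations described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes additive Gaussian noise with a known variance, a known number of principal components `k`, and synthetic data generated from a few latent factors. The helper names (`pca_reconstruct`, `make_mixing`, `rmse`) are hypothetical and chosen only for this illustration. The sketch covers only a PCA-style reconstruction; the BE-based scheme is not shown.

```python
import numpy as np


def pca_reconstruct(Y, noise_var, k):
    """Estimate X from Y = X + R, assuming R is i.i.d. noise with variance noise_var.

    Illustrative sketch only: project the disguised data onto the top-k
    principal directions of the estimated covariance of X.
    """
    mean = Y.mean(axis=0)
    Yc = Y - mean
    cov_y = np.cov(Yc, rowvar=False)
    # Under independent noise, Cov(Y) = Cov(X) + noise_var * I.
    cov_x = cov_y - noise_var * np.eye(Y.shape[1])
    eigvals, eigvecs = np.linalg.eigh(cov_x)
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest eigenvalues
    top = eigvecs[:, order]
    return Yc @ top @ top.T + mean


def rmse(a, b):
    """Root-mean-square error between two arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def make_mixing(rng, latent_dim, m):
    """Random linear map from a few latent factors to m correlated attributes."""
    return rng.normal(size=(latent_dim, m))


rng = np.random.default_rng(0)
n, m, latent_dim, sigma = 2000, 20, 2, 1.0

# Highly correlated "original" data: 20 attributes driven by 2 latent factors.
mixing = make_mixing(rng, latent_dim, m)
X = rng.normal(size=(n, latent_dim)) @ mixing

# Case 1: conventional randomization with independent noise per attribute.
Y_indep = X + rng.normal(scale=sigma, size=X.shape)
X_hat = pca_reconstruct(Y_indep, noise_var=sigma**2, k=latent_dim)

# Case 2: noise whose correlation structure mimics the data (same mixing),
# rescaled so its overall magnitude matches the independent noise.
R_corr = rng.normal(size=(n, latent_dim)) @ mixing
R_corr *= sigma / R_corr.std()
Y_corr = X + R_corr
X_hat_corr = pca_reconstruct(Y_corr, noise_var=sigma**2, k=latent_dim)

err_noisy = rmse(Y_indep, X)      # error of the disguised data itself
err_indep = rmse(X_hat, X)        # error after PCA filtering, independent noise
err_corr = rmse(X_hat_corr, X)    # error after PCA filtering, correlated noise

print(f"disguised data error:            {err_noisy:.3f}")
print(f"PCA estimate, independent noise: {err_indep:.3f}")
print(f"PCA estimate, correlated noise:  {err_corr:.3f}")
```

In this toy setting the projection discards most of the independent noise, because that noise is spread over all 20 dimensions while the data occupy only two. When the noise shares the data's correlation structure, it lies in the same subspace and the projection removes little of it, which is consistent with the trend the abstract reports for the modified scheme.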

Original language: English (US)
Pages (from-to): 37-48
Number of pages: 12
Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
State: Published - 2005
Event: SIGMOD 2005: ACM SIGMOD International Conference on Management of Data - Baltimore, MD, United States
Duration: Jun 14 2005 → Jun 16 2005


Keywords

  • Bayes Estimate
  • PCA
  • Privacy-Preserving Data Mining
  • Randomization

ASJC Scopus subject areas

  • Software
  • Information Systems

