TY - JOUR
T1 - Differential privacy of hierarchical Census data
T2 - An optimization approach
AU - Fioretto, Ferdinando
AU - Van Hentenryck, Pascal
AU - Zhu, Keyu
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/7
Y1 - 2021/7
N2 - This paper is motivated by applications of a Census Bureau interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual. The released information can be the number of individuals living alone, the number of cars they own, or their salary brackets. Recent events have identified some of the privacy challenges faced by these organizations [1]. To address them, this paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals. The counts are reported at multiple granularities (e.g., the national, state, and county levels) and must be consistent across all levels. The core of the mechanism is an optimization model that redistributes the noise introduced to achieve differential privacy in order to meet the consistency constraints between the hierarchical levels. The key technical contribution of the paper shows that this optimization problem can be solved in polynomial time by exploiting the structure of its cost functions. Experimental results on very large, real datasets show that the proposed mechanism provides improvements of up to two orders of magnitude in terms of computational efficiency and accuracy with respect to other state-of-the-art techniques.
AB - This paper is motivated by applications of a Census Bureau interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual. The released information can be the number of individuals living alone, the number of cars they own, or their salary brackets. Recent events have identified some of the privacy challenges faced by these organizations [1]. To address them, this paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals. The counts are reported at multiple granularities (e.g., the national, state, and county levels) and must be consistent across all levels. The core of the mechanism is an optimization model that redistributes the noise introduced to achieve differential privacy in order to meet the consistency constraints between the hierarchical levels. The key technical contribution of the paper shows that this optimization problem can be solved in polynomial time by exploiting the structure of its cost functions. Experimental results on very large, real datasets show that the proposed mechanism provides improvements of up to two orders of magnitude in terms of computational efficiency and accuracy with respect to other state-of-the-art techniques.
KW - Census
KW - Constrained optimization
KW - Differential privacy
UR - http://www.scopus.com/inward/record.url?scp=85101733577&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85101733577&partnerID=8YFLogxK
U2 - 10.1016/j.artint.2021.103475
DO - 10.1016/j.artint.2021.103475
M3 - Article
AN - SCOPUS:85101733577
SN - 0004-3702
VL - 296
JO - Artificial Intelligence
JF - Artificial Intelligence
M1 - 103475
ER -