TY - GEN
T1 - On the approximability of geometric and geographic generalization and the min-max bin covering problem
AU - Du, Wenliang
AU - Eppstein, David
AU - Goodrich, Michael T.
AU - Lueker, George S.
PY - 2009
Y1 - 2009
N2 - We study the problem of abstracting a table of data about individuals so that no selection query can identify fewer than k individuals. We show that it is impossible to achieve arbitrarily good polynomial-time approximations for a number of natural variations of the generalization technique, unless P = NP, even when the table has only a single quasi-identifying attribute that represents a geographic or unordered attribute: - Zip-codes: nodes of a planar graph generalized into connected subgraphs - GPS coordinates: points in R 2 generalized into non-overlapping rectangles - Unordered data: text labels that can be grouped arbitrarily. These hard single-attribute instances of generalization problems contrast with the previously known NP-hard instances, which require the number of attributes to be proportional to the number of individual records (the rows of the table). In addition to impossibility results, we provide approximation algorithms for these difficult single-attribute generalization problems, which, of course, apply to multiple-attribute instances with one that is quasi-identifying. Incidentally, the generalization problem for unordered data can be viewed as a novel type of bin packing problem-min-max bin covering-which may be of independent interest.
AB - We study the problem of abstracting a table of data about individuals so that no selection query can identify fewer than k individuals. We show that it is impossible to achieve arbitrarily good polynomial-time approximations for a number of natural variations of the generalization technique, unless P = NP, even when the table has only a single quasi-identifying attribute that represents a geographic or unordered attribute: - Zip-codes: nodes of a planar graph generalized into connected subgraphs - GPS coordinates: points in R 2 generalized into non-overlapping rectangles - Unordered data: text labels that can be grouped arbitrarily. These hard single-attribute instances of generalization problems contrast with the previously known NP-hard instances, which require the number of attributes to be proportional to the number of individual records (the rows of the table). In addition to impossibility results, we provide approximation algorithms for these difficult single-attribute generalization problems, which, of course, apply to multiple-attribute instances with one that is quasi-identifying. Incidentally, the generalization problem for unordered data can be viewed as a novel type of bin packing problem-min-max bin covering-which may be of independent interest.
UR - http://www.scopus.com/inward/record.url?scp=69949166633&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69949166633&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-03367-4_22
DO - 10.1007/978-3-642-03367-4_22
M3 - Conference contribution
AN - SCOPUS:69949166633
SN - 3642033660
SN - 9783642033667
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 242
EP - 253
BT - Algorithms and Data Structures - 11th International Symposium, WADS 2009, Proceedings
T2 - 11th International Symposium on Algorithms and Data Structures, WADS 2009
Y2 - 21 August 2009 through 23 August 2009
ER -