Privacy-Preserving Sharing of High-Dimensional Data based on Computational Game Theory
In the big data era, person-specific data are being collected at an unprecedented scale. Given the potential wealth of insights in personal data, many organizations aim to protect privacy by sharing only de-identified data, yet remain concerned because various demonstrations show that such data can be re-identified. A wide array of deterrents has been designed to mitigate these concerns, some of which are technical (e.g., obfuscating data), while others are more social (e.g., legal contracts). However, re-identification investigations have focused on worst-case scenarios and spurred the adoption of data sharing practices that unnecessarily impede research. A formal re-identification risk assessment is required to help data sharers make better decisions about how to share data. Game-theoretic approaches, which model rational interactions among the parties involved, can optimally balance utility and risk in data sharing scenarios. I utilize a game-theoretic lens to develop more effective, quantifiable protections for data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources. I demonstrate this approach with large-scale real-world genomic datasets and show that it balances risk against utility more effectively than traditional approaches. To confront the high dimensionality of practical scenarios, I develop AI algorithms that accelerate the solution search. I find it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk. Recognizing that such models depend on a variety of parameters, I perform extensive sensitivity analyses to show that my findings are robust to fluctuations in those parameters.
My dissertation focuses on answering theoretical questions about privacy-preserving data sharing in multi-stage adversarial scenarios and on designing practical algorithms for game-solving in high-dimensional environments. I tailor my approaches to building the scalable systems demanded by modern big data applications. The game-theoretic methodology, which I examine using demographic, genomic, and phenotypic data, has the potential to be applied to other data types and to be regarded as a general data protection methodology.
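
To make the game-theoretic reasoning above concrete, the following is a minimal toy sketch, not the dissertation's actual model or algorithms. It assumes a two-stage leader-follower game: the sharer picks which attributes to release, and a rational recipient attacks only when the expected gain from re-identification exceeds the attack cost. Every attribute name, payoff value, and the independence-based risk formula is a hypothetical choice made for illustration. The sketch enumerates all strategies by brute force, which is feasible only in very low dimensions.

```python
from itertools import combinations

# All values below are hypothetical and chosen only for illustration.
BENEFIT = {"age": 4.0, "zip": 3.0, "snp_a": 5.0, "snp_b": 2.0}   # utility of sharing each attribute
RISK = {"age": 0.10, "zip": 0.35, "snp_a": 0.25, "snp_b": 0.05}  # per-attribute identifying power
ATTACK_GAIN = 10.0   # recipient's gain from a successful re-identification
ATTACK_COST = 3.0    # recipient's cost of attempting an attack
SHARER_LOSS = 10.0   # sharer's loss from a successful re-identification


def reid_probability(shared):
    """Toy risk model: attributes identify a record independently."""
    p_safe = 1.0
    for attr in shared:
        p_safe *= 1.0 - RISK[attr]
    return 1.0 - p_safe


def recipient_attacks(shared):
    """The follower attacks only if the expected net gain is positive."""
    return reid_probability(shared) * ATTACK_GAIN - ATTACK_COST > 0


def sharer_payoff(shared):
    """Leader's payoff: sharing benefit minus expected loss if attacked."""
    benefit = sum(BENEFIT[attr] for attr in shared)
    if recipient_attacks(shared):
        return benefit - reid_probability(shared) * SHARER_LOSS
    return benefit


def all_strategies():
    """Every subset of attributes the sharer could release."""
    attrs = sorted(BENEFIT)
    for k in range(len(attrs) + 1):
        for combo in combinations(attrs, k):
            yield frozenset(combo)


def solve():
    """Return (best strategy overall, best strategy that deters any attack)."""
    strategies = list(all_strategies())
    optimal = max(strategies, key=sharer_payoff)
    zero_risk = max(
        (s for s in strategies if not recipient_attacks(s)),
        key=sharer_payoff,
    )
    return optimal, zero_risk


if __name__ == "__main__":
    optimal, zero_risk = solve()
    print("optimal:  ", sorted(optimal), sharer_payoff(optimal))
    print("zero-risk:", sorted(zero_risk), sharer_payoff(zero_risk))
```

The zero-risk strategy is the best release among those for which the recipient's best response is not to attack. Because the number of subsets doubles with each added attribute, this exhaustive search does not scale, which is the kind of limitation that motivates faster solution search in high-dimensional settings.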