Privacy-Preserving Sharing of High-Dimensional Data based on Computational Game Theory

Wan, Zhiyu

dc.contributor.advisor	Malin, Bradley A.
dc.creator	Wan, Zhiyu
dc.date.accessioned	2020-12-29T15:30:29Z
dc.date.available	2020-12-29T15:30:29Z
dc.date.created	2020-12
dc.date.issued	2020-11-18
dc.date.submitted	December 2020
dc.identifier.uri	http://hdl.handle.net/1803/16396
dc.description.abstract	In the big data era, person-specific data are being collected in an unprecedented manner. Given the potential wealth of insights in personal data, many organizations aim to share data while protecting privacy by sharing de-identified data, but are concerned because various demonstrations show such data can be re-identified. A wide array of deterrents has been designed to mitigate these concerns, some of which are technical (e.g., obfuscating data), while others are more social (e.g., legal contracts). However, investigations of re-identification risk have focused on worst-case scenarios and spurred the adoption of data sharing practices that unnecessarily impede research. A formal re-identification risk assessment is required to help data sharers make better decisions about how to share data. Game-theoretic approaches, which model rational interactions among the parties involved, can optimally balance utility and risks in data sharing scenarios. I utilize a game-theoretic lens to develop more effective, quantifiable protections for data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources. I demonstrate this approach with large-scale real-world genomic datasets and show that risks can be balanced against utility more effectively than with traditional approaches. Confronting high dimensionality in practical scenarios, I develop AI algorithms to accelerate the solution search. I find it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk. Recognizing that such models are dependent on a variety of parameters, I perform extensive sensitivity analyses to show that my findings are robust to fluctuations in those parameters.
My dissertation focuses on answering theoretical questions about privacy-preserving data sharing problems in multi-stage adversarial scenarios and on designing practical algorithms for game-solving in high-dimensional environments. I tailor my approaches for building scalable systems demanded by modern big data applications. The game-theoretic methodology that I examine using demographic, genomic, and phenotypic data has the potential to be applied to other data types and to be regarded as a general data protection methodology.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Data sharing
dc.subject	Privacy
dc.subject	Re-identification
dc.subject	Risk assessment
dc.subject	Game theory
dc.subject	Genomic data
dc.subject	Summary statistics
dc.subject	Adversarial modeling
dc.subject	Genetic algorithm
dc.subject	Sensitivity analysis
dc.title	Privacy-Preserving Sharing of High-Dimensional Data based on Computational Game Theory
dc.type	Thesis
dc.date.updated	2020-12-29T15:30:30Z
dc.type.material	text
thesis.degree.name	PhD
thesis.degree.level	Doctoral
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Vanderbilt University Graduate School
dc.creator.orcid	0000-0003-3752-5778

Files in this item

Name: WAN-DISSERTATION-2020.pdf
Size: 10.61Mb
Format: PDF

Name: WAN-DISSERTATION_FINAL.docx
Size: 12.36Mb
Format: Microsoft Word 2007

This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.
