Privacy Leaks and Efficient Countermeasures for Human Genetics and Machine Learning

Xie, Wei

dc.creator	Xie, Wei
dc.date.accessioned	2020-08-24T11:48:20Z
dc.date.available	2020-04-06
dc.date.issued	2018-04-06
dc.identifier.uri	https://etd.library.vanderbilt.edu/etd-03222018-105330
dc.identifier.uri	http://hdl.handle.net/1803/15381
dc.description.abstract	Modern scientific investigations have increasingly relied upon the expanded collection and analysis of data (“big data”). In the genetics community, there is evidence to suggest that increased statistical power can be achieved when genomic and phenotype data are shared beyond their initial points of collection and combined with other resources. In recognition of this opportunity, numerous initiatives, such as the database of Genotypes and Phenotypes (dbGaP), have been established to facilitate the dissemination of such data to a wide array of potential users. Meanwhile, the sensitive nature of genome and phenotype data has raised tremendous privacy concerns due to risks such as revealing personal identity and sensitive disease information. Heated discussion over genetic privacy has led the community to act conservatively in terms of data sharing by restricting data access. This dissertation begins by introducing novel methods and findings that breach the privacy of the individuals to whom genomic data correspond. In particular, this dissertation focuses on statistical inference methods that detect when an individual has participated in a genomic study and subsequently unveil their exact phenotype (disease status or quantitative traits), using publicly accessible information. Next, we recognize that novel technical solutions could help thwart such attacks. Specifically, this dissertation introduces a collection of cryptographic methods to protect patient privacy while supporting common statistical and machine learning models widely used in genetics (such as meta-analysis and logistic regression). It is well known that cryptographic solutions often incur intensive computation and are significantly slower than non-secure models. This is problematic because it limits the likelihood that such methods would be considered plausible for real-world adoption.
Thus, as a final contribution, this dissertation proposes novel algorithms to accelerate cryptography-based machine learning. Specifically, this dissertation develops several distributed optimization methods to significantly accelerate privacy-preserving distributed machine learning and validates their efficiency and accuracy extensively on large-scale datasets. This work bridges the gap between distributed machine learning, optimization, and cryptography, and could act as a drop-in replacement for many privacy-preserving methods proposed in genetics and machine learning research.
dc.format.mimetype	application/pdf
dc.subject	meta-analysis
dc.subject	cryptography
dc.subject	Privacy-preserving machine learning
dc.subject	GWAS
dc.subject	Genomic privacy
dc.subject	distributed optimization
dc.title	Privacy Leaks and Efficient Countermeasures for Human Genetics and Machine Learning
dc.type	dissertation
dc.contributor.committeeMember	Douglas Fisher
dc.contributor.committeeMember	Todd Edwards
dc.contributor.committeeMember	Aniruddha Gokhale
dc.contributor.committeeMember	Nancy Cox
dc.type.material	text
thesis.degree.name	PHD
thesis.degree.level	dissertation
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Vanderbilt University
local.embargo.terms	2020-04-06
local.embargo.lift	2020-04-06
dc.contributor.committeeChair	Bradley Malin

Files in this item

Name: XieDissertation18.pdf
Size: 2.735Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.