Privacy Leaks and Efficient Countermeasures for Human Genetics and Machine Learning

dc.creator: Xie, Wei
dc.date.accessioned: 2020-08-24T11:48:20Z
dc.date.available: 2020-04-06
dc.date.issued: 2018-04-06
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-03222018-105330
dc.identifier.uri: http://hdl.handle.net/1803/15381
dc.description.abstract: Modern scientific investigations increasingly rely on the expanded collection and analysis of data ("big data"). In the genetics community, there is evidence that greater statistical power can be achieved when genomic and phenotype data are shared beyond their initial points of collection and combined with other resources. In recognition of this opportunity, numerous initiatives, such as the database of Genotypes and Phenotypes (dbGaP), have been established to disseminate such data to a wide array of potential users. Meanwhile, the sensitive nature of genome and phenotype data has raised serious privacy concerns because of risks such as revealing personal identity and sensitive disease information. Heated discussion over genetic privacy has led the community to act conservatively in data sharing by restricting data access. This dissertation begins by introducing novel methods and findings that breach the privacy of the individuals to whom genomic data correspond. In particular, it focuses on statistical inference methods that use publicly accessible information to detect when an individual has participated in a genomic study and subsequently to reveal that individual's exact phenotype (disease status or quantitative traits). Next, we recognize that novel technical solutions could help thwart such attacks. Specifically, this dissertation introduces a collection of cryptographic methods that protect patient privacy while supporting statistical and machine learning models commonly used in genetics, such as meta-analysis and logistic regression. It is well known that cryptographic solutions often incur heavy computation and are significantly slower than non-secure models. This is problematic because it reduces the likelihood that such methods would be considered plausible for real-world adoption.
Thus, as a final contribution, this dissertation proposes novel algorithms to accelerate cryptography-based machine learning. Specifically, it develops several distributed optimization methods that significantly accelerate privacy-preserving distributed machine learning, and it validates their efficiency and accuracy extensively on large-scale datasets. This work bridges the gap between distributed machine learning, optimization, and cryptography, and the resulting methods could act as drop-in replacements for many privacy-preserving methods proposed in genetics and machine learning research.
dc.format.mimetype: application/pdf
dc.subject: meta-analysis
dc.subject: cryptography
dc.subject: Privacy-preserving machine learning
dc.subject: GWAS
dc.subject: Genomic privacy
dc.subject: distributed optimization
dc.title: Privacy Leaks and Efficient Countermeasures for Human Genetics and Machine Learning
dc.type: dissertation
dc.contributor.committeeMember: Douglas Fisher
dc.contributor.committeeMember: Todd Edwards
dc.contributor.committeeMember: Aniruddha Gokhale
dc.contributor.committeeMember: Nancy Cox
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2020-04-06
local.embargo.lift: 2020-04-06
dc.contributor.committeeChair: Bradley Malin