
Identifying and Addressing Constraints to Fair De-identification and Data Sharing

dc.contributor.advisor: Malin, Bradley A
dc.creator: Brown, J Thomas
dc.date.accessioned: 2024-05-15T17:22:47Z
dc.date.available: 2024-05-15T17:22:47Z
dc.date.created: 2024-05
dc.date.issued: 2024-03-15
dc.date.submitted: May 2024
dc.identifier.uri: http://hdl.handle.net/1803/18972
dc.description.abstract: Improving health outcomes and achieving health equity requires that researchers have access to large and diverse datasets. The Health Insurance Portability and Accountability Act of 1996 and other privacy legislation permit broad dissemination of person-level data that has been de-identified. The process of de-identification involves removing directly identifying information (e.g., names) and transforming the data in a manner that reduces the risk that individual patients can be re-identified (e.g., generalizing date of birth to 5-year age ranges). However, for several reasons, traditional de-identification methods cannot support all use cases. First, they were not designed to support public health research amidst an emerging biosurveillance event. They generally rely on retrospective risk assessments, which delay dataset updates. They also fail to adapt to changes in infection rates or population demographics over time, which unnecessarily degrades the data’s utility. Second, traditional methods have not prioritized fairness with respect to both privacy protections and group representation. As such, de-identification may disproportionately expose minority groups to re-identification and/or disproportionately degrade their representation in the dataset, and subsequently their potential benefit from research. I address both limitations in this dissertation in three parts. First, I develop a framework to dynamically adapt de-identification for near-real-time sharing of person-level surveillance data. I show how this framework can support early detection of underlying disparities while reducing patients’ privacy risk. Second, I formalize the tradeoff between equalizing privacy risk and equalizing data utility between records in de-identified data, proving the impossibility of concurrently equalizing both in most real-world settings.
Finally, I develop a de-identification method that transcends data transformation conventions to enable cooperative privacy protections. In doing so, I show how certain privacy protections can be altruistically donated by majority groups’ records such that minority groups’ records retain greater data utility than that provided by standard de-identification methods. Collectively, this work identifies and addresses several constraints to sharing data in a way that both protects privacy and preserves the representation of the full population. The constraints motivate the need for, and should guide the development of, innovative data sharing solutions that support society’s pursuit of health equity.
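The abstract describes de-identification as removing direct identifiers and generalizing quasi-identifiers (e.g., mapping date of birth to 5-year age ranges). The following is a minimal illustrative sketch of that idea; the field names (`name`, `ssn`, `age`) and the 5-year bin width are assumptions for illustration, not the dissertation's actual method.

```python
def generalize_age(age: int, width: int = 5) -> str:
    """Map an exact age to a coarse range, e.g. 37 -> '35-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen the age quasi-identifier."""
    out = {k: v for k, v in record.items() if k not in {"name", "ssn"}}
    if "age" in out:
        out["age"] = generalize_age(out["age"])
    return out

patient = {"name": "Jane Doe", "ssn": "000-00-0000", "age": 37, "zip": "37203"}
print(deidentify(patient))  # {'age': '35-39', 'zip': '37203'}
```

Coarser bins lower re-identification risk at the cost of utility; the dissertation's concern is that this tradeoff can fall unevenly across demographic groups.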
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: data sharing
dc.subject: data privacy
dc.subject: fairness
dc.subject: de-identification
dc.subject: anonymization
dc.subject: pandemic surveillance
dc.subject: health equity
dc.subject: algorithmic fairness
dc.title: Identifying and Addressing Constraints to Fair De-identification and Data Sharing
dc.type: Thesis
dc.date.updated: 2024-05-15T17:22:47Z
dc.type.material: text
thesis.degree.name: PhD
thesis.degree.level: Doctoral
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University Graduate School
dc.creator.orcid: 0000-0001-9252-2559
dc.contributor.committeeChair: Malin, Bradley A


