
Identifying and Addressing Constraints to Fair De-identification and Data Sharing

dc.contributor.advisor: Malin, Bradley A
dc.creator: Brown, J Thomas
dc.date.accessioned: 2024-05-15T17:22:47Z
dc.date.available: 2024-05-15T17:22:47Z
dc.date.created: 2024-05
dc.date.issued: 2024-03-15
dc.date.submitted: May 2024
dc.identifier.uri: http://hdl.handle.net/1803/18972
dc.description.abstract: Improving health outcomes and achieving health equity requires that researchers have access to large and diverse datasets. The Health Insurance Portability and Accountability Act of 1996 and other privacy legislation permit broad dissemination of person-level data that has been de-identified. The process of de-identification involves removing directly identifying information (e.g., names) and transforming the data in a manner that reduces the risk that individual patients can be re-identified (e.g., generalizing date of birth to 5-year age ranges). However, for several reasons, traditional de-identification methods cannot support all use cases. First, they were not designed to support public health research amidst an emerging biosurveillance event. They generally rely on retrospective risk assessments, which delay dataset updates. They also fail to adapt to changes in infection rates or population demographics over time, which unnecessarily degrades the data’s utility. Second, traditional methods have not prioritized fairness with respect to both privacy protections and group representation. As such, de-identification may disproportionately expose minority groups to re-identification and/or disproportionately degrade their representation in the dataset, and subsequently their potential benefit from research. I address both limitations in this dissertation in three parts. First, I develop a framework to dynamically adapt de-identification for near-real-time sharing of person-level surveillance data. I show how this framework can support early detection of underlying disparities while reducing patients’ privacy risk. Second, I formalize the tradeoff between equalizing privacy risk and equalizing data utility between records in de-identified data, proving the impossibility of concurrently equalizing both in most real-world settings.
Finally, I develop a de-identification method that transcends data transformation conventions to enable cooperative privacy protections. In doing so, I show how certain privacy protections can be altruistically donated by majority groups’ records such that minority groups’ records retain greater data utility than that provided by standard de-identification methods. Collectively, this work identifies and addresses several constraints to sharing data in a way that both protects privacy and preserves the representation of the full population. The constraints motivate the need for, and should guide the development of, innovative data sharing solutions that support society’s pursuit of health equity.
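The abstract describes de-identification as removing direct identifiers and generalizing quasi-identifiers (e.g., mapping date of birth to 5-year age ranges). The following is a minimal illustrative sketch of that idea; the field names (`name`, `ssn`, `age`) and the 5-year bin width are assumptions for illustration, not the dissertation's actual method.

```python
def generalize_age(age: int, width: int = 5) -> str:
    """Map an exact age to a coarse range, e.g. 37 -> '35-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen the age quasi-identifier."""
    out = {k: v for k, v in record.items() if k not in {"name", "ssn"}}
    if "age" in out:
        out["age"] = generalize_age(out["age"])
    return out

patient = {"name": "Jane Doe", "ssn": "000-00-0000", "age": 37, "zip": "37203"}
print(deidentify(patient))  # {'age': '35-39', 'zip': '37203'}
```

Coarser bins lower re-identification risk at the cost of utility; the dissertation's concern is that this tradeoff can fall unevenly across demographic groups.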
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: data sharing
dc.subject: data privacy
dc.subject: fairness
dc.subject: de-identification
dc.subject: anonymization
dc.subject: pandemic surveillance
dc.subject: health equity
dc.subject: algorithmic fairness
dc.title: Identifying and Addressing Constraints to Fair De-identification and Data Sharing
dc.type: Thesis
dc.date.updated: 2024-05-15T17:22:47Z
dc.type.material: text
thesis.degree.name: PhD
thesis.degree.level: Doctoral
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University Graduate School
dc.creator.orcid: 0000-0001-9252-2559
dc.contributor.committeeChair: Malin, Bradley A


