Practical k-Anonymity on large datasets
Podgursky, Benjamin T
The implicit contract between an individual and a website is that a viewer will remain anonymous unless they choose to identify themselves. At the same time, there are many advantages to allowing websites to tailor content to viewers based on hints about their likely interests and habits. As people spend increasing amounts of time networked and online, however, the line between a person's online presence and their offline identity has blurred. Ideally, the goal of providing personalized internet content and the implicit contract of net-anonymity can be reconciled. This thesis studies how research from the field of privacy-preserving data publishing can be applied to use offline data anonymously for web personalization. The anonymity models of k-Anonymity and (k,1)-Anonymity, or k-Unlinkability, turn out to be promising for this problem, and this work studies how to anonymize insight data using these models. Rapleaf is a company that helps websites personalize their content, with the goal of anonymizing content while keeping the data specific enough to be insightful. Rapleaf's personalization dataset is used as a case study for investigating the challenges associated with anonymizing such a dataset. It is hoped that the findings reported here will help web data be anonymized while remaining useful, and that organizations will be encouraged to view anonymity and insight as goals that can be equitably balanced rather than as mutually exclusive.