
Visualization for Joinable Data Discovery

PRIVEE

Open data sets containing personal information are primarily published through a release-and-forget model, where data owners and custodians have little to no cognizance of the privacy risks. We address this gap by developing a visual analytic tool, PRIVEE, that helps data defenders gain awareness of disclosure risks in local, joinable data neighborhoods.

Abstract: Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals’ privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts.
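The low-cost join attack described above can be sketched in a few lines. This is an illustrative toy example, not PRIVEE's implementation: the datasets, attribute names, and values are all hypothetical, and the quasi-identifiers (ZIP code, age, sex) are a standard example of shared attributes that can link an anonymized record back to a named individual.

```python
# Illustrative sketch (not PRIVEE's method): re-identification by joining
# two open datasets on shared quasi-identifier attributes.

# "Anonymized" records: direct identifiers removed, but quasi-identifiers
# (zip, age, sex) remain alongside a sensitive attribute.
health = [
    {"zip": "60601", "age": 34, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "60605", "age": 51, "sex": "M", "diagnosis": "hypertension"},
]

# A separately published open dataset (e.g., a voter roll) that carries
# names together with the same quasi-identifiers.
voters = [
    {"name": "A. Jones", "zip": "60601", "age": 34, "sex": "F"},
    {"name": "B. Smith", "zip": "60605", "age": 51, "sex": "M"},
    {"name": "C. Lee",   "zip": "60605", "age": 29, "sex": "F"},
]

def join_on_quasi_identifiers(left, right, keys=("zip", "age", "sex")):
    """Low-cost inner join on the shared attributes in `keys`."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined

linked = join_on_quasi_identifiers(health, voters)
for row in linked:
    print(row["name"], "->", row["diagnosis"])
# A. Jones -> diabetes
# B. Smith -> hypertension
```

Both records in the "anonymized" release are re-identified because their quasi-identifier combination is unique in the second dataset, which is exactly the kind of vulnerable join neighborhood the workflow is designed to surface.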

[Details]  [Paper]  [Presentation Video]

Disclosures during Red Team Exercise

This paper highlights some of the vulnerabilities discovered during a red team exercise on the open data ecosystem. We report examples where sensitive details of data subjects are exposed by joining open datasets.

Abstract: The open data ecosystem is susceptible to vulnerabilities due to disclosure risks. Though the datasets are anonymized during release, the prevalence of the release-and-forget model makes data defenders blind to privacy issues arising after the dataset release. One such issue is the disclosure risk that arises when newly released datasets can be joined with existing anonymous open datasets, compromising the privacy of their data subjects. In this paper, we first examine some of these pitfalls through examples we observed during a red teaming exercise and then envision other possible vulnerabilities in this context. We also discuss proactive risk monitoring, including developing a collection of highly susceptible open datasets and a visual analytic workflow that empowers data defenders to undertake dynamic risk calibration strategies.

[Details]  [Paper] 

Utility Profiling

We introduce a utility metric to quantify the utility of joined datasets. Our visual analytic workflow helps identify joinable datasets, prioritize them based on their utility, and inspect the joined dataset.

Abstract: The widespread adoption of open datasets across various domains has emphasized the significance of joining them and computing their utility. However, the interplay between computation and human interaction is vital for informed decision-making. To address this issue, we first propose a utility metric to calibrate the usefulness of open datasets when joined with other such datasets. Further, we distill this utility metric through a visual analytic framework called VALUE, which empowers researchers to identify joinable datasets, prioritize them based on their utility, and inspect the joined dataset. This transparent evaluation of the utility of the joined datasets is implemented through a human-in-the-loop approach where researchers can adapt and refine the selection criteria according to their mental model of utility. Finally, we demonstrate the effectiveness of our approach through a usage scenario using real-world open datasets.

[Details]  [Paper] 

State-of-the-Art Report

In this survey paper, we analyze the methods and techniques used for handling data privacy in visualization. We also reflect on the gaps and future research opportunities in this domain.

Abstract: Preservation of data privacy and protection of sensitive information from potential adversaries constitute a key socio-technical challenge in the modern era of ubiquitous digital transformation. Addressing this challenge needs analysis of multiple factors: algorithmic choices for balancing privacy and loss of utility, potential attack scenarios that can be undertaken by adversaries, implications for data owners, data subjects, and data sharing policies, and access control mechanisms that need to be built into interactive data interfaces. Visualization has a key role to play as part of the solution space, both as a medium of privacy-aware information communication and also as a tool for understanding the link between privacy parameters and data sharing policies. The field of privacy-preserving data visualization has witnessed progress along many of these dimensions. In this state-of-the-art report, our goal is to provide a systematic analysis of the approaches, methods, and techniques used for handling data privacy in visualization. We also reflect on the road-map ahead by analyzing the gaps and research opportunities for solving some of the pressing socio-technical challenges involving data privacy with the help of visualization.

[Details]  [Paper]  [Presentation Video]