Algorithms for Private Data Analysis

Award #: 0729171
Amount Awarded: $282,428
Sponsoring Organization:NSF
Grant Period: 2007-2010
Primary Investigator(s): Adam Smith

Abstract

Data privacy has become a fundamental problem of the modern information infrastructure. Collections of personal and sensitive data, previously the purview of governments and statistical agencies, are now ubiquitous. Increasing volumes of information are collected and archived by health networks, financial organizations, search engines, intrusion detection systems, social networking systems, retailers and other enterprises. Potential social benefits from analyzing these databases are enormous. The main challenge is to learn the properties of a database as a whole while protecting the privacy of individual contributors. This project investigates the design of new algorithms and learning techniques for private data analysis. The focus is on the following research activities: (1) Exploring the ramifications of the Smooth Sensitivity framework. Releasing a function's value privately via this framework requires understanding a combinatorial property of the function, termed the smooth sensitivity. The investigators study the complexity of computing smooth sensitivity for a variety of functions, ranging from statistical summaries to graph properties. (2) Designing generic efficient methods based on sampling that can be applied without computing smooth sensitivity explicitly. (3) Investigating what classes of concepts and distributions can be efficiently learned, in the PAC sense, while maintaining privacy. The most significant social impact of this research will come directly from its technical success: inasmuch as it provides a rigorous basis for limiting privacy breaches and new algorithmic techniques, it will help prevent damaging violations of personal privacy, and contribute to the more effective use of the vast amounts of information currently being accumulated.

Related Publications