Private Federated Analytics At Scale

Roth, Edo

Repository landing page

oai:repository.upenn.edu:edissertations-6769

Private Federated Analytics At Scale

Authors: Edo Roth
Publication date: 1 January 2022
Publisher: ScholarlyCommons

Abstract

Collecting distributed data from millions of individuals for the purpose of analytics is a common scenario – from Apple collecting typed words and emojis to improve its keyboard suggestions, to Google collecting location data to see how busy restaurants and businesses are. This data is often sensitive, and can be overly revealing about the individuals and communities whose data is being analyzed en masse. Differential privacy has become the gold-standard method to give strong individual privacy guarantees while releasing aggregate statistics about sensitive data. However, the process of computing such statistics can itself be a privacy risk. For instance, a simple approach would be to collect all the raw data at a single central entity, which then computes and releases the statistics. This entity then has to be trusted to not abuse the raw data; in practice, it can be difficult to find an entity with the requisite level of trust. In this thesis, we describe a new approach that uses cryptographic techniques to collect data privately and safely, without placing trust in any party. Although the natural candidates, such as secure multiparty computation (MPC) and fully homomorphic encryption (FHE) do not scale to millions of parties on their own, our key insight is that there are ways to refactor computations in such a way that they can be done using simpler techniques that do scale, such as additively homomorphic encryption. Our solution restructures centralized computations into distributed protocols that can be executed efficiently at scale. The systems we design based on this approach can support billions of participants and can handle a variety of real queries from the literature, including machine learning tasks, Pregel-style graph queries, and queries over large categorical data. We automate the distributed refactoring so that analysts can write the query as if the data were centralized without understanding how the rewriting works, and we protect against malicious parties who aim to poison or bias the results

Similar works

Full text

Open in the Core reader

Download PDF

ScholarlyCommons@Penn

oai:repository.upenn.edu:ediss...

Last time updated on 05/10/2022

This paper was published in ScholarlyCommons@Penn.

Having an issue?

Is data on this page outdated, violates copyrights or anything else? Report the problem now and we will take corresponding actions after reviewing your request.