PowerLearn: Differentially private and model agnostic tabular data embeddings

1 Mohamed bin Zayed University of Artificial Intelligence, UAE
2 Massachusetts Institute of Technology, USA

Abstract

Traditional collaborative learning approaches rely on sharing model weights between clients and a server. Schemes that share activations instead can be more resource-efficient. Several differentially private methods have been developed for sharing weights, but no such mechanisms exist so far for sharing activations. We propose Power-Learning, which learns a privacy encoding network in conjunction with a small utility generation network so that the final activations it produces carry formal differential privacy guarantees. These privatized activations are then shared with a more powerful server, which learns a post-processing step that yields higher accuracy on machine learning tasks. We show that our co-design of collaborative and private learning requires only one round of privatized communication and less compute on the client than traditional methods. The privatized activations shared by the client are agnostic to the type of model the server uses to process them and complete a task.

System Overview

PowerLearn System Overview
Schematic illustration of Power-Learning for distributed and private learning, which theoretically calibrates and measures the level of ϵ and δ obtained for differential privacy. The calibration is performed after minimizing a purpose-built privacy loss, which acts as a regularizer alongside the machine learning utility loss.

Contributions

  • First framework for sharing neural network activations with formal differential privacy guarantees
  • Novel regularized learning scheme providing theoretical privacy guarantees for shared activations
  • Resource-efficient privacy, with significant improvements in computational and communication efficiency
  • Model-agnostic approach allowing servers to employ any machine learning method
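The model-agnostic claim rests on the post-processing property of differential privacy: anything computed only from already-privatized outputs inherits the same guarantee. The sketch below illustrates this with NumPy. The synthetic data, the fixed noise scale standing in for client-side privatization, and the helper names (fit_ridge, fit_centroids, and so on) are all assumptions made for illustration and are not taken from PowerLearn.

```python
import numpy as np

def fit_ridge(z, y, lam=1.0):
    """Closed-form ridge regression on labels in {-1, +1}."""
    d = z.shape[1]
    return np.linalg.solve(z.T @ z + lam * np.eye(d), z.T @ y)

def predict_ridge(w, z):
    return np.where(z @ w >= 0, 1, -1)

def fit_centroids(z, y):
    """One mean vector per class label."""
    return {label: z[y == label].mean(axis=0) for label in np.unique(y)}

def predict_centroids(centroids, z):
    labels = np.array(list(centroids))
    dists = np.stack(
        [np.linalg.norm(z - centroids[label], axis=1) for label in labels],
        axis=1,
    )
    return labels[dists.argmin(axis=1)]

# Synthetic stand-in for embeddings (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=400)
clean = y[:, None] * np.array([2.0, -1.0, 0.5]) + rng.normal(size=(400, 3))

# Stand-in for the single round of client-side privatization. The noise
# scale here is arbitrary and NOT calibrated to any (epsilon, delta).
private_z = clean + rng.normal(0.0, 1.0, size=clean.shape)

# The server may fit any model on the same privatized embeddings.
w = fit_ridge(private_z, y.astype(float))
ridge_pred = predict_ridge(w, private_z)
centroid_pred = predict_centroids(fit_centroids(private_z, y), private_z)

ridge_acc = float((ridge_pred == y).mean())
centroid_acc = float((centroid_pred == y).mean())
```

Both server-side models consume the same private_z and never see the clean embeddings, so swapping one model for another requires no further communication with the client.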

Method

Our method transforms private data into embeddings through a carefully designed two-step process:

PowerLearn Algorithm
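As a generic illustration of how activations can be released with an (ϵ, δ) guarantee, the following minimal sketch clips each activation vector and adds noise using the classical Gaussian mechanism. This is a standard textbook technique swapped in for illustration. It is not the learned privacy encoder or the calibration procedure that PowerLearn proposes, and the function names and parameters are hypothetical.

```python
import math
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism.

    The bound sigma = S * sqrt(2 ln(1.25 / delta)) / epsilon is only
    valid for 0 < epsilon < 1.
    """
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

def clip_rows(x, max_norm):
    """Scale each row so that its L2 norm is at most max_norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return x * scale

def privatize(activations, max_norm, epsilon, delta, rng):
    """Clip per-record activations and add Gaussian noise.

    Assumption: each row is the activation of one record, so two
    different records yield clipped vectors at most 2 * max_norm apart
    in L2 distance. That distance is used as the sensitivity.
    """
    clipped = clip_rows(activations, max_norm)
    sigma = gaussian_sigma(2 * max_norm, epsilon, delta)
    noisy = clipped + rng.normal(0.0, sigma, size=clipped.shape)
    return noisy, sigma

rng = np.random.default_rng(0)
activations = rng.normal(size=(5, 8))  # hypothetical client activations
noisy, sigma = privatize(activations, max_norm=1.0, epsilon=0.5,
                         delta=1e-5, rng=rng)
```

Under these assumptions the noise scale grows as ϵ shrinks, which is the utility cost that a learned encoder trained with a privacy regularizer aims to reduce.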

Main Results