Graduate Student Supervision
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
The gold standard privacy notion, differential privacy (DP), has gained widespread adoption in academic research, industry products, and government databases due to its mathematically provable privacy guarantee. However, the composability property of DP leads to privacy degradation with multiple accesses to the same data. Differentially private data generation has emerged as a solution, creating synthetic datasets resembling private data while allowing repeated access without additional privacy loss. Existing methods often assume specific use cases for synthetic data, limiting flexibility. This thesis addresses the challenge of producing flexible synthetic data by leveraging deep generative modeling and addressing the privacy loss incurred by other methods such as generative adversarial networks (GANs). We propose utilizing public data to learn perceptual features (PFs) for comparing real and synthetic data distributions, employing a non-adversarial generator training scheme based on Maximum Mean Discrepancy (MMD) to mitigate privacy loss. Experimental results demonstrate the efficacy of our method: it successfully generates samples for CIFAR-10, CelebA, MNIST, and FashionMNIST. Theoretical analysis of our privacy-preserving loss function clarifies the privacy-accuracy trade-offs.
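As a rough illustration of the idea in the abstract above, the sketch below compares real and synthetic data through the mean of a fixed feature map, with optional Gaussian noise added to the private-data embedding. It is a minimal sketch under stated assumptions, not the thesis's implementation: the features are taken to be already extracted by some public-data feature extractor, the row normalization and the `noise_std` parameter are hypothetical choices made here, and calibrating the noise to a specific privacy budget is left out.

```python
import numpy as np

def normalize_rows(features):
    """Scale each feature vector to unit L2 norm so that a single
    record has bounded influence on the mean embedding (assumption)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, 1e-12)

def mean_embedding(features, noise_std=0.0, rng=None):
    """Mean of the normalized features. If noise_std > 0, Gaussian noise
    is added once, so the noisy embedding can be reused across training
    steps. noise_std is a hypothetical parameter; how to set it for a
    given privacy budget is not covered by this sketch."""
    emb = normalize_rows(features).mean(axis=0)
    if noise_std > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        emb = emb + rng.normal(0.0, noise_std, size=emb.shape)
    return emb

def mmd2_from_embeddings(emb_real, emb_synth):
    """Squared MMD for a finite-dimensional feature map: the squared
    Euclidean distance between the two mean embeddings."""
    diff = emb_real - emb_synth
    return float(diff @ diff)
```

In a training loop of this kind, the embedding of the private data would be computed once, and only the synthetic-data embedding would be recomputed as the generator changes, which is why no discriminator needs to touch the private data repeatedly.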
This work introduces two novel kernel-based measures to enforce certain invariance properties in the learned representation space of a deep neural network. The first method, MMD-B-Fair, learns fair representations of data via kernel two-sample testing. It finds neural features of data where a maximum mean discrepancy (MMD) test cannot distinguish between representations of different sensitive groups, while preserving information about the target variable to be predicted. To minimize the power of an MMD test, this method exploits the simple asymptotics of a block testing scheme to address challenges presented by the complex dependency of the test threshold on the estimated MMD. Compared to existing methods for fair representation learning, MMD-B-Fair does not require generative modeling or discriminative architectural tuning, and is able to achieve competitive results on fairness benchmarks and downstream transfer. The second method, CIRCE, introduces a measure of conditional independence for multivariate continuous-valued variables that can be efficiently used as a regularizer to learn deep neural features that are conditionally independent of a known distractor Z given a target label Y. CIRCE requires just a single ridge regression from Y to kernelized features of Z, which can be done in advance. It is then only necessary to enforce independence of the learned neural features from the residuals of this regression. By contrast, earlier measures of conditional dependence require multiple regressions for each step of feature learning, resulting in severe bias and variance, and greater computational cost. CIRCE has superior performance to previous methods on challenging benchmarks, including learning conditionally invariant image features. Python implementations of both methods are made publicly available at github.com/namratadeka/mmd-b-fair and github.com/namratadeka/circe.
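The block testing scheme mentioned for MMD-B-Fair can be sketched generically as follows. This is a plain block MMD statistic, not the code from the repositories above: the Gaussian kernel, the `bandwidth` and `block_size` parameters, and all function names are assumptions made for illustration. The samples are split into blocks, an unbiased MMD estimate is computed per block, and the block average is standardized, which gives a statistic that is approximately normal when there are many blocks.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of squared MMD (diagonal terms excluded)."""
    m, n = x.shape[0], y.shape[0]
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

def block_mmd_statistic(x, y, block_size=10, bandwidth=1.0):
    """Average per-block MMD estimates and standardize the average.
    Returns (mean block MMD, z-score of the mean)."""
    n_blocks = min(len(x), len(y)) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks to estimate the variance")
    values = np.array([
        mmd2_unbiased(x[i * block_size:(i + 1) * block_size],
                      y[i * block_size:(i + 1) * block_size],
                      bandwidth)
        for i in range(n_blocks)
    ])
    mean = values.mean()
    std_err = values.std(ddof=1) / np.sqrt(n_blocks)
    return mean, mean / std_err
```

Because the standardized statistic is compared against a fixed normal quantile, the test threshold no longer depends on the estimated MMD in a complicated way, which is the property the abstract describes exploiting. A representation learner aiming for fairness would push this statistic down for features of different sensitive groups.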