The data used to train deep neural network (DNN) models in applications such
as healthcare and finance typically contain sensitive information. DNN models
can overfit their training data, and overfitted models have been shown to be
susceptible to query-based attacks such as membership inference attacks (MIAs).
MIAs aim to determine whether a sample belongs to the dataset used to train a
classifier (a member) or not (a nonmember). Recently, a new class of label-based
MIAs (LAB MIAs) was proposed, in which the adversary only needs to observe the
labels the model predicts for queried samples. Developing a defense against an
adversary carrying out a LAB MIA on DNN models that cannot be retrained remains
an open problem.
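The threat model can be made concrete with a minimal sketch of one label-only attack strategy. It assumes a robustness-based heuristic. The adversary queries noisy copies of a sample and treats a high rate of label agreement as evidence of membership, because training points tend to lie farther from the decision boundary. The function names, the uniform-ball noise, the threshold of 0.9, and the toy linear classifier are all hypothetical illustrations. This is not a specific published attack, and it is not one of the attacks evaluated in this work.

```python
import numpy as np


def sample_ball(rng, n, dim, radius):
    """Draw n points uniformly from a dim-dimensional ball of the given radius."""
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Scaling by u**(1/dim) spreads points uniformly over the volume, not the surface.
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / dim)
    return directions * radii


def label_only_mia(predict, x, y_true, radius, n_queries=100, threshold=0.9, seed=0):
    """Hypothetical label-only MIA: guess membership from predicted labels alone.

    `predict` maps an (n, dim) array to n integer labels; no scores are used.
    Returns (guess_is_member, agreement_rate).
    """
    rng = np.random.default_rng(seed)
    noisy = x + sample_ball(rng, n_queries, x.shape[0], radius)
    agreement = float(np.mean(predict(noisy) == y_true))
    return agreement >= threshold, agreement


# Toy stand-in for a DNN (hypothetical): a linear classifier in 2-D.
w = np.array([1.0, 0.0])


def predict(X):
    return (np.atleast_2d(X) @ w > 0).astype(int)


far = label_only_mia(predict, np.array([3.0, 0.0]), 1, radius=1.0)
near = label_only_mia(predict, np.array([0.1, 0.0]), 1, radius=1.0)
print(far, near)
```

The point far from the boundary keeps its label under all 100 noisy queries, so the attack guesses that it is a member. The point near the boundary is not guessed to be a member. This distance-to-boundary signal is what a label-invariance defense aims to hide.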
We present LDL, a lightweight defense against LAB MIAs. LDL works by
constructing a high-dimensional sphere around each queried sample such that the
model's decision is unchanged for (noisy) variants of the sample within the
sphere. This sphere of label invariance creates ambiguity and prevents a
querying adversary from correctly determining whether a sample is a member or a
nonmember. We analytically characterize the success rate of an adversary
carrying out a LAB MIA when LDL is deployed, and show that this
characterization is consistent with experimental observations. We evaluate LDL
on seven datasets (CIFAR-10, CIFAR-100, GTSRB, Face, Purchase, Location, and
Texas) with varying training-set sizes. All of these datasets have been used to
evaluate state-of-the-art (SOTA) LAB MIAs.
Our experiments demonstrate that LDL reduces the success rate of an adversary
carrying out a LAB MIA on every dataset. We empirically compare LDL with
defenses against LAB MIAs that require retraining the DNN models, and show that
LDL performs favorably even though it does not retrain the DNNs.
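The label-invariance sphere could be approximated in several ways. The sketch below assumes one simple option: a majority vote over noisy variants drawn uniformly from a ball of radius `radius` around each query, in the spirit of randomized smoothing. This is an illustrative stand-in and not necessarily LDL's exact construction. The names `defended_predict`, `sample_ball`, and `base_predict`, the vote count, and the toy model with a small overfitted "pocket" are all hypothetical. The wrapper only queries the model and never retrains it.

```python
import numpy as np


def sample_ball(rng, n, dim, radius):
    """Draw n points uniformly from a dim-dimensional ball of the given radius."""
    directions = rng.normal(size=(n, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / dim)
    return directions * radii


def defended_predict(predict, radius, n_votes=101, seed=0):
    """Wrap a label-only model so each query is answered by a majority vote
    over noisy variants drawn from a ball around the query (assumed mechanism).

    The wrapped model is only queried, never retrained.
    """
    rng = np.random.default_rng(seed)

    def answer(X):
        labels = []
        for x in np.atleast_2d(X):
            votes = predict(x + sample_ball(rng, n_votes, x.shape[0], radius))
            labels.append(np.bincount(votes).argmax())  # most frequent label wins
        return np.array(labels)

    return answer


# Toy overfitted model (hypothetical): a linear rule plus a tiny memorized
# pocket of class 1 around one training point at (-2, 0).
w = np.array([1.0, 0.0])
pocket = np.array([-2.0, 0.0])


def base_predict(X):
    X = np.atleast_2d(X)
    in_pocket = np.linalg.norm(X - pocket, axis=1) < 0.2
    return ((X @ w > 0) | in_pocket).astype(int)


defended = defended_predict(base_predict, radius=1.0)
print(base_predict(pocket), defended(pocket), defended(np.array([3.0, 0.0])))
# prints: [1] [0] [1]
```

Each answer depends on the whole ball rather than on the exact query point. As a result, the isolated label flip around the memorized point disappears, while labels far from any decision boundary are unchanged. The example only illustrates the mechanics of such a wrapper. It does not reproduce the analytical characterization or the experimental results described above.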