Performing randomized response (RR) over multi-dimensional data is subject to
the curse of dimensionality. As the number of attributes increases, the
exponential growth in the number of attribute-value combinations greatly
impacts the computational cost and the accuracy of the RR estimates. In this
paper, we propose a new multi-dimensional RR scheme that randomizes all
attributes independently and then aggregates the per-attribute randomization
matrices into a single matrix, from which the multi-dimensional joint
probability distributions are estimated. The inverse of the aggregated
randomization matrix can be computed efficiently, at a computational cost
that is linear in the dimensionality and with manageable storage
requirements.
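
The idea can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: each attribute uses a standard k-ary randomization matrix (keep the true value with probability p, otherwise report another value uniformly), and the aggregated matrix has Kronecker-product structure, so it can be inverted one attribute at a time. The function names and the toy distribution are hypothetical, and the exact expected distribution of the reports stands in for empirical frequencies.

```python
from itertools import product


def rr_matrix(k, p):
    """k x k randomization matrix: keep the true value with probability p,
    otherwise report one of the other k - 1 values uniformly."""
    q = (1.0 - p) / (k - 1)
    return [[p if i == j else q for j in range(k)] for i in range(k)]


def rr_matrix_inverse(k, p):
    """Closed-form inverse of rr_matrix(k, p); requires p != 1/k."""
    q = (1.0 - p) / (k - 1)
    return [[((1.0 - q) if i == j else -q) / (p - q) for j in range(k)]
            for i in range(k)]


def apply_per_attribute(joint, matrices):
    """Multiply a joint distribution by one matrix per attribute, axis by axis.

    joint maps value tuples to probabilities; matrices[a] acts on attribute a.
    Only the small per-attribute matrices are ever stored (assumed structure).
    """
    sizes = [len(m) for m in matrices]
    for axis, m in enumerate(matrices):
        updated = {}
        for cell in product(*(range(k) for k in sizes)):
            total = 0.0
            for v in range(sizes[axis]):
                source = cell[:axis] + (v,) + cell[axis + 1:]
                total += m[cell[axis]][v] * joint.get(source, 0.0)
            updated[cell] = total
        joint = updated
    return joint


# Toy example: two attributes with domain sizes 2 and 3.
sizes, keep = [2, 3], [0.7, 0.6]
true_joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
              (1, 0): 0.30, (1, 1): 0.05, (1, 2): 0.25}

forward = [rr_matrix(k, p) for k, p in zip(sizes, keep)]
inverse = [rr_matrix_inverse(k, p) for k, p in zip(sizes, keep)]

observed = apply_per_attribute(true_joint, forward)  # distribution of reports
estimate = apply_per_attribute(observed, inverse)    # recovered joint
```

In this sketch the full aggregated matrix is never materialized: storage is one small matrix per attribute. With finite samples, `observed` would be replaced by empirical report frequencies, and the estimate would then be noisy and could contain negative entries.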

To overcome the limited accuracy, we propose two extensions to the baseline
protocol, called the {\em hybrid} and {\em truncated} schemes. Finally, we
conducted experiments using synthetic and major open-source datasets for
various numbers of attributes, domain sizes, and numbers of respondents. On
the UCI Adult dataset, the average distances between the estimated and the
real (2-way through 6-way) joint probabilities are $0.0099$ for the {\em
truncated} scheme and $0.0155$ for the {\em hybrid} scheme, whereas they are
$0.03$ and $0.04$ for LoPub, the state-of-the-art multi-dimensional LDP scheme.
