TY - JOUR
T1 - A new approach to data differential privacy based on regression models under heteroscedasticity with applications to machine learning repository data
AU - Manchini, Carlos
AU - Ospina, Raydonal
AU - Leiva, Víctor
AU - Martin-Barreiro, Carlos
N1 - Publisher Copyright: © 2022 Elsevier Inc.
PY - 2023/5
Y1 - 2023/5
N2 - The generation of massive data in the digital age can lead to violations of individual privacy. The search for personal data has become an increasingly frequent source of exposure. The present work belongs to the area of differential privacy, which guarantees data confidentiality and robustness against invasive identification attacks. This area stands out in the literature for its rigorous mathematical basis, which allows the loss of privacy to be quantified. A differentially private method based on regression models has been developed to prevent inversion attacks while retaining model efficacy. In this paper, we propose a novel approach to improving data privacy based on regression models under heteroscedasticity, an aspect that is common in practical situations but has not been studied in differential privacy. The influence of the privacy restriction on the statistical performance of the estimators of the model parameters is evaluated using Monte Carlo simulations, including a study of test rejection rates for the proposed approach. The results of the numerical evaluation show high inferential distortion for stricter privacy restrictions. Empirical illustrations with real-world data are presented to show potential applications.
KW - Anonymity
KW - Confidentiality
KW - Data breach and fitting
KW - Linear and logistic regressions
KW - Monte Carlo simulation
KW - Perturbations of data
KW - Statistical consistency and modeling
UR - http://www.scopus.com/inward/record.url?scp=85142765536&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2022.10.076
DO - 10.1016/j.ins.2022.10.076
M3 - Article
AN - SCOPUS:85142765536
SN - 0020-0255
VL - 627
SP - 280
EP - 300
JO - Information Sciences
JF - Information Sciences
ER -