Abstract: The release and analysis of multidimensional data can produce great value. However, privacy disclosure often occurs in the data collection phase. Traditional centralized differential privacy requires a fully trusted third-party data collector, which is difficult to find in practice. As the number of attribute dimensions grows, fine-grained analysis on the data collector side (the calculation of the joint distribution) has also become an urgent problem. To address these problems, this study proposes a local differential privacy (LDP) protection algorithm (RR-LDP) for multi-valued data. Unary coding and the instantaneous randomized response technique are introduced to protect personal privacy in the data collection phase while reducing communication overhead. By combining the expectation maximization (EM) algorithm with the LASSO regression model, the study puts forward an efficient joint distribution estimation algorithm (LREMH) for multidimensional data that satisfies LDP. The algorithm uses the LASSO regression model to estimate the initial value and then employs the EM algorithm for iterative calculation. Theoretical analysis and experimental results show that the LREMH algorithm achieves a balance between accuracy and efficiency.
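The combination of unary coding and randomized response named in the abstract can be illustrated with a minimal sketch. This is not the RR-LDP algorithm itself but a generic symmetric unary encoding scheme, shown only to make the idea concrete. The function names (`unary_encode`, `keep_probability`, `perturb`, `estimate_frequencies`) and the choice of keep probability p = e^(ε/2) / (e^(ε/2) + 1) are assumptions for illustration and are not taken from the study.

```python
import math
import random


def unary_encode(value, domain_size):
    """One-hot encode a categorical value in range(domain_size)."""
    bits = [0] * domain_size
    bits[value] = 1
    return bits


def keep_probability(epsilon):
    """Probability of reporting a bit truthfully (assumed symmetric scheme)."""
    return math.exp(epsilon / 2) / (math.exp(epsilon / 2) + 1)


def perturb(bits, epsilon, rng):
    """Client side: flip each bit independently with probability 1 - p."""
    p = keep_probability(epsilon)
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_frequencies(reports, epsilon):
    """Collector side: unbiased frequency estimate from perturbed reports."""
    n = len(reports)
    p = keep_probability(epsilon)
    q = 1 - p  # probability that a 0 bit is reported as 1
    counts = [sum(column) for column in zip(*reports)]
    # E[count / n] = q + f * (p - q), so invert that relation for f.
    return [(c / n - q) / (p - q) for c in counts]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: one attribute with three values.
    true_values = [rng.choice([0, 0, 0, 1, 1, 2]) for _ in range(20000)]
    reports = [perturb(unary_encode(v, 3), 1.0, rng) for v in true_values]
    print(estimate_frequencies(reports, 1.0))
```

Two one-hot vectors differ in exactly two positions, and each position contributes a likelihood ratio of at most p / (1 - p) = e^(ε/2), so this sketch satisfies ε-LDP for a single attribute. Estimating the joint distribution across several attributes, which the abstract assigns to the LREMH algorithm, is not covered here.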