Abstract: Missing attribute data is a common problem in soil analysis and research. To improve the reliability of research results, it is necessary to study imputation methods for missing soil attribute data. In this study, several imputation methods are evaluated for filling in missing soil attribute data from a data mining perspective. Soil pH is used as the imputation target, and the Soil Nutrient Database of China's Major Ecosystems is used as the source of physical and chemical soil attribute data. We evaluate the performance of each method on datasets with different missing rates in terms of model fit and imputation error. The results show that imputing missing soil pH data with K-Nearest Neighbor (KNN) using optimal parameters and with random forest is feasible, and that both perform better than the other methods tested: multivariable regression, support vector machine, and neural network. Averaged over the datasets with different missing rates, the imputed pH values have an MAE of 0.132 (KNN) and 0.131 (random forest), an RMSE of 0.174 and 0.178, and an R² of 0.775 and 0.765, respectively.
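The evaluation procedure summarized above (hide known values at a given missing rate, impute them, then score the imputed values with MAE, RMSE and R²) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than taken from the Soil Nutrient Database, the function names (`knn_impute`, `evaluate`) are hypothetical, the missing rates of 0.1 to 0.3 and `k=5` are arbitrary choices, and the plain unweighted KNN shown here is not necessarily the configuration used in the study.

```python
import numpy as np

def knn_impute(X_obs, y_obs, X_mis, k=5):
    """Impute each missing target as the mean target of its k nearest observed samples."""
    # Pairwise Euclidean distances, shape (n_missing, n_observed).
    dist = np.linalg.norm(X_mis[:, None, :] - X_obs[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_obs[nearest].mean(axis=1)

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def evaluate(X, y, missing_rate, k=5, seed=0):
    """Hide a random fraction of y, impute it with KNN, and score against the hidden truth."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_missing = max(1, int(round(missing_rate * n)))
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=n_missing, replace=False)] = True
    pred = knn_impute(X[~mask], y[~mask], X[mask], k=k)
    truth = y[mask]
    return {"MAE": mae(truth, pred), "RMSE": rmse(truth, pred), "R2": r2(truth, pred)}

# Synthetic stand-in data (assumption): three standardized covariates and a pH-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = 6.5 + 0.4 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Score the imputation at several illustrative missing rates.
results = {rate: evaluate(X, y, rate) for rate in (0.1, 0.2, 0.3)}
for rate, scores in results.items():
    print(rate, scores)
```

In this sketch the covariates are already on a common scale; with real soil attributes, standardizing them before computing distances would probably be needed so that no single attribute dominates the neighbor search.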