Abstract: Prediction based on historical data has become essential in many fields, such as environmental management and urban transportation, and prediction accuracy plays a key role in practical production, scheduling, and other tasks. However, natural or human factors make some data highly volatile and uncertain, which prevents prediction models from reaching their full potential. Using sediment concentration prediction during the non-ice period as a case study, this work explores methods for improving predictions on high-volatility data. The results show that feature selection based on Shapley additive explanations (SHAP), data smoothing, and early-stage clustering can each reduce the prediction error on high-volatility data. The mean absolute error (MAE) decreases from 1.502 for the initial model to 0.194, and data smoothing yields the largest improvement, reducing the MAE by 76.51%. However, increasing the smoothing order degrades the predictions, because the order of the subsequent exponentiation rises correspondingly and the error grows exponentially with it. Additionally, using clustering results as input features can “guide” the parameter learning of the multi-layer perceptron.