Abstract: In this study, we propose a forecasting model that combines K-means clustering with a machine learning regression algorithm to forecast the sales of multiple commodities in the retail industry. First, we use clustering to identify commodities with similar sales patterns and divide the whole dataset into groups accordingly. Subsequently, three machine learning regression algorithms, namely support vector regression, random forest, and XGBoost, are trained on each sub-dataset. Constructing a data pool increases both the amount of data available for model training and the scope of the forecasting variables. The proposed models are verified on a real sales dataset from a retail company. The experimental results show that the forecasting model based on K-means and support vector regression performs best, and that the proposed models forecast significantly better than both the benchmark models and the machine learning models that do not use clustering.
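To make the cluster-then-regress pipeline concrete, the following is a minimal sketch and not the implementation used in the study. It makes several assumptions: the sales figures are invented toy data, the helper names (`normalize`, `kmeans`, `fit_pooled_lag1`, `fit_cluster_models`, `forecast_next`) are hypothetical, the K-means initialisation is a deterministic farthest-point rule chosen only for reproducibility, and an ordinary least-squares model on a single lagged value stands in for support vector regression, random forest, and XGBoost. The data pool is approximated by pooling the lagged samples of all commodities in a cluster.

```python
import math


def normalize(series):
    """Z-score a series so clustering compares sales pattern rather than volume."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series)) or 1.0
    return [(x - mean) / std for x in series]


def sqdist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns one cluster label per point (needs len(points) >= k)."""
    # Assumption: deterministic farthest-point initialisation, for reproducibility.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def fit_pooled_lag1(series_list):
    """Least-squares fit of y_t = a + b * y_{t-1} on samples pooled across series.

    Stand-in for the SVR / random forest / XGBoost regressors named in the text.
    """
    xs = [s[t - 1] for s in series_list for t in range(1, len(s))]
    ys = [s[t] for s in series_list for t in range(1, len(s))]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_cluster_models(sales, k):
    """Cluster commodities by normalized sales pattern, then fit one model per cluster."""
    names = list(sales)
    labels = kmeans([normalize(sales[n]) for n in names], k)
    assignment = dict(zip(names, labels))
    models = {}
    for c in set(labels):
        pool = [sales[n] for n in names if assignment[n] == c]
        models[c] = fit_pooled_lag1(pool)
    return assignment, models


def forecast_next(name, sales, assignment, models):
    """One-step-ahead forecast for a commodity using its cluster's model."""
    a, b = models[assignment[name]]
    return a + b * sales[name][-1]


# Invented toy data: A and B trend upward, C and D trend downward.
sales = {
    "A": [10, 12, 14, 16, 18, 20],
    "B": [5, 6, 7, 8, 9, 10],
    "C": [20, 18, 16, 14, 12, 10],
    "D": [30, 27, 24, 21, 18, 15],
}
assignment, models = fit_cluster_models(sales, k=2)
print(assignment)
print(forecast_next("A", sales, assignment, models))
```

In this sketch the series are normalized only for clustering, while the per-cluster regression is fitted on raw sales values. A fuller version would likely use richer features than one lag and would select the number of clusters from the data.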