Abstract: Financial institutions are currently grappling with the growth of non-performing assets (NPAs). The accuracy with which overdue credit is predicted directly determines the size of NPAs. Data modeling methods are often introduced to better predict repayment ability, but they may overfit for new businesses with small data samples. This study presents case studies in which small data samples are enriched with data from similar financial businesses and then modeled with random forest, LightGBM, XGBoost, a deep neural network (DNN), and TrAdaBoost transfer learning. It aims to provide an effective solution to the problem of insufficient samples when building models for small-sample businesses. The results show that the area under the curve (AUC) of all five machine learning models exceeds 80% on the small data samples once data from similar financial businesses are integrated. On the prediction set, the AUC of TrAdaBoost is at least 2 percentage points higher than that of the LightGBM, XGBoost, DNN, and random forest models. In addition, TrAdaBoost achieves the highest precision (88%) and recall (73%).
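To illustrate how TrAdaBoost can draw on data from a similar business, the following is a minimal sketch of a single instance-weight update. It assumes the commonly cited formulation of TrAdaBoost (Dai et al., 2007), since the abstract does not state which variant is used. The function name `tradaboost_update` and its parameters are hypothetical, and rejecting a weighted target error outside (0, 0.5) is a simplification made here for brevity.

```python
import math


def tradaboost_update(weights, errors, n_source, n_rounds):
    """Return updated instance weights after one TrAdaBoost round (sketch).

    weights  : current weights, source-domain instances listed first
    errors   : |h(x_i) - y_i| in [0, 1] for every instance
    n_source : number of source-domain (similar business) instances
    n_rounds : total number of boosting rounds planned
    """
    target_w = weights[n_source:]
    target_e = errors[n_source:]
    # The weighted error is measured on the target (small-sample) domain only.
    eps = sum(w * e for w, e in zip(target_w, target_e)) / sum(target_w)
    if not 0.0 < eps < 0.5:
        # Assumption of this sketch: reject the round instead of adjusting eps.
        raise ValueError("weighted target error must lie in (0, 0.5)")
    beta_t = eps / (1.0 - eps)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    updated = []
    for i, (w, e) in enumerate(zip(weights, errors)):
        if i < n_source:
            # Misclassified source instances lose weight (beta < 1).
            updated.append(w * beta ** e)
        else:
            # Misclassified target instances gain weight (beta_t < 1).
            updated.append(w * beta_t ** (-e))
    return updated


# Hypothetical toy input: 2 source instances followed by 3 target instances.
example = tradaboost_update([0.2] * 5, [1, 0, 1, 0, 0], n_source=2, n_rounds=10)
```

In this scheme, source instances that disagree with the current learner are gradually down-weighted, while misclassified target instances are emphasized, which is how dissimilar auxiliary data is kept from dominating the small-sample model.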