Computer Systems Applications, 2022, 31(10): 382-388
Comparison Analysis of Different Machine Learning Models in Credit Overdue Prediction
(School of Statistics, Renmin University of China, Beijing 100872, China)
Received: January 3, 2022    Revised: January 29, 2022
Abstract: Financial institutions are currently grappling with the growth of non-performing assets (NPAs), and in the credit sector the accuracy of loan overdue prediction directly determines the scale of NPAs. To better predict borrowers' repayment ability, data-driven models are commonly introduced; for new businesses with few samples, however, training on that data alone easily leads to overfitting. Through a real-world case study, this paper supplements the data of a small-sample business with data from similar businesses and applies random forest, LightGBM, XGBoost, DNN, and the TrAdaBoost transfer learning method, aiming to provide an effective solution to the shortage of samples when building models for small-sample businesses. The results show that, for a product with little data, all five machine learning models achieve an AUC (area under curve) above 80% once data from similar financial businesses are incorporated. On the prediction set, the AUC of the transfer learning model is at least 2 percentage points higher than that of LightGBM, XGBoost, DNN, and random forest. The transfer learning model also achieves the highest precision (88%) and recall (73%).
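The three metrics reported in the abstract (AUC, precision, and recall) can be illustrated with a minimal pure-Python sketch. The labels and scores below are made-up toy values rather than data from the paper, the function names `auc_score` and `precision_recall` are hypothetical, and the 0.5 decision threshold is an assumption; the abstract does not state which threshold the author used.

```python
from typing import List, Tuple


def auc_score(y_true: List[int], y_score: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive (overdue)
    sample is scored higher than a randomly chosen negative one.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_recall(
    y_true: List[int], y_score: List[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall after turning scores into 0/1 predictions.
    The threshold is an illustrative assumption."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: 1 = overdue, 0 = repaid on time (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

auc = auc_score(y_true, y_score)
precision, recall = precision_recall(y_true, y_score)
print(f"AUC = {auc:.4f}, precision = {precision:.2f}, recall = {recall:.2f}")
```

AUC is threshold-free and measures ranking quality, while precision and recall depend on the chosen cutoff, which is why a model comparison such as the one summarized above typically reports all three.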
Citation:
CHEN Xia. Comparison Analysis of Different Machine Learning Models in Credit Overdue Prediction. COMPUTER SYSTEMS APPLICATIONS, 2022, 31(10): 382-388