Comparison Analysis of Different Machine Learning Models in Credit Overdue Prediction

doi:10.15888/j.cnki.csa.008724

Computer Systems & Applications (计算机系统应用), 2022, Vol. 31, No. 10, pp. 382-388

Author: CHEN Xia
School of Statistics, Renmin University of China, Beijing 100872, China

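Abstract:

Financial institutions are currently struggling to contain the growth of non-performing assets, and in the credit sector the accuracy of loan delinquency prediction directly determines the scale of those assets. Data-driven models are commonly introduced to better predict borrowers' repayment ability, but for new businesses with few data samples, training on that data alone easily leads to overfitting. Using a real-world case study, this paper supplements small-sample business data with data from similar businesses and applies random forest, LightGBM, XGBoost, DNN, and TrAdaBoost transfer learning, aiming to offer an effective solution to the shortage of samples when building models for small-sample businesses. The results show that, for products with little data, all five machine learning methods achieve an AUC (area under the curve) above 0.80 once similar financial business data are added. The transfer learning model's AUC on the prediction set is at least 2 points (0.02) higher than those of LightGBM, XGBoost, DNN, and random forest, and it also achieves the highest precision (88%) and recall (73%).

Keywords: small sample; credit business; delinquency risk; machine learning models; risk prediction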
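
The abstract credits TrAdaBoost with the best results once similar-business data are pooled with the small-sample data. The paper's own implementation is not shown on this page, so the following is only a minimal NumPy sketch of the standard TrAdaBoost reweighting rule (Dai et al., ICML 2007). The decision-stump base learner, the 20 boosting rounds, the clipping of the target error at 0.499, and the synthetic Gaussian data standing in for the undisclosed credit data are all assumptions for illustration. The sketch also computes AUC, precision, and recall, the three metrics the abstract reports.

```python
import numpy as np


class Stump:
    """Assumed base learner: weighted decision stump (one feature, threshold, polarity)."""

    def fit(self, X, y, w):
        best = (np.inf, 0, 0.0, 1)  # (weighted error, feature, threshold, polarity)
        total = w.sum()
        for j in range(X.shape[1]):
            thresholds = np.unique(X[:, j])
            # pred[i, k] = 1 if sample i lies above threshold k
            pred = (X[:, j][:, None] > thresholds[None, :]).astype(int)
            err_pos = w @ (pred != y[:, None]).astype(float)
            err_neg = total - err_pos  # flipped polarity predicts the complement
            for polarity, errs in ((1, err_pos), (-1, err_neg)):
                k = int(np.argmin(errs))
                if errs[k] < best[0]:
                    best = (errs[k], j, thresholds[k], polarity)
        _, self.feature, self.threshold, self.polarity = best
        return self

    def predict(self, X):
        above = X[:, self.feature] > self.threshold
        return (above if self.polarity == 1 else ~above).astype(int)


def tradaboost(Xs, ys, Xt, yt, n_rounds=20):
    """Source (Xs, ys): similar-business data. Target (Xt, yt): small-sample business."""
    n = len(Xs)
    X = np.vstack([Xs, Xt])
    y = np.concatenate([ys, yt]).astype(int)
    w = np.ones(len(y))
    # Fixed down-weighting factor for misclassified source samples
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        w = w / w.sum()  # rescaling does not change any ratio used below
        h = Stump().fit(X, y, w)
        miss = (h.predict(X) != y).astype(float)
        # Error is measured on the target-domain samples only
        eps = np.sum(w[n:] * miss[n:]) / np.sum(w[n:])
        eps = min(max(eps, 1e-10), 0.499)  # assumption: clip so that 0 < beta_t < 1
        beta_t = eps / (1.0 - eps)
        w[:n] *= beta_src ** miss[:n]  # shrink misclassified source samples
        w[n:] *= beta_t ** (-miss[n:])  # boost misclassified target samples
        learners.append(h)
        betas.append(beta_t)
    return learners, betas


def tradaboost_score(learners, betas, X):
    """Weighted vote of the later half of the rounds; predict 1 when score >= 0."""
    start = (len(learners) + 1) // 2 - 1  # rounds ceil(N/2)..N, 0-indexed
    score = np.zeros(len(X))
    for h, b in zip(learners[start:], betas[start:]):
        score += np.log(1.0 / b) * (h.predict(X) - 0.5)
    return score


def auc(y_true, score):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    diff = score[y_true == 1][:, None] - score[y_true == 0][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def precision_recall(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    return tp / max(np.sum(y_pred == 1), 1), tp / max(np.sum(y_true == 1), 1)


# Hypothetical synthetic data: 500 source rows, 40 target training rows, 200 target test rows
rng = np.random.default_rng(0)
Xs = rng.normal(size=(500, 3))
ys = (Xs[:, 0] + 0.3 * Xs[:, 1] > 0).astype(int)
Xt = rng.normal(size=(240, 3))
yt = (Xt[:, 0] + 0.5 * Xt[:, 1] > 0.2).astype(int)
Xt_train, yt_train, Xt_test, yt_test = Xt[:40], yt[:40], Xt[40:], yt[40:]

learners, betas = tradaboost(Xs, ys, Xt_train, yt_train, n_rounds=20)
score = tradaboost_score(learners, betas, Xt_test)
pred = (score >= 0).astype(int)
print("AUC:", auc(yt_test, score), "precision/recall:", precision_recall(yt_test, pred))
```

Each round shrinks the weight of source (similar-business) rows that the current learner misclassifies by a fixed factor, while misclassified target rows are up-weighted by 1/β_t, so later learners lean on the source rows that behave like the new business. Only rounds ⌈N/2⌉ through N vote in the final score, which is why `tradaboost_score` skips the early rounds.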

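Cite this article:

CHEN Xia. Comparison analysis of different machine learning models in credit overdue prediction. Computer Systems & Applications (计算机系统应用), 2022, 31(10): 382-388.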

History:

Received: 2022-01-03
Last revised: 2022-01-29
Published online: 2022-06-28
