Research on Interpretative TreeSHAP Based on CatBoost’s Credit Utilization Prediction Model

doi:10.15888/j.cnki.csa.009003

2023, Vol. 32, No. 3, pp. 338-344

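Authors:
MA Shuo, School of Information Engineering, Ningxia University, Yinchuan 750021, China
LI Zhao, Laboratory of Financial Big Data, Bank of Shizuishan, Yinchuan 750011, China
ZHAO Jun, School of Advanced Interdisciplinary Studies, Ningxia University, Zhongwei 755099, China

Funding: National Natural Science Foundation of China (71461025); Natural Science Foundation of Ningxia (2020A1166)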

Abstract:

After a client's credit application has been approved, accurately predicting whether the client will actually draw on the credit, and identifying the key factors behind that decision, helps a bank improve both its client service and its profitability. Machine learning algorithms have rarely been applied to credit utilization prediction, and model interpretability has received little study in this area. This study therefore proposes an interpretable credit utilization prediction model that combines CatBoost with TreeSHAP. A CatBoost prediction model is built and tuned with three hyperparameter optimization algorithms, whose results are compared with one another, and the tuned model is then compared experimentally with baseline models on four main performance metrics. The results show that the model optimized by the tree-structured Parzen estimator (TPE) algorithm outperforms the other models on all four metrics. Finally, TreeSHAP is used to explain the model at both the global and the local level, and the factors influencing client credit utilization are analyzed to give banks a decision-making basis for precision marketing.
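
The abstract names TPE as the best of the three tuning algorithms but does not describe it. As a rough illustration only, the following is a simplified one-dimensional TPE loop in plain Python: past trials are split into a better group and a worse group, each group is modeled with a Gaussian Parzen density, and the next trial is the candidate with the highest density ratio. The function names, the fixed bandwidth, `gamma=0.25`, and the quadratic stand-in objective are all assumptions made for this sketch; the paper's actual search space, library, and objective are not given in this section.

```python
import math
import random


def kde(x, points, bandwidth):
    """Parzen (Gaussian kernel) density estimate at x; 0.0 if no points."""
    if not points:
        return 0.0
    norm = bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)
    return total / (len(points) * norm)


def tpe_minimize(objective, low, high, n_init=10, n_iter=40,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified 1-D TPE: returns ((best_x, best_loss), history)."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplifying assumption

    # Warm-up: uniform random trials.
    history = []
    for _ in range(n_init):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))

    for _ in range(n_iter):
        # Split trials into the best gamma fraction ("good") and the rest ("bad").
        ranked = sorted(history, key=lambda t: t[1])
        n_good = max(1, int(gamma * len(ranked)))
        good = [x for x, _ in ranked[:n_good]]
        bad = [x for x, _ in ranked[n_good:]]

        # Draw candidates from the "good" density l(x), clipped to the bounds.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]

        # Evaluate the candidate that maximizes the ratio l(x) / g(x).
        nxt = max(
            candidates,
            key=lambda c: kde(c, good, bandwidth) / (kde(c, bad, bandwidth) + 1e-12),
        )
        history.append((nxt, objective(nxt)))

    return min(history, key=lambda t: t[1]), history


# Stand-in objective (hypothetical): a quadratic with its minimum at x = 2.
(best_x, best_loss), history = tpe_minimize(lambda x: (x - 2.0) ** 2, -5.0, 5.0)
print(best_x, best_loss)
```

In a setting like the paper's, the objective would presumably be a validation metric of the CatBoost model as a function of its hyperparameters, and a library implementation of TPE would normally replace a hand-written loop like this one.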

Keywords: credit utilization prediction; interpretability; tree-structured Parzen estimator (TPE); CatBoost; TreeSHAP; machine learning
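
TreeSHAP assigns each feature a Shapley value, so that the attributions for one client sum to the difference between that client's prediction and the average prediction. A minimal sketch of that property follows, assuming a hypothetical hand-written two-feature tree: `credit_limit`, `monthly_income`, the thresholds, and the background rows are invented for illustration and do not come from the paper. The code computes exact Shapley values by brute-force subset enumeration over a small background set, which TreeSHAP replaces with an efficient tree traversal.

```python
from itertools import combinations
from math import factorial


def toy_tree(row):
    """Hypothetical decision tree returning a credit-usage score."""
    if row["credit_limit"] > 50:
        return 0.8 if row["monthly_income"] > 10 else 0.6
    return 0.2


def value(model, x, background, subset):
    """Mean model output when features in `subset` take x's values and
    all other features take the values of each background row in turn."""
    total = 0.0
    for b in background:
        mixed = {k: (x[k] if k in subset else b[k]) for k in x}
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating every feature subset."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        contrib = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = (value(model, x, background, s | {f})
                        - value(model, x, background, s))
                contrib += weight * gain
        phi[f] = contrib
    return phi


# Invented background data and one invented client to explain.
background = [
    {"credit_limit": 20, "monthly_income": 5},
    {"credit_limit": 80, "monthly_income": 5},
    {"credit_limit": 20, "monthly_income": 15},
    {"credit_limit": 80, "monthly_income": 15},
]
x = {"credit_limit": 80, "monthly_income": 15}

phi = shapley_values(toy_tree, x, background)
base = value(toy_tree, x, background, set())  # average prediction
print(phi, base)
```

The values for a single client, as here, correspond to a local explanation; averaging the absolute values of such attributions over many clients is the usual way to obtain a global feature ranking.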

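Cite this article

MA Shuo, LI Zhao, ZHAO Jun. Research on Interpretative TreeSHAP Based on CatBoost’s Credit Utilization Prediction Model. Computer Systems & Applications, 2023, 32(3): 338-344.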

History

Received: 2022-08-17
Revised: 2022-09-15
Published online: 2022-11-29
