Research on Interpretative TreeSHAP Based on CatBoost’s Credit Utilization Prediction Model
Authors: Ma Shuo (马朔), Li Zhao (李钊), Zhao Jun (赵军)
Funding:

National Natural Science Foundation of China (71461025); Natural Science Foundation of Ningxia (2020A1166)

    Abstract:

    After a bank approves a client's credit application, accurately predicting whether the client will draw on the credit line and identifying the key factors behind that decision are important for improving the bank's client service and profitability. Machine learning algorithms have rarely been applied to credit utilization prediction, and model interpretability has received little study in this field. This study therefore proposes an interpretable credit utilization prediction model that combines CatBoost with TreeSHAP. A prediction model is built with CatBoost and tuned with three hyperparameter optimization algorithms, whose results are compared with one another; the tuned models are then compared experimentally with baseline models on four main performance metrics. The results show that the model optimized with the TPE (tree-structured Parzen estimator) algorithm outperforms the other models on all four metrics. Finally, TreeSHAP is used to explain the model at both the global and local levels and to analyze the factors that influence client credit utilization, providing a decision-making basis for banks' precision marketing to clients.
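To make the idea of local and global explanations concrete, the following is a minimal sketch under stated assumptions, not the authors' model and not the TreeSHAP algorithm itself. It computes exact Shapley values by brute-force enumeration for a hypothetical three-feature decision tree, using a small background set to fill in "missing" features; TreeSHAP computes attributions of this kind for tree ensembles in polynomial time. The feature names (`limit`, `rate`, `age`), the thresholds, and the background rows are all invented for illustration.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names; the paper's real features are not given here.
FEATURES = ["limit", "rate", "age"]


def toy_tree(x):
    """A made-up decision tree returning a credit-utilization score."""
    if x["limit"] > 50:
        return 0.8 if x["rate"] < 5 else 0.4
    return 0.3 if x["age"] < 30 else 0.1


# Invented background clients used to average out features that are "absent".
BACKGROUND = [
    {"limit": 80, "rate": 4, "age": 25},
    {"limit": 80, "rate": 7, "age": 45},
    {"limit": 20, "rate": 4, "age": 25},
    {"limit": 20, "rate": 7, "age": 45},
]


def value(x, subset):
    """Mean prediction with features in `subset` fixed to x and the
    remaining features taken from each background row in turn."""
    total = 0.0
    for b in BACKGROUND:
        mixed = {f: (x[f] if f in subset else b[f]) for f in FEATURES}
        total += toy_tree(mixed)
    return total / len(BACKGROUND)


def shapley_values(x):
    """Local explanation: exact Shapley value of each feature for client x."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contrib = 0.0
        for k in range(n):  # k = size of the coalition without f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contrib += weight * (value(x, s | {f}) - value(x, s))
        phi[f] = contrib
    return phi


def global_importance(rows):
    """Global explanation: mean absolute Shapley value per feature."""
    all_phi = [shapley_values(r) for r in rows]
    return {f: sum(abs(p[f]) for p in all_phi) / len(rows) for f in FEATURES}


if __name__ == "__main__":
    client = BACKGROUND[0]
    print("local:", shapley_values(client))
    print("global:", global_importance(BACKGROUND))
```

For any single client, the base value `value(x, set())` plus the sum of the per-feature values equals the model's prediction for that client, which is the additivity property that lets a local explanation be read as a decomposition of one prediction. Averaging absolute values over many clients gives one common form of global ranking.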

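Cite this article

Ma Shuo, Li Zhao, Zhao Jun. Research on Interpretative TreeSHAP Based on CatBoost’s Credit Utilization Prediction Model. Computer Systems & Applications, 2023, 32(3): 338-344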

History
  • Received: 2022-08-17
  • Last revised: 2022-09-15
  • Published online: 2022-11-29