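Prediction of Credit Default Based on Interpretable Integration Learning
Authors: Cai Qingsong, Wu Jindi, Bai Chenyu
Funding: National Natural Science Foundation of China (61702020); Beijing Natural Science Foundation-Haidian Original Innovation Joint Fund (L182007)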
    Abstract:

    Artificial intelligence has accelerated the development of the risk control industry. The core of intelligent risk control is controlling risk, and a credit default prediction model is an essential means of doing so. Traditional solutions rely on manual review and generalized linear models. However, transaction data generated online are high-dimensional and come from multiple sources, which far exceeds what these models can handle and poses a great challenge to traditional risk control. This study therefore proposes an interpretable credit default prediction model based on model fusion. First, LightGBM, DeepFM, and CatBoost are selected as base models and CatBoost as the secondary model, and their fusion improves the accuracy of the predictions. Then LIME, a local, model-agnostic interpretability method, is introduced to explain the predictions of the fused model. Experiments on a real dataset show that the model achieves good accuracy and interpretability on the credit default prediction task.
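The two steps named in the abstract, model fusion and local explanation, can be illustrated with a minimal sketch. It is not the authors' implementation. To stay self-contained it uses only the Python standard library, so small logistic regressions on different feature subsets stand in for LightGBM, DeepFM, and CatBoost, another logistic regression stands in for the CatBoost secondary model, and a hand-written weighted linear surrogate stands in for the LIME library. The synthetic data and every name here (`LogReg`, `fit_stacking`, `explain_locally`, `make_data`) are assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

class LogReg:
    """Tiny logistic regression on selected columns (hypothetical stand-in learner)."""
    def __init__(self, cols, lr=0.5, epochs=200):
        self.cols, self.lr, self.epochs = cols, lr, epochs
        self.w = [0.0] * (len(cols) + 1)  # bias followed by one weight per column

    def _feats(self, row):
        return [1.0] + [row[c] for c in self.cols]

    def fit(self, X, y):
        for _ in range(self.epochs):  # plain batch gradient descent on log loss
            grad = [0.0] * len(self.w)
            for row, target in zip(X, y):
                f = self._feats(row)
                err = sigmoid(sum(a * b for a, b in zip(self.w, f))) - target
                grad = [g + err * fj / len(X) for g, fj in zip(grad, f)]
            self.w = [w - self.lr * g for w, g in zip(self.w, grad)]
        return self

    def predict_proba(self, row):
        return sigmoid(sum(a * b for a, b in zip(self.w, self._feats(row))))

def fit_stacking(X, y, base_cols, k=3):
    """Stacking: out-of-fold base predictions become the secondary model's inputs."""
    oof = [[0.0] * len(base_cols) for _ in X]
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        for m, cols in enumerate(base_cols):
            model = LogReg(cols).fit([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, len(X), k):  # rows held out in this fold
                oof[i][m] = model.predict_proba(X[i])
    bases = [LogReg(cols).fit(X, y) for cols in base_cols]  # refit on all rows
    meta = LogReg(list(range(len(base_cols)))).fit(oof, y)
    return lambda row: meta.predict_proba([b.predict_proba(row) for b in bases])

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def explain_locally(predict, x, rng, n_samples=300, scale=0.5, width=1.0):
    """LIME-style explanation: fit a proximity-weighted linear surrogate around x."""
    d = len(x)
    A = [[0.0] * (d + 1) for _ in range(d + 1)]
    b = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]            # perturbed neighbour
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-dist2 / width ** 2)                       # proximity kernel
        f = [1.0] + [zi - xi for zi, xi in zip(z, x)]
        target = predict(z)                                     # black-box output
        for i in range(d + 1):
            b[i] += w * f[i] * target
            for j in range(d + 1):
                A[i][j] += w * f[i] * f[j]
    for i in range(1, d + 1):
        A[i][i] += 1e-6                                         # small ridge term
    return solve(A, b)[1:]                                      # drop the intercept

def make_data(n, rng):
    """Synthetic loans: feature 0 raises default risk, feature 1 lowers it, feature 2 is noise."""
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1 if 2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) > 0 else 0 for r in X]
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(150, rng)
X_test, y_test = make_data(100, rng)
predict = fit_stacking(X_train, y_train, base_cols=[[0, 1, 2], [0, 1], [0, 2]])
accuracy = sum((predict(r) > 0.5) == bool(t) for r, t in zip(X_test, y_test)) / len(y_test)
coefs = explain_locally(predict, [0.2, 0.1, 0.0], rng)
print(f"test accuracy: {accuracy:.2f}")
print("local feature effects:", [round(c, 3) for c in coefs])
```

The secondary model is trained on out-of-fold predictions so that it never sees a base-model output for a row that base model was fitted on. The surrogate's coefficients describe the fused model only in the neighbourhood of the chosen applicant, which is what "local" means in this setting.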

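Cite this article

Cai Qingsong, Wu Jindi, Bai Chenyu. Prediction of Credit Default Based on Interpretable Integration Learning. Computer Systems & Applications, 2021, 30(12): 194-201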

History
  • Received: 2021-03-02
  • Last revised: 2021-03-29
  • Published online: 2021-12-10