Credit Risk Assessment in P2P Lending Based on Group Decision-Making
Computer Systems & Applications (计算机系统应用), 2019, Vol. 28, Issue (5): 226-231

Group Decision-Making Method for Credit Risk Assessment in P2P Lending
JIANG Xue-Ying, QIN Jin
School of Management, University of Science and Technology of China, Hefei 230026, China
Foundation item: General Program of National Natural Science Foundation of China (71571175)
Abstract: In this study, we propose a combination approach based on a group decision-making method, using random forest, neural network, and GBDT as individual learners, to assess the credit risk of borrowers in P2P lending. To validate the proposed method, two real-world datasets from PPDai.com and renrendai.com are examined. The results show that the proposed method outperforms the individual learners.
Key words: group decision-making     P2P lending     credit risk assessment     machine learning     ensemble

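1 Introduction

P2P lending refers to small-amount lending transactions between individual users, conducted through professional Internet lending platforms. In recent years, the P2P lending industry has grown rapidly in China. According to data from Wangdaizhijia (网贷之家), the domestic P2P lending transaction volume reached 2.8 trillion yuan in 2017, an increase of more than 40% over 2016, with 4.4 million active investors. Effective risk control is needed to keep the industry developing healthily.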

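2 Group Decision-Making-Based Algorithm and Model Construction for P2P Lending Credit Risk Assessment

2.1 Group Decision-Making-Based Ensemble Algorithm for P2P Lending Credit Risk Assessment

$P_i^*$ denotes the revised prediction of ${M_i}$ after it has been influenced by the predictions of the other individual learners. $P_i^*$ is taken as a linear combination of the predictions of all individual learners, namely: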

 $P_i^* = \sum\limits_{j = 1}^N {{w_{ij}}{P_j}}$ (1)

 ${P^*} = WP$ (2)

 $\begin{array}{l} \pi W = \pi \\ \sum\nolimits_{i = 1}^N {{\pi _i} = 1} \end{array}$ (3)

 $R = \sum\limits_{i = 1}^N {{\pi _i}{P_i}}$ (4)

 ${U_{i|i}} = - {P_i}{\log _2}({P_i}) - (1 - {P_i}){\log _2}(1 - {P_i})$ (5)

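When ${P_i}$ approaches 0 or 1, the individual learner ${M_i}$ makes a clear judgment on whether the loan will default, and the uncertainty ${U_{i|i}}$ approaches 0. When ${P_i}$ approaches 0.5, the judgment of ${M_i}$ is close to random, and ${U_{i|i}}$ approaches 1. The local uncertainty ${U_{i|i}}$ therefore reflects how uncertain an individual learner is about its own decision.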
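The behavior of Eq. (5) at the extremes can be checked numerically. The following is a minimal sketch, not the authors' implementation; the function name `local_uncertainty` and the clipping constant `eps` are assumptions introduced here to avoid evaluating $\log_2 0$.

```python
import numpy as np

def local_uncertainty(p, eps=1e-12):
    """Binary entropy (base 2) of a predicted default probability, as in Eq. (5)."""
    # Clip away from exactly 0 and 1 so log2 never receives 0 (eps is an assumption).
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

# Hypothetical predictions from three learners for one loan.
print(local_uncertainty([0.02, 0.5, 0.9]))
```

A prediction of 0.5 gives an uncertainty of exactly 1, while predictions near 0 or 1 give values near 0, matching the description above.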

 ${U_{i|j}} = - {P_{i|j}}{\log _2}({P_{i|j}}) - (1 - {P_{i|j}}){\log _2}(1 - {P_{i|j}})$ (6)

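${P_{i|j}}$ denotes the default probability predicted by individual learner ${M_i}$ under the influence of individual learner ${M_j}$. ${P_{i|j}}$ is taken as a linear combination of ${P_i}$ and ${P_j}$, namely: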

 ${P_{i|j}} = {P_j}{I_{i|j}} + {P_i}(1 - {I_{i|j}})$ (7)

${I_{i|j}}$ is a sigmoid function, namely:

 ${I_{i|j}} = \frac{1}{{1 + {e^{ - (Ac{c_j} - Ac{c_i})}}}}$ (8)
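Eqs. (6) to (8) can be evaluated for all pairs of learners at once. The sketch below assumes a single loan, a vector `P` of predicted default probabilities and a vector `acc` of learner accuracies; the names `binary_entropy`, `conditional_uncertainty`, and `eps` are hypothetical and not taken from the paper.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    # Clip so log2 never receives 0 (eps is an assumption of this sketch).
    p = np.clip(p, eps, 1.0 - eps)
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def conditional_uncertainty(P, acc):
    """Return U with U[i, j] = U_{i|j} for one loan.

    P:   shape (N,), predicted default probabilities P_1..P_N
    acc: shape (N,), prediction accuracies Acc_1..Acc_N
    """
    P = np.asarray(P, dtype=float)
    acc = np.asarray(acc, dtype=float)
    # Eq. (8): I[i, j] = sigmoid(Acc_j - Acc_i)
    I = 1.0 / (1.0 + np.exp(-(acc[None, :] - acc[:, None])))
    # Eq. (7): P_cond[i, j] = P_j * I[i, j] + P_i * (1 - I[i, j])
    P_cond = P[None, :] * I + P[:, None] * (1.0 - I)
    # Eq. (6): entropy of the influenced prediction
    return binary_entropy(P_cond)

# Hypothetical values for three learners.
U = conditional_uncertainty([0.2, 0.6, 0.9], [0.80, 0.70, 0.75])
print(U)
```

On the diagonal, $I_{i|i} = 0.5$ gives $P_{i|i} = P_i$, so `U[i, i]` reduces to the local uncertainty of Eq. (5).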

 $\left\{\begin{array}{l} \min {z_i} = \sum\nolimits_{j = 1}^N {w_{ij}^2U_{i|j}^2} \\ \sum\nolimits_{j = 1}^N {{w_{ij}} = 1} \\ \end{array}\right.$ (9)

 ${L_i} = \sum\limits_{j = 1}^N {w_{ij}^2U_{i|j}^2} - \rho \left( {\sum\limits_{j = 1}^N {{w_{ij}}} - 1} \right)$ (10)

Taking the partial derivative of ${L_i}$ with respect to ${w_{ij}}$, setting the result to 0, and combining it with $\sum\nolimits_{j = 1}^N {{w_{ij}} = 1}$ gives the expression for ${w_{ij}}$:

 ${w_{ij}} = \frac{1}{{U_{i|j}^2\sum\nolimits_{k = 1}^N {U_{i|k}^{ - 2}} }}$ (11)
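The intermediate steps between Eq. (10) and Eq. (11) can be written out as follows. This is a sketch of the standard Lagrange multiplier argument, assuming every $U_{i|j} > 0$:

 $\begin{aligned} \frac{\partial L_i}{\partial w_{ij}} &= 2{w_{ij}}U_{i|j}^2 - \rho = 0 \;\Longrightarrow\; {w_{ij}} = \frac{\rho}{2U_{i|j}^2}, \\ \sum\limits_{j = 1}^N {w_{ij}} = 1 &\;\Longrightarrow\; \frac{\rho}{2}\sum\limits_{k = 1}^N U_{i|k}^{-2} = 1 \;\Longrightarrow\; \frac{\rho}{2} = \frac{1}{\sum\nolimits_{k = 1}^N U_{i|k}^{-2}}, \\ {w_{ij}} &= \frac{1}{U_{i|j}^2\sum\nolimits_{k = 1}^N U_{i|k}^{-2}}. \end{aligned}$

Substituting $\rho / 2$ back into the first line yields Eq. (11): learner ${M_i}$ gives more weight to those learners ${M_j}$ for which the conditional uncertainty $U_{i|j}$ is small.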

2.2 Construction Process of the Group Decision-Making-Based P2P Lending Credit Risk Assessment Model

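(1) Apply N machine learning algorithms to the training data to train the individual learners ${M_1},{M_2},\cdots,{M_N}$, and obtain their prediction accuracies $Ac{c_1},\cdots,Ac{c_N}$.

(2) Apply the individual learners ${M_1},{M_2},\cdots,{M_N}$ to predict the default probability of the loans in the test set; the predicted values are ${P_1},\cdots,{P_N}$.

(3) Use Eq. (5) to obtain the local uncertainty of each individual learner, and Eqs. (6)–(8) to obtain its global uncertainty.

(4) Use Eq. (11) to obtain the weights ${w_{ij}}(i=1,\cdots,N,j=1,\cdots,N)$.

(5) Substitute ${w_{ij}}$ into Eq. (3) and solve for the vector $\pi$.

(6) Use Eq. (4) to obtain the final ensemble result R of all individual learners.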
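Steps (4) to (6) can be sketched as follows for a single loan, assuming the uncertainty matrix `U` (with `U[i, j]` standing for $U_{i|j}$) from step (3) is already available and strictly positive. The function names are hypothetical, and solving Eq. (3) as a least-squares linear system is a choice made for this sketch; the paper does not state which solver is used.

```python
import numpy as np

def weights_from_uncertainty(U):
    """Eq. (11): W[i, j] = U[i, j]^-2 / sum_k U[i, k]^-2, so each row sums to 1."""
    inv_sq = 1.0 / np.square(np.asarray(U, dtype=float))  # requires U > 0
    return inv_sq / inv_sq.sum(axis=1, keepdims=True)

def stationary_vector(W):
    """Eq. (3): solve pi W = pi subject to sum(pi) = 1."""
    n = W.shape[0]
    # Stack (W^T - I) pi = 0 with the normalization row 1^T pi = 1.
    A = np.vstack([W.T - np.eye(n), np.ones((1, n))])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def ensemble_prediction(U, P):
    """Eq. (4): R = sum_i pi_i * P_i. Returns (R, pi, W)."""
    W = weights_from_uncertainty(U)
    pi = stationary_vector(W)
    return float(pi @ np.asarray(P, dtype=float)), pi, W

# Hypothetical uncertainty matrix and predictions for three learners.
U = np.array([[0.72, 0.80, 0.65],
              [0.85, 0.97, 0.70],
              [0.60, 0.75, 0.47]])
P = [0.2, 0.6, 0.9]
R, pi, W = ensemble_prediction(U, P)
print(R, pi)
```

Because every entry of `W` is positive and each row sums to 1, the vector `pi` is unique and non-negative, so `R` is a weighted average that always lies between the smallest and largest individual prediction.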

2.3 Description of the Individual Learners

2.3.1 Random Forest

2.3.2 Neural Network

2.3.3 Gradient Boosting Decision Tree

3 Experimental Analysis

3.1 Experimental Data and Variable Description

3.2 Experimental Results

4 Conclusion
