Abstract: With the rapid growth of Internet finance and electronic payment services, the associated personal credit problems are also increasing. Personal credit prediction is essentially an imbalanced binary classification problem, characterized by large sample sizes, high-dimensional data, and an extremely imbalanced class distribution. To effectively distinguish the credit status of applicants, this study proposes a personal credit prediction method based on feature optimization and ensemble learning (PL-SmoteBoost), which constructs a personal credit prediction model within the boosting ensemble framework. Specifically, the data are first analyzed with the Pearson correlation coefficient to eliminate redundant data; a subset of features is then selected with the least absolute shrinkage and selection operator (Lasso) to reduce the data dimension and thereby lower the risks associated with high dimensionality; finally, the synthetic minority oversampling technique (SMOTE) is applied to the dimension-reduced data, generating new samples by linear interpolation among minority-class samples to address the class imbalance problem. To verify the effectiveness of the proposed algorithm, it is compared with algorithms commonly used for binary classification on high-dimensional imbalanced datasets downloaded from the open databases of Kaggle and Microsoft. The area under the curve (AUC) is used as the evaluation index, and the test results are analyzed with statistical tests. The results show that the proposed PL-SmoteBoost algorithm has significant advantages over the compared algorithms.
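As an illustration of two of the preprocessing steps named in the abstract, the following is a minimal sketch, not the authors' implementation. It shows a Pearson-based redundancy filter and SMOTE-style linear interpolation among minority-class samples, using only the Python standard library. The function names (`pearson`, `drop_redundant`, `smote`), the correlation threshold of 0.95, and the neighbour count `k` are assumptions chosen for illustration; the Lasso selection and boosting stages are omitted.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has zero variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def drop_redundant(rows, threshold=0.95):
    """Return the indices of columns to keep.

    A column is dropped when its absolute correlation with an
    already-kept column exceeds `threshold` (an assumed cutoff).
    """
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[i])) <= threshold for i in kept):
            kept.append(j)
    return kept


def smote(minority, n_new, k=3, seed=0):
    """Generate `n_new` synthetic minority samples.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its `k` nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy data: the second column is nearly twice the first.
    rows = [[1.0, 2.0, 5.0], [2.0, 4.1, 3.0], [3.0, 6.0, 4.0], [4.0, 8.2, 1.0]]
    print(drop_redundant(rows))

    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote(minority, n_new=3, k=2))
```

Because every synthetic point is a convex combination of two existing minority samples, the oversampled set stays inside the region already occupied by the minority class. In the pipeline the abstract describes, this step would run on the dimension-reduced data before the boosting model is trained.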