计算机系统应用  2019, Vol. 28 Issue (7): 114-120 PDF

Traffic Flow Forecasting Model Based on Data Mining Technology
DENG Jing, ZHANG Qian
School of Computer Engineering, Nanjing Institute of Technology, Nanjing 211167, China
Abstract: In this study, traffic flow forecasting in the field of traffic data mining is studied and implemented. This paper presents an algorithm for feature selection of traffic flow data and establishment of traffic flow prediction model based on data mining technology. After cleaning the sampled data, the classification and regression decision tree are used as base learners, and the gradient lifting decision tree is used for regression fitting to calculate the characteristic importance of traffic data. The importance is used as the basis of adaptive feature selection. Secondly, the clustering algorithm is used to cluster the selected feature data, which reduces the sample size and makes the similar data more similar. Finally, real-time data matching and clustering are used as training data sets, and support vector machine is used to predict traffic flow after parameters optimization by Artificial Fish Swarm Algorithm (AFSA). At the end of this paper, experimental data are presented to demonstrate the proposed algorithm and model.
Key words: data mining     traffic flow forecasting     feature selection     gradient lifting decision tree     Support Vector Machine (SVM)     Artificial Fish Swarm Algorithm (AFSA)

1 基于梯度提升决策树的自适应特征选择算法 1.1 问题的提出与总体解决思路

1.2 单棵CART树中特征的基尼指数和分裂次数

 $Gini\left( {D,A} \right) = \frac{{\left| {{D_1}} \right|}}{{\left| D \right|}}Gini\left( {{D_1}} \right) + \frac{{\left| {{D_2}} \right|}}{{\left| D \right|}}Gini\left( {{D_2}} \right)$ (1)
1.3 梯度提升决策树(GBDT)模型

 $f\left( x \right) = \sum\nolimits_{m = 1}^M {{f_m}} \left( x \right)$ (2)

 $T\left( {x,{ \otimes _m}} \right) = \mathop \sum \nolimits_{j = 1}^J \;\;\;\;\;{{{\gamma }}_{{j}}}\;I:{{x}} \in {{{R}}_{{j}}}$ (3)

 ${f_m}\left( x \right) = {f_{m - 1}}\left( x \right) + T\left( {x,{ \otimes _m}} \right)$ (4)

 ${ \otimes _m} = {\arg _{{ \otimes _m}}}\min \sum\nolimits_{i = 1}^N {L\left( {{y_i},{f_{m - 1}}\left( {{x_i}} \right) + T\left( {{x_i};{ \otimes _m}} \right)} \right)}$ (5)
1.4 基于梯度提升决策树的自适应选择算法

 $I\left( {{x_i}} \right) = \frac{{\sum\nolimits_{j = 1}^M {Gini{{\left( {{x_i}} \right)}_j}} }}{M} + division\pi \left( {{x_i}} \right)$ (6)

 图 1 梯度提升决策树自适应选择算法

2 基于聚类分析和支持向量机回归的交通流预测 2.1 总体思路

2.2 交通流数据的聚类分析

 $P\left( {{x_i}} \right) = \frac{{d{{({x_i})}^2}}}{{\sum\nolimits_{{x_j} \in D} {d{{({x_i})}^2}} }}$ (7)

 ${{SC}}\left( {i,j} \right) = \left| {\frac{{b(i,j) - d(i,j)}}{{\max (b(i,j),d(i,j))}}} \right|$ (8)

2.3 支持向量机回归模型

 \begin{aligned} f(x) = & {\omega ^{\rm{T}}} \cdot \varphi (x) + b\\ = & \sum\nolimits_{i = 1}^n {({\alpha _i} - \alpha _i^*)} K({x_i},x) + b \end{aligned} (9)

 $\begin{array}{l} b = \frac{1}{{{N_{NSV}}}}\left\{ {\displaystyle\sum\nolimits_{0 \le {\alpha _i} < c} {\left[ {{y_i} - \displaystyle\sum\nolimits_{{x_i} \in SV} {({\alpha _j} - \alpha _j^*)K({x_j},{x_i}) - \varepsilon } } \right]} } \right.\\ \left. \;\;\;\;{ + \displaystyle\sum\nolimits_{0 \le \alpha _i^* < c} {\left[ {{y_i} - \displaystyle\sum\nolimits_{{x_i} \in SV} {({\alpha _j} - \alpha _j^*){K^*}({x_j},{x_i}) + \varepsilon } } \right]} } \right\} \end{array}$ (10)

2.4 人工鱼群算法优化支持向量机参数

2.5 基于K-means++聚类算法和支持向量机回归的交通流预测

 图 2 基于聚类分析和支持向量机回归的交通流预测模型

3 实验结果和分析 3.1 数据来源与评价指标

3.2 实验一: 基于梯度提升决策树的自适应特征选择

3.3 实验二: 基于聚类分析和支持向量机回归的交通流预测 3.3.1 聚类分析

3.3.2 支持向量机交通流预测

 图 3 交通流预测结果

3.3.3 实验结果分析

4 结束语

 [1] 李青. 城市交通流预测关键技术研究[硕士学位论文]. 绵阳: 西南科技大学, 2015. [2] Son J, Jung I, Park K, et al. Tracking-by-segmentation with online gradient boosting decision tree. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile, 2015, 3056-3064. [3] Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding. Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. New Orleans, LA, USA, 2007, 1027-1035. [4] 朱连江, 马炳先, 赵学泉. 基于轮廓系数的聚类有效性分析. 计算机应用, 2010, 30(S2): 139-141. [5] 李晓磊, 薛云灿, 路飞, 等. 基于人工鱼群算法的参数估计方法. 山东大学学报(工学版), 2014, 34(3): 84-87.