Statistics on Number of People Getting off Bus Based on SSD Convolutional Neural Network
Computer Systems & Applications, 2019, Vol. 28, Issue (3): 51-58

LI Ji-Xiu, LI Xiao-Tian, LIU Zi-Yi
Graduate School at Tangshan, Southwest Jiaotong University, Tangshan 063016, China
Foundation item: Fund of Land and Resources Bureau of Sichuan Province (KJ-2018-16)
Abstract: Traditional methods for counting bus passengers fall short in accuracy and speed, and they extract target features poorly. This study proposes a bus passenger counting system based on a deep convolutional neural network to address the crowd counting problem. A dataset is built first; all images used for training are hand-labeled, and the bus camera covers a wider viewing angle than in the previous literature. The study then compares several deep convolutional neural network models on whole-body detection of passengers. Weighing detection speed against accuracy, the single shot multibox detector (SSD) model is chosen to detect passengers' heads. The simple online and realtime tracking (SORT) algorithm performs multi-target tracking of the detected heads, and a cross-region crowd counting method counts the passengers getting off the bus. The system reaches an accuracy of 78.38% at approximately 19.79 frames per second.
Key words: SSD target detection; convolutional neural network; SORT target tracking; cross-region population statistics

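1 System Framework

Fig. 1 System framework

2 Proposed Algorithm

2.1 Dataset Analysis and Processing

Fig. 2 Whole-body passenger detection results of different network models

2.2 Bus Passenger Head Detection with the Caffe-SSD Model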

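The SSD [12] deep-learning object detection algorithm builds on the YOLO [10] algorithm. It is an end-to-end method without region proposals, uses VGG-16-Atrous as its base network, and keeps YOLO's approach of directly regressing bounding boxes (bbox) and class probabilities. Unlike YOLO, SSD makes predictions on five selected feature maps in addition to performing detection on the final feature map. SSD also borrows from the Faster R-CNN [11] object detection algorithm: it uses a large number of anchors to improve recognition accuracy, applying multi-scale region features at every position of the whole image for regression. By combining the two designs, the anchor boxes of Faster R-CNN and the end-to-end single-network detection of YOLO, SSD maintains both high recognition accuracy and high recognition speed.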
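To make the multi-scale prediction concrete, the following minimal Python sketch spreads default-box (anchor) scales linearly across the feature maps, following the scale rule given in the SSD paper [12]. The six feature maps and the values `s_min = 0.2` and `s_max = 0.9` are that paper's defaults, assumed here for illustration only; the configuration actually used in this study is not stated in the text, and the function names are hypothetical.

```python
def default_box_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Return one default-box scale per feature map, linearly spaced.

    Scales are fractions of the input image side. The defaults are the
    SSD paper's values (an assumption, not this study's settings).
    """
    if num_maps < 2:
        raise ValueError("need at least two feature maps")
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def box_shape(scale, aspect_ratio):
    """Width and height (fractions of the image side) of a default box.

    The area stays scale**2 regardless of the aspect ratio.
    """
    root = aspect_ratio ** 0.5
    return scale * root, scale / root


scales = default_box_scales()
for k, s in enumerate(scales, start=1):
    w, h = box_shape(s, 2.0)
    print(f"feature map {k}: scale={s:.2f}, 2:1 box = {w:.3f} x {h:.3f}")
```

Shallow feature maps receive the small scales and deep ones the large scales, which is how a single network can cover both small and large targets such as passenger heads at different distances from the camera.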

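(1) Over the first 40 000 training iterations, accuracy rises along a curve, then levels off and stabilizes at about 84.18%; the accuracy curve for 40 000 iterations is shown in Fig. 3(a). When training runs for 80 000 iterations, accuracy fluctuates around 84.20% and ends at 84.22%, which is 0.04 percentage points higher than at 40 000 iterations; the accuracy curve for 80 000 iterations is shown in Fig. 3(b).

(2) To obtain a better training result, the learning rate is reduced step by step during the first 40 000 iterations, as shown in Fig. 4(a). From iteration 40 000 to iteration 80 000 the learning rate is held at 0.000 000 01, as shown in Fig. 4(b).

(3) Over the first 40 000 iterations, the value of the loss function falls along a curve and gradually stabilizes, as shown in Fig. 5(a). From iteration 40 000 to iteration 80 000 the loss fluctuates around 3, much the same as at 40 000 iterations, as shown in Fig. 5(b).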
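The stepped learning-rate schedule described in (2) can be sketched as a multistep-style policy, in which the rate is multiplied by a factor `gamma` each time a step boundary is passed. The values of `base_lr`, `gamma`, and the boundaries in `steps` below are hypothetical, chosen only so that the rate reaches 1e-8 by iteration 40 000 and stays there; they are not the settings used in this study.

```python
from bisect import bisect_right


def multistep_lr(iteration, base_lr=1e-3, gamma=0.1,
                 steps=(8000, 16000, 24000, 32000, 40000)):
    """Learning rate after `iteration` iterations under a stepped schedule.

    All default values are illustrative assumptions. The rate drops by a
    factor of `gamma` at every boundary in `steps` that has been reached.
    """
    passed = bisect_right(steps, iteration)  # boundaries <= iteration
    return base_lr * gamma ** passed


for it in (0, 8000, 20000, 40000, 80000):
    print(f"iteration {it:>6}: lr = {multistep_lr(it):.0e}")
```

With a rate this small after the last boundary, weight updates are negligible, which is consistent with the accuracy and loss changing very little between 40 000 and 80 000 iterations.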

2.3 Bus Passenger Head Tracking and Crowd Counting
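As an illustration of cross-line counting on top of tracked heads, the sketch below counts a track once when its centre moves from one side of a horizontal counting line to the other. The track format (a track ID mapped to per-frame centre y-coordinates), the line position `line_y`, and the assumption that getting off the bus corresponds to increasing y are all hypothetical; the tracker itself (SORT [13] in this study) is not implemented here.

```python
def count_line_crossings(tracks, line_y):
    """Count tracks whose centre crosses line_y in the direction of increasing y.

    tracks: dict mapping track_id -> list of centre y-coordinates, one per frame
            (a hypothetical format; a real tracker would supply full boxes).
    A track is counted at most once, even if it jitters around the line.
    """
    counted = set()
    for track_id, ys in tracks.items():
        for prev_y, cur_y in zip(ys, ys[1:]):
            if prev_y < line_y <= cur_y:
                counted.add(track_id)
                break
    return len(counted)


tracks = {
    1: [100, 140, 190, 230],  # crosses the line downward: counted
    2: [260, 220, 180],       # crosses in the opposite direction: ignored
    3: [50, 80, 120],         # never reaches the line: ignored
    4: [190, 210, 195, 215],  # jitters around the line: counted once
}
print(count_line_crossings(tracks, line_y=200))  # prints 2
```

Counting each track ID only once keeps a head that hovers near the line from being counted repeatedly, and checking the direction of the crossing separates passengers getting off from those moving the other way.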

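Fig. 3 Accuracy during training

3 Experimental Results and Analysis

Fig. 4 Learning rate settings during training

Fig. 5 Loss during training

Fig. 6 Flowchart of cross-line crowd counting

3.1 Normal Cases

Fig. 7 Diversity of the dataset

Fig. 8 Head detection results after training the SSD network model

3.2 False Detections

Fig. 9 Target tracking and passenger counting results in a sparse scene

Fig. 10 Target tracking and passenger counting results in a crowded scene

Fig. 11 Target tracking and passenger counting results at night

Fig. 12 False detections

3.3 Missed Detections

Fig. 13 Missed detections

4 Conclusions and Outlook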

[1] Chen CH, Chang YC, Chen TY, et al. People counting system for getting in/out of a bus based on video processing. 2008 Eighth International Conference on Intelligent Systems Design and Applications. Kaohsiung, China. 2008. 565-569.
[2] Zhang YJ, Gao CQ, Li P, et al. People flow statistics based on convolutional neural network. Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2017, 29(2): 265-271. (in Chinese)
[3] Xu HZ, Lv P, Meng L. People counting system based on head-shoulder detection and tracking in surveillance video. 2010 International Conference on Computer Design and Applications. Qinhuangdao, China. 2010. 394-398.
[4] Zeng CB, Ma HD. Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting. 2010 20th Conference on Pattern Recognition. Istanbul, Turkey. 2010. 2069-2072.
[5] Li JW, Huang L, Liu CP. An efficient self-learning people counting system. The First Asian Conference on Pattern Recognition. Beijing, China. 2011. 125-129.
[6] Fu M. Crowd density estimation based on convolutional neural network [Master's thesis]. Chengdu: University of Electronic Science and Technology of China, 2014. (in Chinese)
[7] Li HY, He XH, Wu W, et al. Bus passenger flow counting system based on computer vision. Journal of Sichuan University (Natural Science Edition), 2007, 44(4): 825-830. DOI:10.3969/j.issn.0490-6756.2007.04.022 (in Chinese)
[8] Rigoll G, Eickeler S, Müller S. Person tracking in real-world scenarios using statistical methods. Proceedings of Fourth IEEE International Conference on Automatic Face and Gesture Recognition. Grenoble, France. 2000. 342-347.
[9] Xian XD, Shi YM, Tang YJ, et al. Bus passenger flow counting determination method based on multiple passenger motion behaviors. Computer Engineering, 2015, 41(4): 176-180, 186. DOI:10.3969/j.issn.1000-3428.2015.04.033 (in Chinese)
[10] Redmon J, Divvala S, Girshick R, et al. You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 779-788.
[11] Ren SQ, He KM, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge, MA, USA. 2015. 91-95.
[12] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot multibox detector. In: Leibe B, Matas J, Sebe N, eds. Lecture Notes in Computer Science. Cham: Springer, 2016. 21-37.
[13] Bewley A, Ge ZY, Ott L, et al. Simple online and realtime tracking. 2016 IEEE International Conference on Image Processing. Phoenix, AZ, USA. 2016. 3464-3468.