Computer Systems & Applications (计算机系统应用), 2018, Vol. 27, Issue 11: 78-83

Fast Abnormal Pedestrians Detection Based on Multi-Task CNN in Surveillance Video
LI Jun-Jie, LIU Cheng-Lin, ZHU Ming
School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
Foundation item: National Science and Technology Major Project of China (2017ZX03001019)
Abstract: Public safety has attracted extensive social concern in recent years, and using surveillance video to detect abnormal pedestrians and prevent dangerous events has become a hot topic. Abnormal pedestrians are those whose appearance differs distinctly from that of ordinary pedestrians, for example, people who cover their faces with helmets or duck away from the camera. Because the characteristics of abnormal pedestrians are concentrated mainly in the head and face, this study proposes a fast detection method based on a multi-task Convolutional Neural Network (CNN) and a one-class Support Vector Machine (SVM) operating on head-facial features. First, head-facial regions are detected in the surveillance video; next, the multi-task CNN extracts features from these regions; finally, the one-class SVM judges whether each region belongs to a normal pedestrian. In addition, this study designs a convolution kernel splitting method for the CNN to accelerate feature extraction. Experiments show that the proposed algorithm detects abnormal pedestrians in surveillance video effectively and quickly.
Key words: surveillance video; abnormal pedestrians; multi-task CNN (Convolutional Neural Network); convolution kernel splitting method; one-class SVM (Support Vector Machine)

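Figure 1 Examples of abnormal pedestrians in surveillance scenes

1 Overview of Abnormal Pedestrian Detection

Figure 2 Architecture of the abnormal pedestrian detection system

1.1 Head-Facial Region Detection

1.2 Abnormal Pedestrian Discrimination

2 Algorithm Design and Implementation

2.1 Multi-Task Convolutional Neural Network

Figure 3 Preliminary multi-task CNN model (with a 120×100 input as an example)

2.2 Convolution Kernel Splitting

Figure 4 Convolution kernel splitting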

$\frac{n \cdot n \cdot M \cdot k + n \cdot n \cdot M \cdot k + n \cdot n \cdot N \cdot M}{n \cdot n \cdot N \cdot k \cdot k \cdot M} = \frac{2}{kN} + \frac{1}{k^2}$
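
The ratio above can be checked numerically. The following is a minimal sketch, assuming (in the spirit of the MobileNets cost analysis [19]) that n is the side length of the output feature map, M the number of input channels, N the number of output channels, and k the kernel size, and that the three numerator terms correspond to a k×1 per-channel convolution, a 1×k per-channel convolution, and a 1×1 cross-channel convolution. The function names are illustrative and do not come from the paper.

```python
def conv_costs(n, m_in, n_out, k):
    """Return (standard, split) multiply-accumulate counts.

    Assumed meanings (not stated in the surviving text):
      n     - side length of the n x n output feature map
      m_in  - number of input channels (M)
      n_out - number of output channels (N)
      k     - side length of the k x k kernel
    """
    # Standard convolution: every output value sums over a k x k x M window.
    standard = n * n * n_out * k * k * m_in
    # Split form: k x 1 per-channel pass, 1 x k per-channel pass,
    # then a 1 x 1 convolution that mixes M channels into N.
    split = (n * n * m_in * k) + (n * n * m_in * k) + (n * n * n_out * m_in)
    return standard, split


def split_cost_ratio(k, n_out):
    """Closed-form ratio 2/(kN) + 1/k^2 from the equation."""
    return 2.0 / (k * n_out) + 1.0 / (k * k)


if __name__ == "__main__":
    standard, split = conv_costs(n=28, m_in=32, n_out=64, k=3)
    print(split / standard, split_cost_ratio(k=3, n_out=64))
```

Under these assumptions, with k = 3 and N = 64 the split form needs roughly 12% of the multiply-accumulate operations of the standard convolution, and the ratio does not depend on n or M.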

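2.3 Training Datasets

Figure 5 Improved multi-task CNN model

1) The public face-attribute dataset CelebA [20] is used to pretrain the network. Twelve of its attributes are selected as the outputs of the multi-task network: bags under eyes, bald, bangs, black hair, blond hair, eyeglasses, gender, age group, mouth open, beard, hat, and necktie. Some of the attributes and corresponding samples are shown in Figure 6.

2) Starting from the pretrained parameters, the network is fine-tuned on samples from real surveillance video. The output part of the multi-task network is changed to the following four classification tasks: whether glasses are worn, whether a hat is worn, whether the mouth is visible, and face orientation (front, side, or back), as shown in Figure 7.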
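
The four fine-tuning outputs can be pictured as four classification heads sharing one feature extractor. The following is an illustrative sketch only: the task names, the use of softmax cross-entropy for each task, and the equal weighting of the four losses are assumptions made for the example, not details confirmed by the text.

```python
import math

# The four fine-tuning tasks named in the text, with their class counts.
# The three yes/no tasks have 2 classes; face orientation has 3
# (front, side, back). The dictionary keys are hypothetical names.
TASKS = {
    "glasses": 2,
    "hat": 2,
    "mouth_visible": 2,
    "orientation": 3,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss(logits_by_task, labels_by_task):
    """Sum of per-task cross-entropy losses (equal weights assumed)."""
    loss = 0.0
    for task, n_classes in TASKS.items():
        logits = logits_by_task[task]
        if len(logits) != n_classes:
            raise ValueError(f"{task}: expected {n_classes} logits")
        probs = softmax(logits)
        loss += -math.log(probs[labels_by_task[task]])
    return loss


if __name__ == "__main__":
    # Uninformative (all-zero) logits for one head-facial sample.
    logits = {task: [0.0] * n for task, n in TASKS.items()}
    labels = {"glasses": 0, "hat": 1, "mouth_visible": 1, "orientation": 2}
    print(multitask_loss(logits, labels))
```

With all-zero logits each head predicts a uniform distribution, so the summed loss equals 3·ln 2 + ln 3, which is a convenient sanity check before training.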

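Figure 6 Sample images from the CelebA dataset

Figure 7 Sample images from real surveillance video

2.4 One-Class Classification Algorithm

3 Experiments and Analysis

3.1 Training and Evaluation of the Multi-Task CNN

3.2 Combinations of Image Features and One-Class Classifiers

3.3 Abnormal Pedestrian Detection System

Figure 8 Examples of abnormal pedestrian detection

4 Conclusion and Outlook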

[1] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, HI, USA. 2001.
[2] Liao SC, Jain AK, Li SZ. A fast and accurate unconstrained face detector. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(2): 211-223. DOI:10.1109/TPAMI.2015.2448075
[3] Wang K, Dong Y, Bai HL, et al. Use fast R-CNN and cascade structure for face detection. Proceedings of 2016 Visual Communications and Image Processing. Chengdu, China. 2016. 1-4.
[4] Li JJ, Karmoshi S, Zhu M. Unconstrained face detection based on cascaded convolutional neural networks in surveillance video. Proceedings of the 2nd International Conference on Image, Vision and Computing. Chengdu, China. 2017. 46-52.
[5] Ishii Y, Hongo H, Yamamoto K, et al. Face and head detection for a real-time surveillance system. Proceedings of the 17th International Conference on Pattern Recognition. Cambridge, UK. 2004. 298-301.
[6] Ding XF, Xu H, Cui P, et al. A cascade SVM approach for head-shoulder detection using histograms of oriented gradients. Proceedings of 2009 IEEE International Symposium on Circuits and Systems. Taipei, China. 2009. 1791-1794.
[7] Ji PF, Kim Y, Yang Y, et al. Face occlusion detection using skin color ratio and LBP features for intelligent video surveillance systems. Proceedings of 2016 Federated Conference on Computer Science and Information Systems. Gdansk, Poland. 2016. 253-259.
[8] Zhang XH, Zhou L, Zhang T, et al. A novel efficient method for abnormal face detection in ATM. Proceedings of 2014 International Conference on Audio, Language and Image Processing. Shanghai, China. 2014. 695-700.
[9] Zhang WF, Zhu M. Real-time detection of abnormal face occlusion events based on a patrol car. Computer Systems & Applications, 2017, 26(12): 175-180. (in Chinese)
[10] Zhang YL, Lu Y, Wu HT, et al. Face occlusion detection using cascaded convolutional neural network. Proceedings of the 11th Chinese Conference on Biometric Recognition. Chengdu, China. 2016. 720-727.
[11] Xia YZ, Zhang BL, Coenen F. Face occlusion detection based on multi-task convolution neural network. Proceedings of the 12th International Conference on Fuzzy Systems and Knowledge Discovery. Zhangjiajie, China. 2015. 375-379.
[12] Dalal N, Triggs B. Histograms of oriented gradients for human detection. Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA. 2005. 886-893.
[13] Razavian AS, Azizpour H, Sullivan J, et al. CNN features off-the-shelf: An astounding baseline for recognition. Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Columbus, OH, USA. 2014. 512-519.
[14] Schölkopf B, Platt JC, Shawe-Taylor J, et al. Estimating the support of a high-dimensional distribution. Neural Computation, 2001, 13(7): 1443-1471. DOI:10.1162/089976601750264965
[15] Rousseeuw PJ, Van Driessen K. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 1999, 41(3): 212-223. DOI:10.1080/00401706.1999.10485670
[16] Liu FT, Ting KM, Zhou ZH. Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data, 2012, 6(1): 1-39. DOI:10.1145/2133360.2133363
[17] He KM, Zhang XY, Ren SQ, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. Proceedings of the 13th European Conference on Computer Vision (ECCV 2014). Zurich, Switzerland. 2014. 346-361.
[18] Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 2818-2826.
[19] Howard AG, Zhu ML, Chen B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
[20] Liu ZW, Luo P, Wang XG, et al. Deep learning face attributes in the wild. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile. 2015. 3730-3738.
[21] Wang TH, Chen JT. Survey of research on kernel selection. Computer Engineering and Design, 2012, 33(3): 1181-1186. DOI:10.3969/j.issn.1000-7024.2012.03.068 (in Chinese)
[22] Utkin LV, Chekh AI. A new robust model of one-class classification by interval-valued training data using the triangular kernel. Neural Networks, 2015, 69: 99-110. DOI:10.1016/j.neunet.2015.05.004
[23] Jia YQ, Shelhamer E, Donahue J, et al. Caffe: Convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia. Orlando, FL, USA. 2014. 675-678.
[24] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 2011, 12(10): 2825-2830.