计算机系统应用 (Computer Systems & Applications), 2019, Vol. 28, Issue (1): 107-112


Dilated Fully Convolutional Network with Grouped Proposals for Vehicle Detection
CHENG Ya-Hui1,2,3, CAI Xuan4, FENG Rui1,2,3
1. School of Computer Science, Fudan University, Shanghai 201203, China;
2. Shanghai Engineering Research Center for Video Technology and System, Shanghai 201203, China;
3. Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai 201203, China;
4. Internet of Things Technology Research and Development Center, Shanghai 201204, China
Abstract: Although deep learning based vehicle detection approaches have achieved remarkable success in recent years, they are still prone to missing comparatively small vehicles. To address this problem, we propose a novel Dilated Fully Convolutional Network with Grouped Proposals (DFCN-GP) for vehicle detection. Specifically, we design a grouped network structure that combines feature maps from both lower- and higher-level convolutional layers for object proposal generation, placing more emphasis on lower-level features, which are more sensitive to small objects. In addition, we enlarge both the size and the receptive field of the feature map in the last convolutional layers via dilated convolution, preserving more detailed information; this design is used in both the object proposal and vehicle detection sub-networks. In the experiments, we conduct ablation studies to demonstrate the effectiveness of the grouped proposals and the dilated convolutional layers, and show that the proposed approach outperforms other state-of-the-art methods on the UA-DETRAC vehicle detection benchmark.
Key words: machine vision; vehicle detection; grouped region proposals; dilated convolutional networks

Figure 1. Overall framework of the proposed DFCN-GP

1 Algorithm

Step 1. Image feature extraction. ResNet-101 with dilated convolutional layers extracts features from the input image, which are shared by the subsequent proposal-generation and vehicle-detection stages.

Step 2. Region proposal generation. A two-group combination scheme is proposed, in which candidate object boxes are generated from different convolutional layers.

Step 3. Vehicle classification and localization. A multi-task head on the last dilated convolutional layer simultaneously decides whether each candidate box is a vehicle and estimates its location.
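The data flow of the three steps above can be sketched as follows. This is a minimal shape-level sketch, not the authors' implementation: all function names are hypothetical placeholders, the dummy arrays stand in for a real ResNet-101 backbone, and the output stride of 16 reflects the general effect of dilated convolution (keeping a larger final feature map than the usual stride-32 backbone).

```python
import numpy as np

def extract_features(image):
    """Step 1 (sketch): a ResNet-101 backbone whose last stage uses dilated
    convolutions, so the output stride stays at 16 instead of 32 and small
    vehicles retain more spatial detail.  Here we only mimic the shape."""
    h, w = image.shape[:2]
    return np.zeros((h // 16, w // 16, 2048), dtype=np.float32)

def propose_boxes(features, num_proposals=300):
    """Step 2 (sketch): a grouped region proposal network would combine
    lower- and higher-level feature maps, weighting the lower-level group
    more for small vehicles; here we just emit dummy (x1, y1, x2, y2) boxes."""
    return np.zeros((num_proposals, 4), dtype=np.float32)

def classify_and_regress(features, boxes):
    """Step 3 (sketch): a multi-task head on the last dilated layer that
    scores each box (vehicle / background) and refines its location."""
    scores = np.zeros((len(boxes), 2), dtype=np.float32)
    refined = boxes.copy()
    return scores, refined

image = np.zeros((608, 960, 3), dtype=np.float32)
feat = extract_features(image)            # (38, 60, 2048)
boxes = propose_boxes(feat)               # (300, 4)
scores, refined = classify_and_regress(feat, boxes)
```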

1.1 Image Feature Extraction

1.2 Region Proposal Generation

1.3 Vehicle Classification and Localization

 $L(s, s^*, t, t^*) = L_{cls}(s, s^*) + \lambda L_{reg}(t, t^*)$ (1)
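Equation (1) combines a classification loss and a box-regression loss weighted by $\lambda$. The sketch below assumes, as in Fast/Faster R-CNN, cross-entropy for $L_{cls}$ and smooth-L1 for $L_{reg}$; the paper does not spell out these choices here, so treat the concrete loss forms as an assumption.

```python
import numpy as np

def cross_entropy(s, s_star):
    # L_cls: cross-entropy between predicted class probabilities s
    # and the one-hot ground-truth label s*.
    return float(-np.sum(s_star * np.log(s)))

def smooth_l1(t, t_star):
    # L_reg (assumed smooth-L1, as in Fast/Faster R-CNN) over the
    # four box-regression offsets.
    d = np.abs(t - t_star)
    return float(np.sum(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def multitask_loss(s, s_star, t, t_star, lam=1.0):
    # Equation (1): L = L_cls + lambda * L_reg.
    return cross_entropy(s, s_star) + lam * smooth_l1(t, t_star)

s = np.array([0.8, 0.2])          # predicted (vehicle, background)
s_star = np.array([1.0, 0.0])     # ground-truth label: vehicle
t = np.array([0.1, 0.0, 0.2, 0.0])
t_star = np.zeros(4)
loss = multitask_loss(s, s_star, t, t_star)   # -ln(0.8) + 0.025
```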

 $\begin{aligned}t_x &= (G_x - A_x)/A_w\\t_y &= (G_y - A_y)/A_h\\t_w &= \log(G_w/A_w)\\t_h &= \log(G_h/A_h)\end{aligned}$ (2)
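Equation (2) encodes a ground-truth box $G$ relative to an anchor $A$, both given as (center x, center y, width, height): the center offsets are normalized by the anchor size and the scale ratios are taken in log space. A direct transcription:

```python
import numpy as np

def regression_targets(anchor, gt):
    # Equation (2): encode the ground-truth box G relative to anchor A.
    # Both boxes are (center_x, center_y, width, height).
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    tx = (gx - ax) / aw          # center offset, normalized by anchor width
    ty = (gy - ay) / ah          # center offset, normalized by anchor height
    tw = float(np.log(gw / aw))  # log scale ratio in width
    th = float(np.log(gh / ah))  # log scale ratio in height
    return tx, ty, tw, th

# A ground truth identical to the anchor encodes to all zeros:
print(regression_targets((50, 50, 20, 10), (50, 50, 20, 10)))
# (0.0, 0.0, 0.0, 0.0)
```

The normalization makes the targets invariant to anchor scale, and the log parameterization keeps width/height predictions positive after decoding.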

2 Experiments

2.1 Dataset and Experimental Settings

2.2 Ablation Studies

2.2.1 Input Settings of the Region Proposal Network

2.2.2 Input Settings of the Fully Convolutional Detection Network

2.3 Comparison with Existing Models

Figure 2. Precision-recall curves of the compared detection algorithms on the UA-DETRAC test set

Figure 3. Detection results of the proposed method on the UA-DETRAC test set (vehicles marked with red boxes)

3 Conclusion

[1] Dong CL, Dong YN. Survey of video-based vehicle detection and tracking algorithms. Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2009, 29(2): 88-94. DOI:10.3969/j.issn.1673-5439.2009.02.018
[2] Kong T, Yao AB, Chen YR, et al. HyperNet: Towards accurate region proposal generation and joint object detection. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 845–853.
[3] Wang L, Lu Y, Wang H, et al. Evolving boxes for fast vehicle detection. Proceedings of 2017 IEEE International Conference on Multimedia and Expo. Hong Kong, China. 2017. 1135–1140.
[4] He KM, Zhang XY, Ren SQ, et al. Deep residual learning for image recognition. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, NV, USA. 2016. 770–778.
[5] Yu F, Koltun V, Funkhouser T. Dilated residual networks. arXiv preprint arXiv:1705.09914, 2017.
[6] Ghodrati A, Diba A, Pedersoli M, et al. DeepProposal: Hunting objects by cascading deep convolutional layers. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile. 2015. 2578–2586.
[7] Ranjan R, Patel VM, Chellappa R. HyperFace: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition. arXiv preprint arXiv:1603.01249, 2016.
[8] Dai JF, Li Y, He KM, et al. R-FCN: Object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems 29. 2016. 379–387.
[9] Girshick R. Fast R-CNN. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile. 2015. 1440–1448.
[10] Wen LY, Du DW, Cai ZW, et al. UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking. arXiv preprint arXiv:1511.04136, 2015.
[11] Felzenszwalb PF, Girshick R, McAllester D, et al. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 32(9): 1627-1645. DOI:10.1109/TPAMI.2009.167
[12] Dollár P, Appel R, Belongie S, et al. Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(8): 1532-1545. DOI:10.1109/TPAMI.2014.2300479
[13] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA. 2014. 580–587.
[14] Ren SQ, He KM, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks. Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada. 2015. 91–99.
[15] Cai ZW, Saberian M, Vasconcelos N. Learning complexity-aware cascades for deep pedestrian detection. Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile. 2015. 3361–3369.