Computer Systems & Applications (计算机系统应用), 2019, Vol. 28, Issue 1: 239-244

Road Scene Segmentation Based on NVIDIA Jetson TX2
LI Shi-Jing, QING Lin-Bo, HE Xiao-Hai, HAN Jie
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Foundation item: Science and Technology Program of Chengdu Municipality (2016-XT00-00015-GX); Science and Technology Research Program of Education Bureau, Sichuan Province (18ZB0355)
Abstract: Image semantic segmentation is one of the most important research directions in computer vision. Compared with traditional algorithms, deep-learning-based image segmentation performs better and can be applied to the scene understanding stage of traffic monitoring and autonomous driving. However, complex segmentation networks run too slowly on embedded platforms to be of practical use. Therefore, with traffic monitoring and autonomous driving in mind, an image segmentation network based on a deep convolutional encoder-decoder architecture was used to perform road scene segmentation on the embedded platform NVIDIA Jetson TX2. To accelerate the network, the model was simplified and converted into an inference engine using TensorRT2 provided by NVIDIA, which involved adding plugin layers and applying CUDA parallel optimization. The experimental results show that the speed-up ratio can reach ten, which supports the application of segmentation networks with complex structures on embedded platforms.
Key words: scene understanding     deep learning     TensorRT2     semantic segmentation     NVIDIA Jetson TX2

1 Network Architecture and Model Simplification

1.1 Network Architecture

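The basic structure of SegNet is an auto encoder-decoder. It uses the first 13 convolutional layers of VGG16 as the encoder, followed by a 13-layer decoder network and a classification layer. The key to SegNet is that it stores, for every pooling layer in the encoder, the maxima together with their spatial locations, and uses them for non-linear upsampling in the corresponding decoder. This makes boundary localization in the segmentation much more precise and reduces the number of parameters passed from encoder to decoder, which gives SegNet a considerable advantage in both speed and memory use.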
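The pooling-index mechanism described above can be illustrated with a small sketch. This is not the authors' implementation: it is a plain-Python toy on a single 4×4 feature map, and the function names `max_pool_with_indices` and `max_unpool` as well as the sample values are assumptions made for illustration only.

```python
def max_pool_with_indices(x, k=2):
    """k x k max pooling with stride k over a 2-D list.

    Returns the pooled map and, for each window, the (row, col)
    position of its maximum in the input.
    """
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h - k + 1, k):
        pooled_row, index_row = [], []
        for j in range(0, w - k + 1, k):
            # Scan the window and remember where its maximum sits.
            best = (i, j)
            for di in range(k):
                for dj in range(k):
                    if x[i + di][j + dj] > x[best[0]][best[1]]:
                        best = (i + di, j + dj)
            pooled_row.append(x[best[0]][best[1]])
            index_row.append(best)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_h, out_w):
    """Put each pooled value back at its recorded position; all other cells stay 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_with_indices(feature_map)   # encoder side
restored = max_unpool(pooled, indices, 4, 4)           # decoder side
```

The upsampling step has no learnable parameters: the decoder only needs the stored positions, and in SegNet the sparse map produced this way is then densified by the decoder's convolutional layers.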

1.2 Network Model Simplification

 Fig. 1 Merging of BN layers

 ${y_i} \leftarrow \gamma \frac{{{x_i} - E(x)}}{{\sqrt {\operatorname{var} (x) + \varepsilon } }} + \beta$ (1)
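The body text of this subsection is not reproduced here, so the following is only a sketch of the usual way a BN layer is merged into the layer before it at inference time. It assumes that E(x) and var(x) in Eq. (1) are fixed statistics stored from training and that x is the output of a convolution with weights w and bias b. Under those assumptions Eq. (1) is an affine function of x, so it can be absorbed by scaling w by γ/√(var(x)+ε) and replacing b with (b − E(x))·γ/√(var(x)+ε) + β. All function names and numeric values below are hypothetical, and a 1×1 convolution at a single pixel stands in for a general convolution.

```python
import math


def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into the preceding layer.

    weights[c] holds the weights that produce output channel c.
    """
    folded_w, folded_b = [], []
    for c in range(len(weights)):
        scale = gamma[c] / math.sqrt(var[c] + eps)
        folded_w.append([w * scale for w in weights[c]])
        folded_b.append((bias[c] - mean[c]) * scale + beta[c])
    return folded_w, folded_b


def conv1x1(weights, bias, inputs):
    """A 1x1 convolution at one pixel: one dot product per output channel."""
    return [sum(w * a for w, a in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Eq. (1) applied channel-wise with fixed statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(x, gamma, beta, mean, var)]


# Hypothetical parameters: 3 input channels, 2 output channels.
weights = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
inputs = [1.0, 2.0, 3.0]

# Convolution followed by BN ...
two_step = batch_norm(conv1x1(weights, bias, inputs), gamma, beta, mean, var)
# ... versus a single convolution with folded parameters.
folded_w, folded_b = fold_bn(weights, bias, gamma, beta, mean, var)
one_step = conv1x1(folded_w, folded_b, inputs)
max_diff = max(abs(a - b) for a, b in zip(two_step, one_step))
```

Both paths give the same output up to floating-point rounding, so the BN layer can be dropped from the deployed model once its parameters are folded in, which saves one layer of computation and memory traffic per BN layer.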

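2 Building the Acceleration Engine with TensorRT2

Inference with SegNet on the TX2 is very slow, so this paper uses TensorRT2, released by NVIDIA, to accelerate the network model.

2.1 Basic Workflow

TensorRT is a deep-learning inference engine released by NVIDIA. It optimizes existing network models and substantially increases neural network inference speed on platforms such as robots and autonomous vehicles. NVIDIA has so far released four versions of TensorRT, and the deep-learning frameworks supported differ from version to version.

 Fig. 2 Workflow for building the acceleration engine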

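2.2 Adding Custom Layers

 Fig. 3 Plugin layer insertion workflow

 Fig. 4 Plugin layer structure

 Fig. 5 Plugin layer implementation process

2.3 Optimization Measures

 Fig. 6 Horizontal layer fusion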

3 Experiments

3.1 Network Training

3.2 Test Results

 Fig. 7 Segmentation results

4 Conclusion
