Road Scene Segmentation Based on NVIDIA Jetson TX2
LI Shi-Jing, QING Lin-Bo, HE Xiao-Hai, HAN Jie
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Foundation item: Science and Technology Program of Chengdu Municipality (2016-XT00-00015-GX); Science and Technology Research Program of Education Bureau, Sichuan Province (18ZB0355)
Abstract: Image semantic segmentation is one of the most important research directions in computer vision. Compared with traditional algorithms, image segmentation based on deep learning performs better and can be applied to the scene understanding stage of traffic monitoring and autonomous driving. However, complex segmentation networks run too slowly on embedded platforms to be used in practice. Therefore, targeting traffic monitoring and autonomous driving applications, an image segmentation network based on a deep convolutional encoder-decoder architecture was used to perform road scene segmentation on the embedded platform NVIDIA Jetson TX2. To accelerate the network, the model was simplified and converted into an inference engine with TensorRT2, provided by NVIDIA; this involved adding plugin layers and applying CUDA parallel optimization. The experimental results show that the speed-up ratio can reach 10, which supports the deployment of segmentation networks with complex structures on embedded platforms.
Key words: scene understanding     deep learning     TensorRT2     semantic segmentation     NVIDIA Jetson TX2

1 Network Architecture and Model Simplification

1.1 Network Architecture

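SegNet is built on an encoder-decoder architecture. The encoder consists of the first 13 convolutional layers of VGG16 and is followed by a 13-layer decoder network and a classification layer. The key feature of SegNet is that it records the maxima of every pooling layer in the encoder network together with their spatial locations, and uses them for nonlinear upsampling in the corresponding decoder. This greatly improves boundary localization in the segmentation result and reduces the number of parameters passed from the encoder to the decoder, which gives SegNet a considerable advantage in both speed and memory usage.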
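The pooling-index mechanism described above can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: the function names `max_pool_2x2_with_indices` and `max_unpool_2x2` are hypothetical, the window size is assumed to be 2x2 with stride 2, and a single-channel feature map with even dimensions is assumed.

```python
def max_pool_2x2_with_indices(feature):
    """2x2, stride-2 max pooling that also records where each maximum came from.

    Assumes `feature` is a 2-D list with an even number of rows and columns.
    """
    rows, cols = len(feature), len(feature[0])
    pooled, indices = [], []
    for r in range(0, rows, 2):
        pooled_row, index_row = [], []
        for c in range(0, cols, 2):
            # Collect (value, position) pairs for the current 2x2 window.
            window = [(feature[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, rows, cols):
    """Put each pooled value back at its recorded position; everything else is zero."""
    output = [[0.0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            output[r][c] = value
    return output


# Illustrative 4x4 feature map (made-up values).
feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, indices = max_pool_2x2_with_indices(feature_map)
restored = max_unpool_2x2(pooled, indices, rows=4, cols=4)
```

The decoder needs only the stored positions to upsample, so the upsampling step itself has no learned weights; the sparse map in `restored` would then be densified by the decoder's convolutional layers.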

1.2 Network Model Simplification

 Fig. 1 Merging of BN layers

 ${y_i} \leftarrow \gamma \frac{{{x_i} - E(x)}}{{\sqrt {\operatorname{var} (x) + \varepsilon } }} + \beta$ (1)
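As an illustration of how Eq. (1) allows a BN layer to be merged, the sketch below folds the BN parameters into the weights and bias of a preceding linear (convolution-like) operation. It assumes the BN layer directly follows that operation and that, at inference time, E(x) and var(x) are fixed statistics. The helper names `fold_batch_norm`, `linear` and `batch_norm` and all numeric values are hypothetical, and a single output channel is shown; the paper's actual merging procedure may differ in detail.

```python
import math


def linear(inputs, weights, bias):
    """One output channel of a convolution, viewed as a dot product plus bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def batch_norm(x, gamma, beta, mean, var, eps=1e-5):
    """Eq. (1): y = gamma * (x - E(x)) / sqrt(var(x) + eps) + beta."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Return weights and bias that reproduce linear followed by batch_norm."""
    scale = gamma / math.sqrt(var + eps)
    folded_weights = [w * scale for w in weights]
    folded_bias = (bias - mean) * scale + beta
    return folded_weights, folded_bias


# Made-up parameters for a single output channel.
weights, bias = [0.5, -1.0, 2.0], 0.1
gamma, beta, mean, var = 1.5, -0.2, 0.4, 0.25
inputs = [1.0, 2.0, 3.0]

# Reference path: linear layer, then a separate BN layer.
reference = batch_norm(linear(inputs, weights, bias), gamma, beta, mean, var)

# Merged path: a single linear layer with folded parameters.
folded_weights, folded_bias = fold_batch_norm(weights, bias, gamma, beta, mean, var)
fused = linear(inputs, folded_weights, folded_bias)
```

Because the two paths agree up to floating-point rounding, the BN layer can be dropped from the deployed model, which removes one layer's worth of computation and memory traffic per convolution.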

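2 Building the Acceleration Engine with TensorRT2

Inference with SegNet is extremely slow on the TX2, so this paper uses TensorRT2, released by NVIDIA, to accelerate the network model.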

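2.1 Basic Workflow

TensorRT is a deep learning inference engine released by NVIDIA. It optimizes existing network models and substantially increases the inference speed of neural networks on platforms such as robots and autonomous vehicles. NVIDIA has released four versions of TensorRT so far, and different versions support different deep learning frameworks.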

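 Fig. 2 Workflow for building the acceleration engine

2.2 Adding Custom Layers

 Fig. 3 Workflow for inserting a plugin layer

 Fig. 4 Structure of a plugin layer

 Fig. 5 Implementation process of a plugin layer

2.3 Optimization Measures

 Fig. 6 Horizontal layer fusion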

3 Experiments

3.1 Network Training

3.2 Test Results

 Fig. 7 Segmentation results

4 Conclusion

