Computer Systems & Applications, 2019, Vol. 28, Issue (4): 163-169

Deep Learning Object Detection Optimization Algorithm for Embedded Devices
DAI Lei-Yan, FENG Jie, DONG Hui, YANG Xiao-Li
School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
Foundation item: Young Scientists Fund of National Natural Science Foundation of China (61501402)
Abstract: As research on neural networks has deepened, both the precision and the speed of object detection have improved. However, as networks grow deeper and models grow larger, the computational cost keeps rising, so a neural network cannot perform fast forward inference directly on an embedded device. To address this problem, this study investigates a deep learning object detection optimization algorithm for embedded devices. First, an appropriate object detection algorithm and network architecture are chosen. Then, the network is trained on images collected in the specific detection scenario, and the trained model is pruned. Finally, the pruned object detection model is ported to the embedded device and optimized at the assembly-instruction level. Compared with the original network model, the model volume is reduced by 9.96% and the speed is accelerated by 8.82 times after comprehensive optimization.
Key words: deep learning; object detection; pruning; assembly optimization; embedded device

1 Introduction

2 Optimization Workflow

Figure 1 Flowchart of the optimization algorithm

2.1 Network Architecture Selection

 $C = {D_K} \times {D_K} \times M \times N \times {D_F} \times {D_F}$ (1)

 $C' = {D_K} \times {D_K} \times M \times {D_F} \times {D_F} + M \times N \times {D_F} \times {D_F}$ (2)

 $\begin{aligned} \frac{C'}{C} &= \frac{{D_K} \times {D_K} \times M \times {D_F} \times {D_F} + M \times N \times {D_F} \times {D_F}}{{D_K} \times {D_K} \times M \times N \times {D_F} \times {D_F}} \\ &= \frac{1}{N} + \frac{1}{D_K^2} \end{aligned}$ (3)
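
Equations (1) to (3) can be checked numerically. The sketch below follows the usual MobileNets notation, assuming D_K is the kernel width, M and N are the input and output channel counts, and D_F is the feature map width. The layer sizes are illustrative values chosen for this example, not figures from the paper.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-accumulate count of a standard convolution, Eq. (1)."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f):
    """Depthwise cost plus pointwise (1x1) cost, Eq. (2)."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not taken from the paper).
d_k, m, n, d_f = 3, 64, 128, 56

c = standard_conv_cost(d_k, m, n, d_f)
c_sep = separable_conv_cost(d_k, m, n, d_f)

ratio = c_sep / c                   # left-hand side of Eq. (3)
closed_form = 1 / n + 1 / d_k ** 2  # right-hand side of Eq. (3)

print(f"standard: {c}, separable: {c_sep}")
print(f"ratio: {ratio:.6f}, closed form: {closed_form:.6f}")
```

With a 3 × 3 kernel the 1/D_K² term dominates, so for these sizes the separable layer costs a little under one eighth of the standard layer.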

Figure 2 Changes to the network structure

2.2 Model Pruning

 $\left| {\Delta C({h_i})} \right| = \left| {C(D|{W'}) - C\left( {D|W} \right)} \right|$ (4)

 $\left| {\Delta C({h_i})} \right| = \left| {C(D,{h_i} = 0) - C\left( {D,{h_i}} \right)} \right|$ (5)

 $C(D,{h_i} = 0) = C(D,{h_i}) - \frac{{\delta C}}{{\delta {h_i}}}{h_i} + {R_1}({h_i} = 0)$ (6)

Figure 3 Flowchart of model pruning

 $\begin{aligned} {\Theta _{TE}}({h_i}) &= \left| {\Delta C({h_i})} \right| = \left| {C(D,{h_i}) - \frac{{\delta C}}{{\delta {h_i}}}{h_i} - C(D,{h_i})} \right| \\ &= \left| {\frac{{\delta C}}{{\delta {h_i}}}{h_i}} \right| \end{aligned}$ (7)

 ${\Theta _{TE}}(z_l^{(k)}) = \left| {\frac{1}{M}\sum\limits_m {\frac{{\delta C}}{{\delta z_{l,m}^{(k)}}}z_{l,m}^{(k)}} } \right|$ (8)
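
As a rough illustration of Eq. (8), the sketch below scores flattened feature maps from their activations and gradients. Here M is the number of entries in one flattened feature map (unlike the M in Eq. (1), which counts input channels). The function name `taylor_criterion` and the toy values are assumptions made for this example; in practice the gradients would come from backpropagation in the training framework.

```python
def taylor_criterion(activations, gradients):
    """Eq. (8): absolute value of the mean of gradient * activation
    over the M entries of one flattened feature map."""
    if len(activations) != len(gradients):
        raise ValueError("activations and gradients must have the same length")
    m = len(activations)
    total = sum(g * z for g, z in zip(gradients, activations))
    return abs(total / m)


# Toy data (hypothetical): name -> (activations z, gradients dC/dz).
feature_maps = {
    "filter_0": ([0.5, 1.0, 0.0, 2.0], [0.1, -0.2, 0.3, 0.05]),
    "filter_1": ([1.0, 1.0, 1.0, 1.0], [0.4, 0.4, 0.4, 0.4]),
    "filter_2": ([0.0, 0.0, 0.0, 0.0], [0.9, -0.7, 0.2, 0.1]),
}

scores = {name: taylor_criterion(z, g) for name, (z, g) in feature_maps.items()}

# The feature map with the smallest score is the first pruning candidate.
weakest = min(scores, key=scores.get)
print(scores, weakest)
```

A feature map whose activations are all zero scores zero regardless of its gradients, which matches the intuition that removing it changes the cost least.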
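
2.3 Assembly Optimization

(1) Instruction scheduling: load instructions are hand-optimized by unrolling loops, and the order in which loads are issued is arranged carefully to prevent pipeline stalls.

(2) Register allocation: limit the number of local variables, and pack several local variables into a single register.

(3) Conditional execution: use the conditional-execution instructions specific to ARM processors to reduce operations that disrupt the pipeline heavily, such as compare-and-jump sequences and branches.

3 Experimental Analysis

3.1 Dataset Preparation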

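Figure 4 Training samples

Figure 5 Data annotation

3.2 Training Optimization

After MobileNets-SSD is trained, the model is pruned using a criterion based on the first-order Taylor expansion. The overall pruning procedure is: 1) forward propagation; 2) obtain the ranked convolution filters; 3) compute the number of convolution filters to prune; 4) prune. The main functions are as follows: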

(1) forward

(2) compute_rank

(3) normalize_ranks_per_layer

(4) get_prunning_plan

(5) lowest_ranking_filters
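
The ranking steps behind these functions might look roughly like the sketch below. It is not the authors' implementation: the helper names `normalize_ranks`, `lowest_ranking`, and `pruning_plan` are hypothetical stand-ins, the scores are toy values, and layer-wise L2 normalization is assumed, as in the Taylor-expansion pruning work of Molchanov et al.

```python
import heapq
import math


def normalize_ranks(ranks_per_layer):
    """Layer-wise L2 normalization of absolute scores, so that
    scores from different layers become comparable."""
    normalized = {}
    for layer, ranks in ranks_per_layer.items():
        magnitudes = [abs(r) for r in ranks]
        norm = math.sqrt(sum(v * v for v in magnitudes))
        normalized[layer] = [v / norm if norm > 0 else 0.0 for v in magnitudes]
    return normalized


def lowest_ranking(ranks_per_layer, num):
    """Return the `num` lowest-scored (layer, filter_index, score) triples."""
    entries = [
        (layer, index, score)
        for layer, ranks in ranks_per_layer.items()
        for index, score in enumerate(ranks)
    ]
    return heapq.nsmallest(num, entries, key=lambda entry: entry[2])


def pruning_plan(ranks_per_layer, num):
    """Sorted (layer, filter_index) pairs selected for removal."""
    normalized = normalize_ranks(ranks_per_layer)
    lowest = lowest_ranking(normalized, num)
    return sorted((layer, index) for layer, index, _ in lowest)


# Toy scores (hypothetical): layer index -> one Taylor score per filter.
ranks = {0: [3.0, 4.0], 1: [0.1, 0.2, 0.2]}

normalized = normalize_ranks(ranks)
plan = pruning_plan(ranks, num=2)
print(plan)
```

Without normalization both selected filters would come from layer 1, whose raw scores are much smaller. After normalization the weakest filter of each layer is chosen instead.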

3.3 ARM Platform Optimization

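.macro MobileNets-SSD
    vld1.32 {d16-d19},[BO]!    @ load 8 floats from BO, then post-increment BO
    vld1.32 {d0-d3},[AO]!      @ load 8 floats from AO into q0-q1
    vld1.32 {d16-d19},[BO]!
    vld1.32 {d4-d7},[AO]!      @ load the next 8 floats from AO into q2-q3
    vmla.f32 q12,q0,d16[0]     @ q12 += q0 * d16[0] (vector times scalar)
    vmla.f32 q12,q2,d18[0]
    vmla.f32 q12,q3,d20[0]
    vmla.f32 q12,q4,d22[0]
    ...
    vst1.32 {d24-d27},[CO]!    @ store q12-q13 to CO, then post-increment CO
    vst1.32 {d28-d31},[CO]!    @ store q14-q15
.endm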
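
To make the arithmetic of the `vmla.f32` lines in the macro above concrete, the following is a small Python model of the NEON multiply-accumulate by scalar: each of the four 32-bit lanes of a Q register is multiplied by one scalar lane and added to the accumulator. The function name and the sample values are assumptions for illustration, and Python floats are double precision, so this mirrors only the data flow, not the single-precision rounding.

```python
def vmla_f32_by_scalar(acc, vec, scalar):
    """Model of `vmla.f32 qd, qn, dm[x]`: acc[i] + vec[i] * scalar per lane."""
    if len(acc) != 4 or len(vec) != 4:
        raise ValueError("a Q register holds four 32-bit lanes")
    return [a + v * scalar for a, v in zip(acc, vec)]


# Hypothetical register contents.
q12 = [0.0, 0.0, 0.0, 0.0]   # accumulator
q0 = [1.0, 2.0, 3.0, 4.0]    # four values loaded from the first operand
q2 = [1.0, 1.0, 1.0, 1.0]    # next four values from the first operand
b0, b1 = 2.0, 0.5            # scalar lanes taken from the second operand

q12 = vmla_f32_by_scalar(q12, q0, b0)   # q12 += q0 * b0
q12 = vmla_f32_by_scalar(q12, q2, b1)   # q12 += q2 * b1
print(q12)
```

Chaining several such steps into one accumulator is what lets a block of a matrix product be built up in registers before a single store.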

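3.4 Detection Results

(1) Too many occluding objects, so the whole object could not be detected;

(2) The object is too far from the USB camera, so its exposed area is too small;

(3) Only some of the object's features are exposed, which makes forward inference difficult, and so on.

Figure 6 Detection results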

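4 Conclusion and Outlook

(1) MobileNets V2 improves both accuracy and speed, so the object detection algorithm could be optimized on top of this network architecture.

(2) Because the model parameters are stored as 32-bit floating-point numbers, they could be quantized on the ARM platform to compress the model further and speed up forward inference.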
