Computer Systems & Applications, 2020, Vol. 29, Issue 8: 158-164


Illegal Operation Detection in Electric Maintenance Based on Improved Mask RCNN
SHEN Mao-Dong$^{1}$, ZHOU Wei$^{1}$, SONG Xiao-Dong$^{1}$, PEI Jian$^{1}$, DENG Hao$^{2}$, MA Chao$^{2}$, FANG Kai$^{3}$
1. State Grid Shandong Electric Power Company, Jinan 250001, China;
2. Shandong Luneng Software Technology Co., Ltd., Jinan 250001, China;
3. College of Computer Science and Technology, China University of Petroleum, Qingdao 266580, China
Foundation item: National Key Research and Development Program of China (2017ZX05013-002); Natural Science Foundation of Shandong Province (ZR2019MF049)
Abstract: Operating norms in electric power maintenance bear directly on the personal safety of staff and are very important to the development of the electric power industry. To detect illegal operations by power maintenance workers from the perspective of computer vision, a multi-task, multi-branch illegal behavior detection algorithm is designed based on Mask RCNN. It integrates object detection, keypoint detection, and instance segmentation, and runs these tasks in parallel to obtain the bounding-box coordinates, keypoints, and mask of each target. Experimental results demonstrate that the algorithm significantly improves precision in instance segmentation and keypoint detection, achieves higher accuracy and robustness than Mask RCNN, and meets the accuracy requirements of actual deployment in power maintenance violation detection.
Key words: multi-branch network; deep learning; behavior analysis; object detection; illegal operation detection

1 Introduction

2 Related Work

1) RoIAlign replaces RoIPooling, achieving pixel-level alignment of the feature map;
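
To illustrate why RoIAlign avoids the quantization error of RoIPooling, the following minimal sketch samples a feature map at a fractional coordinate by bilinear interpolation instead of rounding to the nearest cell. The function name `bilinear_sample` and the toy 2×2 feature map are illustrative assumptions, not the paper's implementation.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at a fractional position (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the position so that all four neighbours lie inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


feature = [[0.0, 1.0], [2.0, 3.0]]  # hypothetical toy feature map
rounded = feature[round(0.4)][round(0.4)]     # RoIPooling-style quantization
aligned = bilinear_sample(feature, 0.4, 0.4)  # RoIAlign-style sampling
```

Rounding discards the fractional offset entirely, while the interpolated value varies smoothly with the sampling position.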

3.1 Data Augmentation

 Fig. 1 Example images with added Gaussian noise
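
Gaussian-noise augmentation of the kind shown in Fig. 1 can be sketched as follows for a grayscale image stored as a list of pixel rows. The helper name `add_gaussian_noise`, the default standard deviation `sigma=10.0`, and the clipping to [0, 255] are assumptions made for illustration; the noise parameters actually used in the paper are not given here.

```python
import random


def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Return a copy of `image` with zero-mean Gaussian noise added per pixel."""
    rng = random.Random(seed)  # a local generator keeps the result reproducible
    noisy = []
    for row in image:
        noisy_row = []
        for pixel in row:
            value = int(round(pixel + rng.gauss(0.0, sigma)))
            noisy_row.append(min(255, max(0, value)))  # clip to the valid range
        noisy.append(noisy_row)
    return noisy


image = [[0, 128, 255], [64, 64, 64]]  # hypothetical 2x3 grayscale image
noisy = add_gaussian_noise(image, sigma=10.0, seed=0)
```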

 Fig. 2 Schematic of the image generation model structure

 Fig. 3 Examples of generated images

3.3 Loss Function Design

 $Loss = L_{\rm cls} + \delta L_{\rm bbox} + \varepsilon L_{\rm mask} + \theta L_{\rm kps}$ (1)
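
As a minimal sketch, Eq. (1) is a weighted sum of the four branch losses. The function name `total_loss` and the default weights of 1.0 are assumptions for illustration; the values of $\delta$, $\varepsilon$, and $\theta$ used in the paper are not stated in this passage.

```python
def total_loss(l_cls, l_bbox, l_mask, l_kps, delta=1.0, epsilon=1.0, theta=1.0):
    """Weighted multi-task loss following Eq. (1).

    delta, epsilon, and theta weight the box, mask, and keypoint terms;
    the defaults are placeholders, not values taken from the paper.
    """
    return l_cls + delta * l_bbox + epsilon * l_mask + theta * l_kps


example = total_loss(1.0, 2.0, 3.0, 4.0, delta=0.5, epsilon=2.0, theta=0.0)
```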

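$L_{\rm cls}$ and $L_{\rm bbox}$ are the errors in the class and the bounding-box coordinates that the fully connected layers predict for each RoI. The classification loss is given by Eq. (2), where $p_i$ is the predicted probability that the RoI is a target, and $p_i^*$ is 1 if it is a true target and 0 otherwise.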

 ${L_{\rm cls}} = \sum\nolimits_i {\left\{ { - {\rm{log}}\left[ {p_i^*{p_i} + \left( {1 - p_i^*} \right)\left( {1 - {p_i}} \right)} \right]} \right\}}$ (2)
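
The classification loss of Eq. (2) can be sketched in a few lines. The function name `cls_loss` and the small constant `eps`, which guards against taking the logarithm of zero, are assumptions added for this illustration.

```python
import math


def cls_loss(probs, labels, eps=1e-12):
    """Sum over RoIs of -log[p* p + (1 - p*)(1 - p)], as in Eq. (2)."""
    total = 0.0
    for p, p_star in zip(probs, labels):
        # Probability that the model assigns to the correct outcome.
        likelihood = p_star * p + (1 - p_star) * (1 - p)
        total += -math.log(max(likelihood, eps))  # eps avoids log(0)
    return total


loss = cls_loss([0.5], [1])
```

For each RoI the bracketed term reduces to $p_i$ when $p_i^* = 1$ and to $1 - p_i$ when $p_i^* = 0$, so the loss is the usual binary cross-entropy.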

 ${L_{\rm bbox}} = \sum\nolimits_i {p_i^*} \left[ {\begin{array}{*{20}{c}} {{x_i}\log x_i^* + }\\ {{y_i}\log y_i^* + }\\ {\sqrt {{w_i}} \log \sqrt {w_i^*} + }\\ {\sqrt {{h_i}} \log \sqrt {h_i^*} + }\\ {\left( {1 - {x_i}} \right)\left( {1 - \log x_i^*} \right) + }\\ {\left( {1 - {y_i}} \right)\left( {1 - \log y_i^*} \right) + }\\ {\left( {1 - \sqrt {{w_i}} } \right)\left( {1 - \log \sqrt {w_i^*} } \right) + }\\ {\left( {1 - \sqrt {{h_i}} } \right)\left( {1 - \log \sqrt {h_i^*} } \right)} \end{array}} \right]$ (3)

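The Mask loss function is designed with two aims: assigning each Mask to the correct class, and regressing the Mask of the foreground object class. It is difficult to achieve both within a single loss function, however, and the Mask class is the same as the class produced by the classification branch. To simplify, the Mask class loss can directly take the classification loss $L_{\rm cls}$, while the boundary loss can be expressed as the intersection over union (IoU) between the ground-truth Mask and the predicted Mask binarized at a given threshold. The Mask loss function can then be written as follows: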

 $L_{\rm mask} = L_{\rm cls} \cdot MaskIoU$ (4)

 $MaskIoU=\dfrac{{mask}_{\rm gt}\cap {mask}_{\rm pre}}{{mask}_{\rm gt}\cup {mask}_{\rm pre}}$ (5)
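
Eqs. (4) and (5) can be sketched as below, with the intersection and union counted in pixels. The function names, the binarization threshold of 0.5, and the convention of returning 0.0 for an empty union are assumptions for illustration; `mask_loss` transcribes Eq. (4) literally.

```python
def mask_iou(mask_gt, mask_prob, threshold=0.5):
    """Eq. (5): IoU of the ground-truth mask and the binarized prediction."""
    inter = 0
    union = 0
    for row_gt, row_prob in zip(mask_gt, mask_prob):
        for gt, prob in zip(row_gt, row_prob):
            pred = 1 if prob >= threshold else 0  # binarize the predicted mask
            if gt and pred:
                inter += 1
            if gt or pred:
                union += 1
    return inter / union if union else 0.0  # assumed convention for empty masks


def mask_loss(l_cls, iou):
    """Eq. (4), transcribed as written: L_mask = L_cls * MaskIoU."""
    return l_cls * iou


mask_gt = [[1, 1], [0, 0]]                # hypothetical ground-truth mask
mask_prob = [[0.9, 0.2], [0.7, 0.1]]      # hypothetical predicted probabilities
iou = mask_iou(mask_gt, mask_prob)
```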

 ${L}_{\rm kps}={Smooth}_{L1}\left({kp}_{i}-{kp}_{i}^{*}\right)$ (6)

 $Smooth_{L1}\left( x \right) = \left\{ {\begin{array}{*{20}{c}} {0.5{x^2},\left| x \right| < 1}\\ {\left| x \right| - 0.5,\left| x \right| \geqslant 1} \end{array}} \right.$ (7)
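
The keypoint loss of Eqs. (6) and (7) can be sketched as follows. The boundary case $|x| = 1$ is handled by the linear branch, following the usual Smooth L1 definition. Summing over all keypoints and over both coordinates is an assumption, since Eq. (6) does not show the aggregation explicitly; the function names are illustrative.

```python
def smooth_l1(x):
    """Eq. (7): quadratic near zero, linear for large residuals."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def kps_loss(pred, target):
    """Eq. (6) applied to each keypoint coordinate and summed (assumed)."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, target):
        total += smooth_l1(px - tx) + smooth_l1(py - ty)
    return total


loss = kps_loss([(0.0, 0.0)], [(0.5, 2.0)])
```

The two branches meet at $|x| = 1$ with the same value (0.5), so the loss is continuous there.
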
4 Experiments and Analysis

4.2 Selection of Test Hyperparameters

 $Precision=\dfrac{TP}{TP+FP}$ (8)
 $Recall=\dfrac{TP}{TP+FN}$ (9)
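
Eqs. (8) and (9) can be computed directly from the counts of true positives, false positives, and false negatives. The function name `precision_recall`, the convention of returning 0.0 when a denominator is zero, and the example counts are assumptions for illustration, not results from the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 8) and recall (Eq. 9) from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed zero-division rule
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed targets.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
```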

4.3 Data Augmentation

5 Conclusion and Outlook
