Computer Systems & Applications, 2018, Vol. 27, Issue (12): 109-115


Fast Human Pose Estimation Based on Optical Flow
ZHOU Wen-Jun1, ZHENG Xin-Bo2, QING Lin-Bo1, XIONG Wen-Shi1, WU Xiao-Hong1
1. College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China;
2. Dongguan Institute of Advanced Technology, Dongguan 523000, China
Foundation item: Social Science and Technology Development Project of Dongguan City (2017507102428)
Abstract: To address the high computational complexity of deep-learning-based human pose estimation algorithms, a fast human pose estimation algorithm based on optical flow is proposed. Building on the original algorithm and exploiting the temporal correlation between video frames, the video sequence is divided into key frames and non-key frames, which are processed differently: the frames between two adjacent key frames, together with the preceding key frame, form a video frame group, within which the frames are highly similar. The full human pose estimation algorithm is applied only to the key frames, and the key-frame recognition result is propagated to the non-key frames through a lightweight optical flow field. Secondly, to cope with the dynamic characteristics of video, this study proposes an adaptive key frame detection algorithm based on local optical flow, which determines the positions of the key frames according to the local temporal characteristics of the video. Experimental results on the OutdoorPose and HumanEvaI datasets show that, on video sequences with complex backgrounds and partial occlusion, the detection performance of the proposed algorithm is slightly higher than that of the original algorithm, while the detection speed is increased by 89.6% on average.
Key words: human pose estimation; deep learning; optical flow; adaptive key frame

1 Fast Human Pose Estimation Based on Optical Flow

1.1 Pose Correlation Analysis Across Video Frames

1.2 Framework of Fast Human Pose Estimation Based on Optical Flow

 Figure 1. Inter-frame correlation and human pose correlation

 Figure 2. Fast human pose estimation based on optical flow

 $\left\{ \begin{gathered} Flow_i = flow(Frame_I, Frame_i) \\ Pose_i' = add(Pose_I, Flow_i) \\ \end{gathered} \right.$ (1)
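The propagation step of Eq. (1) can be sketched as follows. This is a minimal NumPy-only illustration: `propagate_pose` is a hypothetical helper name, and the dense flow field is assumed to be given (in the paper it is produced by a lightweight optical flow network).

```python
import numpy as np

def propagate_pose(key_pose, flow):
    """Propagate key-frame keypoints to a non-key frame via optical flow,
    i.e. Pose_i' = add(Pose_I, Flow_i) from Eq. (1).

    key_pose : (N, 2) array of (x, y) keypoint coordinates in the key frame
    flow     : (H, W, 2) dense flow from the key frame to the current frame
    """
    key_pose = np.asarray(key_pose, dtype=float)
    h, w = flow.shape[:2]
    # Sample the flow vector at each (rounded, border-clipped) keypoint.
    xs = np.clip(np.round(key_pose[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.round(key_pose[:, 1]).astype(int), 0, h - 1)
    # Shift each keypoint by the flow sampled at its location.
    return key_pose + flow[ys, xs]

# Toy usage: a uniform flow of (1, 2) pixels shifts every keypoint by (1, 2).
flow = np.full((4, 4, 2), [1.0, 2.0])
print(propagate_pose([[1, 1], [3, 2]], flow))
```

With a real flow network the per-pixel vectors differ, but the sampling-and-add step is the same.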

1.2.1 Adaptive Key Frame Detection Algorithm

 $f_i(x,y) = (v_{\bar x}(x,y),\, v_{\bar y}(x,y))$ (2)
 $Local\_sum(f) = \sum\limits_{(x,y) \in mask} \sqrt{v_{\bar x}(x,y)^2 + v_{\bar y}(x,y)^2}$ (3)
 $Local\_max(x,y) = \max\limits_{(x,y) \in s} \sqrt{v_{\bar x}(x,y)^2 + v_{\bar y}(x,y)^2}$ (4)

 Figure 3. Rectangular mask region

 $\left\{ \begin{gathered} Local\_sum\_T = mask\_sum \times m \\ Local\_max\_T = 10 \\ \end{gathered} \right.$ (5)

 $\left\{ \begin{array}{l} Local\_sum(f) \le Local\_sum\_T \\ Local\_max(x,y) \le Local\_max\_T \end{array} \right.$ (6)
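The adaptive key frame test of Eqs. (2)–(6) amounts to thresholding the accumulated and the peak flow magnitude inside the mask region. A minimal sketch, assuming the flow field and boolean mask are given; the value of the factor m is illustrative, as this excerpt does not fix it:

```python
import numpy as np

def is_non_key_frame(flow, mask, m=0.5, local_max_t=10.0):
    """Key frame test of Eqs. (2)-(6).

    The current frame stays in the running frame group (i.e. is a non-key
    frame) only when both the accumulated flow magnitude inside the mask
    (Local_sum, Eq. (3)) and the peak per-pixel magnitude (Local_max,
    Eq. (4)) fall below their thresholds (Eqs. (5)-(6)).

    flow : (H, W, 2) flow field f_i(x, y) = (v_x, v_y)
    mask : (H, W) boolean rectangular mask region
    m    : factor in Local_sum_T = mask_sum * m (illustrative value)
    """
    mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    local_sum = mag[mask].sum()           # Eq. (3)
    local_max = mag[mask].max()           # Eq. (4)
    local_sum_t = mask.sum() * m          # Eq. (5): mask_sum * m
    return bool(local_sum <= local_sum_t and local_max <= local_max_t)
```

When the test fails, the current frame is promoted to a new key frame and the full pose estimator is run on it, starting a new frame group.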

1.2.2 Local Fusion Optimization of Key Points


 $\left\{ \begin{array}{l} Df(x_i,y_i) = \dfrac{1}{25}\displaystyle\sum\limits_{l = -2}^{2} \sum\limits_{n = -2}^{2} f(x_i + l,\, y_i + n) \\ P'(x_i,y_i) = add(P(x_K,y_K),\, Df(x_i,y_i)) \end{array} \right.$ (7)
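The local fusion of Eq. (7) averages the flow over a 5×5 window around the keypoint and adds the smoothed displacement to the key-frame keypoint position. `refine_keypoint` is a hypothetical helper; clipping the window at the image border is an assumption:

```python
import numpy as np

def refine_keypoint(flow, key_pt, cur_pt):
    """Local fusion of Eq. (7).

    flow   : (H, W, 2) flow field from the key frame to the current frame
    key_pt : (x, y) keypoint position P(x_K, y_K) in the key frame
    cur_pt : (x, y) position (x_i, y_i) around which the flow is averaged
    """
    h, w = flow.shape[:2]
    xi, yi = int(round(cur_pt[0])), int(round(cur_pt[1]))
    # 5x5 neighbourhood indices, clipped at the image border (assumption).
    ys = np.clip(np.arange(yi - 2, yi + 3), 0, h - 1)
    xs = np.clip(np.arange(xi - 2, xi + 3), 0, w - 1)
    # Df(x_i, y_i): mean flow vector over the 25-pixel window.
    df = flow[np.ix_(ys, xs)].reshape(-1, 2).mean(axis=0)
    # P'(x_i, y_i) = add(P(x_K, y_K), Df(x_i, y_i)).
    return np.asarray(key_pt, dtype=float) + df
```

Averaging over the window suppresses per-pixel flow noise that would otherwise jitter the propagated keypoints.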

2 Experimental Results and Analysis

2.1 Experimental Setup

 Figure 4. Effect of the FlowNet2-C algorithm

 $Fps = nFrame \bigg/ \sum\limits_{i = 1}^{nFrame} t_i$ (8)
 $PCP = \dfrac{pose_{\rm true}}{pose_{\rm all}} \times 100\%$ (9)
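Both evaluation metrics of Eqs. (8) and (9) reduce to one-liners; a minimal sketch with hypothetical helper names:

```python
def fps_metric(frame_times):
    """Eq. (8): average throughput, number of frames over total time (s)."""
    return len(frame_times) / sum(frame_times)

def pcp_metric(pose_true, pose_all):
    """Eq. (9): Percentage of Correct Parts, as a percentage."""
    return pose_true / pose_all * 100.0

# Toy usage: 10 frames at 0.1 s each -> 10 FPS; 8 of 10 parts correct -> 80%.
print(fps_metric([0.1] * 10))
print(pcp_metric(8, 10))
```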

2.2 Analysis of Results

 Figure 5. Sample pose estimation results

 Figure 6. Sample results on the datasets

3 Conclusion and Future Work
