XU Rui1,2, FENG Rui1,2
1. School of Computer Science, Fudan University, Shanghai 201203, China;
2. Shanghai Engineering Research Center for Video Technology and System, Shanghai 201203, China
Foundation item: National Key Research and Development Program of China (2017YFC0803702)
Abstract: To improve the accuracy of convolutional neural networks on human pose estimation, we propose Focus Mean Squared Error (FMSE), an improved loss function based on Mean Squared Error (MSE) that addresses the pixel imbalance between foreground (the Gaussian kernel) and background in heatmaps. FMSE assigns different weights to the loss terms according to the pixel values of the foreground and background. Compared with the MSE loss, FMSE reduces the impact of the foreground-background pixel imbalance on network performance, helps the network locate the spatial positions of key points, improves network performance, and makes the loss converge faster during training. Experiments on public datasets verify the effectiveness of the proposed loss function.
Key words: deep learning; loss function; human pose estimation; key point detection; sample imbalance

1 Related work

2 Focus mean squared error loss function

 $f = e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{2\delta^2}}$ (1)

 Fig. 1 Heatmap
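As a minimal sketch of Eq. (1), the following pure-Python function fills a heatmap for a single key point at $(x_0, y_0)$. The function name `gaussian_heatmap`, the 7×7 size, and the value `delta=1.0` are illustrative assumptions, not settings taken from the paper.

```python
import math


def gaussian_heatmap(width, height, x0, y0, delta):
    """Return a height x width heatmap (list of rows) for one key point.

    Each pixel (x, y) gets exp(-((x - x0)^2 + (y - y0)^2) / (2 * delta^2)),
    so the value is 1 at the key point and decays towards 0 away from it.
    """
    heatmap = []
    for y in range(height):
        row = []
        for x in range(width):
            squared_distance = (x - x0) ** 2 + (y - y0) ** 2
            row.append(math.exp(-squared_distance / (2.0 * delta ** 2)))
        heatmap.append(row)
    return heatmap


# Illustrative values only: a 7x7 heatmap with the key point at (3, 3).
heatmap = gaussian_heatmap(width=7, height=7, x0=3, y0=3, delta=1.0)

# Count pixels above 0.5 as a rough proxy for the "foreground" region.
foreground = sum(1 for row in heatmap for value in row if value > 0.5)
print(heatmap[3][3], foreground, 7 * 7 - foreground)
```

Even on this small grid, only a handful of pixels carry large values while most are close to zero, which is the foreground-background imbalance the loss function is meant to address.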

 $Cross\_Entropy\_Loss = \left\{ \begin{array}{ll} -\log_2 y' & y = 1 \\ -\log_2 (1 - y') & y = 0 \end{array} \right.$ (2)
 $Focal\_Loss = \left\{ \begin{array}{ll} -\alpha (1 - y')^{\gamma} \log_2 y' & y = 1 \\ -(1 - \alpha)\, {y'}^{\gamma} \log_2 (1 - y') & y = 0 \end{array} \right.$ (3)
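To make the relation between Eqs. (2) and (3) concrete, the sketch below evaluates both losses for a single sample, taking the cross entropy as the negative base-2 log-likelihood. The function names and the default values `alpha=0.25` and `gamma=2.0` are assumptions chosen for illustration; the paper's settings are not given in this excerpt.

```python
import math


def cross_entropy_loss(y, y_pred):
    """Binary cross entropy for one sample, using base-2 logarithms."""
    if y == 1:
        return -math.log2(y_pred)
    return -math.log2(1.0 - y_pred)


def focal_loss(y, y_pred, alpha=0.25, gamma=2.0):
    """Focal loss for one sample; the alpha and gamma defaults are assumptions."""
    if y == 1:
        return -alpha * (1.0 - y_pred) ** gamma * math.log2(y_pred)
    return -(1.0 - alpha) * y_pred ** gamma * math.log2(1.0 - y_pred)


# A well-classified positive (0.9) versus a poorly classified one (0.1).
for y_pred in (0.9, 0.1):
    print(y_pred, cross_entropy_loss(1, y_pred), focal_loss(1, y_pred))
```

The modulating factor $(1 - y')^{\gamma}$ shrinks the loss of the well-classified sample far more than that of the poorly classified one, so hard samples dominate the total.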

 $MSE\_Loss = \frac{1}{2}\sum\limits_{i = 1}^{n} \left( y'_i - y_i \right)^2$ (4)
 $FMSE\_Loss = \frac{1}{2}\sum\limits_{i = 1}^{n} \left( y_i + \delta \right)^{\gamma} \left( y'_i - y_i \right)^2$ (5)
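The two losses in Eqs. (4) and (5) can be compared with a small sketch. It assumes squared error terms, as the name mean squared error implies, with $y_i$ the label pixel and $y'_i$ the predicted pixel. The parameter `offset` stands for $\delta$ in Eq. (5); its value `0.1` and `gamma=2.0` are illustrative assumptions, not the paper's settings.

```python
def mse_loss(pred, target):
    """Half the sum of squared errors over all heatmap pixels."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


def fmse_loss(pred, target, offset=0.1, gamma=2.0):
    """Squared errors weighted by (target + offset) ** gamma.

    `offset` plays the role of delta in Eq. (5); both defaults are assumed.
    """
    return 0.5 * sum(
        (t + offset) ** gamma * (p - t) ** 2 for p, t in zip(pred, target)
    )


# Toy flattened heatmap: 1 foreground pixel (label 1.0), 99 background pixels.
target = [1.0] + [0.0] * 99
pred = [0.5] * 100  # every pixel is off by 0.5

# Share of each loss contributed by the single foreground pixel.
mse_share = mse_loss(pred[:1], target[:1]) / mse_loss(pred, target)
fmse_share = fmse_loss(pred[:1], target[:1]) / fmse_loss(pred, target)
print(mse_share, fmse_share)
```

Under these assumed values the single foreground pixel contributes 1% of the MSE loss but about 55% of the FMSE loss, which illustrates how the weighting shifts the focus of training towards the Gaussian kernel. With `gamma=0` the weighting vanishes and FMSE reduces to MSE.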

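 Fig. 2 Curves of the mean squared error loss and the focus mean squared error loss

 Fig. 3 Effect of the value of γ on the focus mean squared error loss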

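3 Experiments and analysis

3.1 Networks used in the experiments

 Fig. 4 Structure of the hourglass network

 Fig. 5 Structure of the high-resolution network (HRNet)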

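The HRNet structure is organized along two dimensions, a vertical one (Depth) and a horizontal one (Scale). Horizontally, sub-networks of different resolutions run in parallel; vertically, multi-resolution information is fused. From top to bottom, each stage halves the resolution and doubles the number of channels.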
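The halving and doubling rule can be sketched as a small shape calculation. The helper name `branch_shapes` and the starting values (a 64×48 feature map with 32 channels and four resolutions) are hypothetical and serve only to illustrate the rule; they are not taken from the paper.

```python
def branch_shapes(height, width, channels, num_branches):
    """List (height, width, channels) for each resolution, top to bottom.

    Each step down halves the spatial resolution and doubles the channels.
    """
    shapes = []
    for _ in range(num_branches):
        shapes.append((height, width, channels))
        height, width, channels = height // 2, width // 2, channels * 2
    return shapes


# Hypothetical starting point: 64x48 feature map with 32 channels.
for shape in branch_shapes(64, 48, 32, 4):
    print(shape)
```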

3.2 MPII and MSCOCO datasets

3.3 Training and testing details

3.4 Experimental environment

3.5 Experimental results and analysis

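 Fig. 6 Training and validation information on the MSCOCO dataset

 Fig. 7 Training and validation information on the MSCOCO dataset

 Fig. 8 Examples of key point detection results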

4 Conclusion and outlook

