Gesture Estimation of Cattle Face Based on Cascade Structure
Computer Systems & Applications, 2019, Vol. 28, Issue 7: 240-245
GOU Xian-Tai, HUANG Wei, LIU Qi-Fen
School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, China
Foundation item: Science and Technology Major Program of Sichuan Province (18ZDZX0162); Major Science and Technology Research and Development Plan of Sichuan Province (2017GZ0159)
Abstract: As the number of cattle grows, identity authentication based on single-angle feature coding can no longer meet the demand for data capacity. A cascade structure is used to first detect the cattle’s face and then estimate the angle of the face, which builds a solid feature base for multi-angle feature coding. Experimental results show that the cascade structure achieves higher accuracy in both the face detection and attitude angle estimation tasks.
Key words: cascade; cattle face detection; gesture angle

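1 Composition and Principle of the Cascade Structure

The workflow of the cascade structure is shown in Fig. 1.

 Fig. 1 Main flowchart of the cascade structure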
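
As a rough illustration of the two-stage flow (detect the face first, then estimate its angles), the stages can be chained as in the sketch below. The names `detect_faces` and `estimate_angles` are hypothetical stand-ins for the SSD detector and the MobileNet regressor, and the image is assumed to be a row-major nested list of pixels; none of these names come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max), assumed pixel coordinates
Angles = Tuple[float, float, float]  # (x, y, z) angles of the face
Image = Sequence[Sequence[object]]   # rows of pixels; the pixel type is left open


def crop(image: Image, box: Box) -> List[List[object]]:
    """Cut the region inside `box` out of a row-major image."""
    x_min, y_min, x_max, y_max = box
    return [list(row[x_min:x_max]) for row in image[y_min:y_max]]


def cascade(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],       # hypothetical stage 1 (e.g. SSD)
    estimate_angles: Callable[[List[List[object]]], Angles],  # hypothetical stage 2 (e.g. MobileNet)
) -> List[Tuple[Box, Angles]]:
    """Run detection, then estimate the angles of every detected face crop."""
    results = []
    for box in detect_faces(image):
        face = crop(image, box)
        results.append((box, estimate_angles(face)))
    return results
```

Passing the two stages in as callables keeps the sketch independent of any particular deep learning framework.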

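1.1 Livestock Face Detection

1.1.1 SSD Network Structure

The SSD network structure is shown in Fig. 2. The model is based on VGG-16: the last two fully connected layers are converted into convolutional layers, after which a fully convolutional network (FCN) is appended for feature extraction. Feature maps are taken from the Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, and Conv11_2 layers for multi-scale feature extraction.

 Fig. 2 SSD network structure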
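
To give a feel for what multi-scale extraction over these six layers means in practice, the sketch below counts the default boxes produced per layer. It assumes the standard SSD300 configuration (300×300 input, the feature map sizes and boxes per location listed in the code); the text does not state the input size or box settings used in this work, so the numbers are illustrative only.

```python
# Assumed SSD300 settings: (layer name, feature map side length, default boxes per location).
# These values are an assumption, not taken from the paper.
FEATURE_MAPS = [
    ("Conv4_3", 38, 4),
    ("Conv7", 19, 6),
    ("Conv8_2", 10, 6),
    ("Conv9_2", 5, 6),
    ("Conv10_2", 3, 4),
    ("Conv11_2", 1, 4),
]


def default_box_counts(feature_maps):
    """Number of default boxes contributed by each feature map."""
    return {name: size * size * boxes for name, size, boxes in feature_maps}


counts = default_box_counts(FEATURE_MAPS)
total = sum(counts.values())
for name, count in counts.items():
    print(f"{name}: {count} default boxes")
print(f"total: {total}")
```

The shallow, high-resolution maps contribute most of the boxes and cover small faces, while the deeper maps cover large ones.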

1.1.2 Loss Function

The SSD loss function consists of two parts: the classification confidence loss ${L_{conf}}$ and the localization (coordinate) loss ${L_{loc}}$.

 $L\left( {x,c,l,g} \right) = \frac{1}{N}\left( {{L_{conf}}\left( {x,c} \right) + \alpha {L_{loc}}\left( {x,l,g} \right)} \right)$ (1)

 $\begin{split}{L_{conf}}\left( {x,c} \right) &= - \sum\limits_{i \in Pos}^N x_{ij}^p\log \left( {\hat c_i^p} \right) - \sum\limits_{i \in Neg} \log \left( {\hat c_i^0} \right),\\ \text{where}\;\hat c_i^p &= \frac{\exp\left( {c_i^p} \right)}{\sum\nolimits_p \exp\left( {c_i^p} \right)}\end{split}$ (2)
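
Equation (2) is a softmax cross-entropy summed over positive and negative default boxes. A minimal pure-Python sketch follows. It assumes each default box has already been assigned a class label, with label 0 standing for background (a negative box); the function and argument names are illustrative and not from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one box's raw class scores c_i^p."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(class_scores, labels):
    """Sum of -log(softmax probability of the assigned class) over all boxes.

    class_scores: one list of raw scores per default box.
    labels: assigned class per box; 0 is assumed to mean background (negative).
    """
    loss = 0.0
    for scores, label in zip(class_scores, labels):
        loss -= math.log(softmax(scores)[label])
    return loss
```

For a positive box the term is $-\log \hat c_i^p$ of the matched class, and for a negative box it is $-\log \hat c_i^0$, so both sums in Eq. (2) reduce to the same lookup.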

 $\left\{ \begin{split} &{L_{loc}}\left( {x,l,g} \right) = \sum\limits_{i \in Pos}^N \sum\limits_{m \in \left\{ {cx,cy,w,h} \right\}} x_{ij}^k\;{\rm{smooth}}_{L1}\left( {l_i^m - \hat g_j^m} \right)\\ &\hat g_j^{cx} = \left( {g_j^{cx} - d_i^{cx}} \right)/d_i^w\qquad \hat g_j^{cy} = \left( {g_j^{cy} - d_i^{cy}} \right)/d_i^h\\ &\hat g_j^w = \log \left( {\frac{g_j^w}{d_i^w}} \right)\qquad \hat g_j^h = \log \left( {\frac{g_j^h}{d_i^h}} \right)\\ &\text{where}\;{\rm{smooth}}_{L1}\left( x \right) = \left\{ {\begin{array}{ll} 0.5{x^2} & \text{if}\;\left| x \right| < 1\\ \left| x \right| - 0.5 & \text{otherwise} \end{array}} \right. \end{split}\right.$ (3)
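
The localization term in Eq. (3) can be sketched as follows. The sketch assumes boxes are given in (cx, cy, w, h) form and that matching has already been done, so each list index is one positive pair of predicted offsets, ground truth box, and default box; the function names are illustrative.

```python
import math


def smooth_l1(x):
    """Quadratic for |x| < 1, linear otherwise, as in Eq. (3)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def encode(gt_box, default_box):
    """Regression targets (g-hat) of a ground truth box relative to a default box."""
    g_cx, g_cy, g_w, g_h = gt_box
    d_cx, d_cy, d_w, d_h = default_box
    return (
        (g_cx - d_cx) / d_w,
        (g_cy - d_cy) / d_h,
        math.log(g_w / d_w),
        math.log(g_h / d_h),
    )


def localization_loss(predicted_offsets, gt_boxes, default_boxes):
    """Smooth L1 loss summed over matched pairs and over cx, cy, w, h."""
    loss = 0.0
    for pred, gt, default in zip(predicted_offsets, gt_boxes, default_boxes):
        target = encode(gt, default)
        loss += sum(smooth_l1(p - t) for p, t in zip(pred, target))
    return loss
```

Encoding the width and height as log ratios keeps the targets scale-free, so one default box shape can regress to both larger and smaller faces.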

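1.1.3 Hard Negative Mining

1.2 Cattle Face Pose Estimation

 Fig. 3 MobileNet network model

The MobileNet network structure is shown in Fig. 3. It makes heavy use of 1×1 convolutions and depthwise separable convolutions, which greatly reduces the number of parameters.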

1.2.1 Depthwise Separable Convolution

 Fig. 4 Depthwise separable convolution

 ${D_k} \times {D_k} \times M \times {D_F} \times {D_F} + N \times M \times {D_F} \times {D_F}$ (4)
 Fig. 5 1×1 convolution

 ${D_k} \times {D_k} \times N \times M \times {D_F} \times {D_F}$ (5)

 $\frac{{{D_k} \times {D_k} \times M \times {D_F} \times {D_F} + N \times M \times {D_F} \times {D_F}}}{{{D_k} \times {D_k} \times M \times N \times {D_F} \times {D_F}}} = \frac{1}{N} + \frac{1}{{D_k^2}}$ (6)

 ${D_k} \times {D_k} \times \alpha M \times \beta {D_F} \times \beta {D_F} + \alpha N \times \alpha M \times \beta {D_F} \times \beta {D_F}$ (7)
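
Equations (4) to (7) can be checked numerically. The sketch below assumes the usual MobileNet reading of the symbols, which the surrounding text does not spell out: $D_k$ is the kernel size, $M$ and $N$ the input and output channel counts, $D_F$ the feature map side length, $\alpha$ the width multiplier and $\beta$ the resolution multiplier. Under that reading Eq. (4) is the cost of a depthwise separable convolution, Eq. (5) that of a standard convolution, Eq. (6) their ratio, and Eq. (7) the separable cost after scaling.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-accumulate count of a standard convolution, Eq. (5)."""
    return d_k * d_k * m * n * d_f * d_f


def separable_conv_cost(d_k, m, n, d_f, alpha=1.0, beta=1.0):
    """Cost of a depthwise separable convolution, Eq. (4); Eq. (7) when alpha or beta != 1."""
    m_scaled = alpha * m      # thinned input channels
    n_scaled = alpha * n      # thinned output channels
    f_scaled = beta * d_f     # reduced feature map side length
    depthwise = d_k * d_k * m_scaled * f_scaled * f_scaled
    pointwise = n_scaled * m_scaled * f_scaled * f_scaled
    return depthwise + pointwise


# Illustrative layer shape (an assumption, not a layer from the paper).
d_k, m, n, d_f = 3, 32, 64, 112
ratio = separable_conv_cost(d_k, m, n, d_f) / standard_conv_cost(d_k, m, n, d_f)
print(f"cost ratio: {ratio:.4f}, 1/N + 1/D_k^2: {1 / n + 1 / d_k ** 2:.4f}")
```

With a 3×3 kernel the ratio is dominated by the $1/D_k^2$ term, so the separable form costs roughly one eighth to one ninth of a standard convolution.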

1.2.2 Loss Function

 ${L_\delta }\left( {y,f\left( x \right)} \right) = \left\{ {\begin{array}{ll} \dfrac{1}{2}{\left( {y - f\left( x \right)} \right)^2}, & \text{if}\;\left| {y - f\left( x \right)} \right| \le \delta \\ \delta \cdot \left( {\left| {y - f\left( x \right)} \right| - \dfrac{1}{2}\delta } \right), & \text{otherwise} \end{array}} \right.$

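Compared with the traditional L2 loss, Huber loss is more robust. When the residual is small the loss is quadratic, like the L2 loss; when the residual is large it becomes a linear function of the L1 norm. Huber loss is therefore insensitive to outliers and less prone to exploding gradients. In addition, the hyperparameter $\delta$ adjusts the shape of the Huber loss curve so that it better suits model training.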
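
The robustness argument is easiest to see from the gradient. The sketch below implements the Huber loss for a single scalar target together with its derivative with respect to the prediction; the function names and the default $\delta = 1$ are illustrative choices, not values reported in the paper.

```python
def huber_loss(y, prediction, delta=1.0):
    """Huber loss for one scalar target y and prediction f(x)."""
    residual = abs(y - prediction)
    if residual <= delta:
        return 0.5 * residual * residual
    return delta * (residual - 0.5 * delta)


def huber_gradient(y, prediction, delta=1.0):
    """Derivative of the Huber loss with respect to the prediction.

    Inside the quadratic zone it equals the residual (as for the L2 loss);
    outside, its magnitude is capped at delta.
    """
    residual = prediction - y
    if abs(residual) <= delta:
        return residual
    return delta if residual > 0 else -delta
```

For an outlier with residual 100, the L2 gradient would be 100 while the Huber gradient stays at $\delta$, which is why large angle errors on a few mislabeled crops do not dominate the update.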

2 Experimental Results and Analysis

2.1 Training and Test Sets

2.2 Experimental Platform

2.3 Training

2.4 SSD Model Experimental Results and Analysis

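 $P = \frac{\text{number of correctly predicted cattle face boxes}}{\text{number of predicted cattle face boxes}}$
 $R = \frac{\text{number of correctly predicted cattle face boxes}}{\text{number of ground-truth cattle face boxes}}$
 $F = \frac{2 \cdot P \cdot R}{P + R}$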
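
These three metrics can be computed from box counts as in the sketch below. How a predicted box is judged correct (for example, an overlap threshold against the ground truth) is not specified here, so the function takes the counts as given; the function name is illustrative.

```python
def detection_metrics(num_correct, num_predicted, num_ground_truth):
    """Precision P, recall R and F score from box counts.

    num_correct: predicted boxes judged correct (criterion assumed to be applied upstream).
    num_predicted: all predicted cattle face boxes.
    num_ground_truth: all annotated cattle face boxes.
    """
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_ground_truth if num_ground_truth else 0.0
    denominator = precision + recall
    f_score = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f_score
```

The zero guards keep the function defined when the detector outputs no boxes or the image set contains no faces.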

 Fig. 6 SSD detection results

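2.5 MobileNet Angle Prediction Experimental Results and Analysis

The MobileNet training data are obtained by cropping the SSD training images according to their ground truth boxes. The cropped cattle faces are annotated with the Blender software to obtain the three angles x, y, and z of each face. To provide more points of comparison, different width multipliers α and resolution multipliers β are used. The evaluation metric is the mean error between the predicted x, y, and z angles and the corresponding ground truth. The experimental results are as follows: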

3 Conclusion and Outlook

