Human Group Classification Model Based on Multi-Model-Integrated CNN
Computer Systems & Applications, 2020, Vol. 29, Issue (8): 127-134


Human Group Classification Model Based on Multi-Model-Integrated CNN
LANG Bo1, ZHANG Na2, DUAN Xin-Xin2
1. School of Engineering Technology, Beijing Normal University, Zhuhai, Zhuhai 519087, China;
2. Zhuhai Branch of Graduate School, Beijing Normal University, Zhuhai 519087, China
Foundation item: National Natural Science Foundation of China (61375122); Innovation Empowering School Characteristics Innovation Project of Guangdong Province (201712009QX)
Abstract: Effectively identifying different groups of people in an image or video is an important part of intelligent image analysis, and the core difficulty is how to obtain "effective features" from the image. Based on the convolutional neural network, this study proposes a multi-model fusion convolutional neural network method. Models pre-trained on ImageNet are used to initialize the weights of the network, which yields more effective features while substantially saving training time and computational resources. Experiments show that the model maintains a recognition accuracy of about 85% for adult males, adult females, and children in natural scenes, improving the accuracy and reliability of group classification.
Key words: image analysis     effective feature     Convolutional Neural Network (CNN)     multi-model-integrated     group classification

1 Problems Addressed

(1) Can a convolutional neural network improve the classification of human attributes such as age and gender?

(2) How can more comprehensive human features be obtained to improve age and gender classification?

2 Algorithm Model

 $P_j^t = f\left( {\sum\limits_{i = 1}^N {K_{i,j}^t * P_i^{t - 1}} + q_j^t } \right)$ (1)

 $F = f\left( {sample\left( {y_i^{t - 1}} \right) \cdot \omega _j^t + b_j^t} \right)$ (2)
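As an illustration, the convolution of Eq. (1) can be sketched in NumPy. The "valid" convolution, the choice of ReLU for $f$, and the array layouts are assumptions of the sketch, not details fixed by this paper:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv_forward(prev_maps, kernels, biases):
    """Feature map j at layer t: f(sum_i K[i][j] * P_i + q_j), per Eq. (1).

    prev_maps: list of N 2-D input feature maps P_i^{t-1}
    kernels:   kernels[i][j] is the 2-D kernel K_{i,j}^t
    biases:    biases[j] is the scalar bias q_j^t
    """
    n_in = len(prev_maps)
    n_out = len(biases)
    H, W = prev_maps[0].shape
    kh, kw = kernels[0][0].shape
    oh, ow = H - kh + 1, W - kw + 1          # "valid" convolution output size
    out = []
    for j in range(n_out):
        acc = np.zeros((oh, ow))
        for i in range(n_in):
            k = kernels[i][j]
            for r in range(oh):
                for c in range(ow):
                    # cross-correlation form, as commonly used in CNN code
                    acc[r, c] += np.sum(prev_maps[i][r:r + kh, c:c + kw] * k)
        out.append(relu(acc + biases[j]))
    return out
```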

 Fig. 1 Multi-model fusion convolutional neural network model

2.1 Feature Extraction Layer

2.2 Global Average Pooling Layer
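Global average pooling collapses each feature map to its spatial mean, one value per channel, avoiding the large fully connected layers of a flatten-based head. A minimal NumPy sketch, where the (N, H, W, C) layout is an assumption:

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Collapse each H x W feature map to its mean, one value per channel.

    feature_maps: array of shape (N, H, W, C)
    returns:      array of shape (N, C)
    """
    return feature_maps.mean(axis=(1, 2))
```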

2.3 Dropout + ReLU Layer

 Fig. 2 Neural network training flow

Hinton et al. showed that overfitting can be mitigated by preventing the co-adaptation of certain feature detectors [14]. Applying dropout during training reduces co-adaptation between neurons and lowers their mutual dependence, thereby strengthening the robustness of the model. Based on experimental validation, the dropout ratio in this paper is set to 0.5 to reduce the correlation between features: if a layer has N neurons, and therefore N activation outputs, then after dropout roughly 0.5×N of those outputs are set to 0. The components are defined mathematically as:

 $r_j^{(l)}\sim Bernoulli(p)$ (3)

 ${y^{*(l)}} = {r^{(l)}} \times {y^{(l)}}$ (4)

 $z_i^{(l + 1)} = w_i^{(l + 1)}{y^{*(l)}} + b_i^{(l + 1)}$ (5)

 $y_i^{(l + 1)} = f\left( {z_i^{(l + 1)}} \right)$ (6)
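Eqs. (3)-(6) can be sketched directly; ReLU for $f$ and the vector shapes are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_layer(y, w, b, p=0.5):
    """Eqs. (3)-(6): mask activations with Bernoulli(p), then apply layer l+1.

    y: activations y^(l) of the current layer, shape (n,)
    w: weight matrix w^(l+1), shape (m, n)
    b: bias vector b^(l+1), shape (m,)
    p: retention probability (0.5 in this paper)
    """
    r = rng.binomial(1, p, size=y.shape)      # Eq. (3): r_j ~ Bernoulli(p)
    y_thin = r * y                            # Eq. (4): thinned activations
    z = w @ y_thin + b                        # Eq. (5): pre-activation of l+1
    return np.maximum(0.0, z)                 # Eq. (6): f taken to be ReLU
```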

2.4 Fully Connected Layer

 ${P_j} = \frac{{{e^{x_j^L}}}}{{\displaystyle\sum\limits_{i = 1}^M {{e^{x_i^L}}} }}\;\left( {j = 1, \ldots ,M} \right)$ (7)

 $L = - \frac{1}{N}\sum\limits_{j = 1}^N {{{\hat p}_j}} \log \left( {{p_j}} \right)$ (8)
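Eqs. (7) and (8) can be sketched as follows; the batch layout (N samples by M classes) and the small epsilon guard against log(0) are implementation assumptions:

```python
import numpy as np

def softmax(x):
    """Eq. (7): softmax over the last layer's M outputs, numerically stable."""
    e = np.exp(x - x.max())      # subtracting the max leaves the ratios unchanged
    return e / e.sum()

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Eq. (8): cross-entropy loss averaged over N samples.

    p_true: target distributions (e.g. one-hot labels), shape (N, M)
    p_pred: predicted distributions from softmax, shape (N, M)
    """
    return -np.mean(np.sum(p_true * np.log(p_pred + eps), axis=1))
```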

2.5 Error Correction

 ${v_t} = - \eta \times d\theta + {v_{t - 1}}\times momentum$ (9)
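Eq. (9) amounts to a one-line update; the learning-rate and momentum defaults below are illustrative, not values from this paper:

```python
def sgd_momentum_step(theta, grad, v, lr=0.01, momentum=0.9):
    """One update per Eq. (9): v_t = momentum * v_{t-1} - eta * dtheta.

    The parameter then moves by the update variable: theta <- theta + v_t.
    """
    v_new = momentum * v - lr * grad
    return theta + v_new, v_new
```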

3 Experimental Method

3.1 Test Dataset Selection

3.2 Data Preprocessing

 Fig. 3 Sample images from the dataset used in this paper

(1) train (train_nb_samples, model_output).

(2) val (val_nb_samples, model_output).

(3) test (test_nb_samples, model_output).

(4) train_label (train_nb_samples).

(5) val_label (val_nb_samples).
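The five arrays above can be mocked with placeholder shapes to make the layout concrete; the sample counts and feature width used here are illustrative, not the paper's actual values:

```python
import numpy as np

# Illustrative sizes only; the paper does not fix these values here.
train_nb_samples, val_nb_samples, test_nb_samples = 800, 100, 100
model_output = 2048   # width of one model's feature vector (assumed)

train = np.zeros((train_nb_samples, model_output))   # (1) training features
val = np.zeros((val_nb_samples, model_output))       # (2) validation features
test = np.zeros((test_nb_samples, model_output))     # (3) test features
train_label = np.zeros(train_nb_samples, dtype=int)  # (4) training labels
val_label = np.zeros(val_nb_samples, dtype=int)      # (5) validation labels
```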

3.3 Fusion Training

① Load model parameters pre-trained on the ImageNet dataset to transfer the baseline model weights;

② Use different layer structures of the pre-trained models to compute feature vectors for the datasets (training set, validation set, test set);

③ Load and fuse the feature vectors produced by the different models, extracting more comprehensive sample features;

④ Train a new network consisting of two ReLU + dropout layers and one Softmax layer, computing the predicted output class for each sample image;

⑤ Compute the network's training feedback error from the loss function (which measures performance on a single training sample) and the cost function (which measures performance over all training samples);

⑥ Compute the gradients of the weights and biases in the network model;

⑦ Use SGD with momentum to update the model parameters (weights and biases) and the parameter update variable ${v_t}$;

⑧ Output and save the weights and biases of the network model.
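Steps ①-⑧ can be sketched end to end. Since the real pipeline loads ImageNet-pretrained networks, the per-model feature extractors below are stand-ins (random features), and the classification head is reduced to a single Softmax layer to keep the sketch short; everything else follows the steps:

```python
import numpy as np

rng = np.random.default_rng(42)

# ②/③: feature vectors from two hypothetical pre-trained models, fused
# by concatenation along the feature axis.
n, d1, d2, n_classes = 64, 32, 16, 3
feats_a = rng.normal(size=(n, d1))        # stand-in for model A's features
feats_b = rng.normal(size=(n, d2))        # stand-in for model B's features
fused = np.concatenate([feats_a, feats_b], axis=1)
labels = rng.integers(0, n_classes, size=n)
onehot = np.eye(n_classes)[labels]

# ④: a Softmax classifier trained on the fused features.
w = np.zeros((fused.shape[1], n_classes))
b = np.zeros(n_classes)
v_w, v_b = np.zeros_like(w), np.zeros_like(b)
lr, momentum = 0.1, 0.9                   # illustrative hyperparameters

for _ in range(200):
    logits = fused @ w + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    # ⑤/⑥: cross-entropy gradient w.r.t. weights and biases
    g = (probs - onehot) / n
    gw, gb = fused.T @ g, g.sum(axis=0)
    # ⑦: SGD with momentum, Eq. (9)
    v_w = momentum * v_w - lr * gw
    v_b = momentum * v_b - lr * gb
    w, b = w + v_w, b + v_b

# ⑧: in the real pipeline, w and b would now be saved to disk.
train_acc = (probs.argmax(axis=1) == labels).mean()
```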

(1) Define the model evaluation metric:

 $Accuracy = \frac{{TP + TN}}{{TP + TN + FP + FN}}$ (10)

(2) Define the $F\text{-}score$ as the weighted harmonic mean of $precision$ and $recall$:

 $F\text{-}score = \left( {1 + {\beta ^2}} \right) \times \frac{{precision \times recall}}{{\left( {{\beta ^2} \times precision} \right) + recall}}$ (11)

When $\beta = 1$, the $F\text{-}score$ becomes:

 $F1 = \frac{{2 \times precision \times recall}}{{precision + recall}}$
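Eqs. (10) and (11) (with β = 1) can be computed from binary predictions as follows; the binary setting is an illustrative simplification of the paper's three-class task:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy (Eq. (10)) and F1 (Eq. (11) with beta = 1) for 0/1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1
```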

4 Experimental Results and Performance Evaluation

5 Conclusion and Future Work

 Fig. 4 Heatmap analysis of person gender by the fusion model in natural scenes

 Fig. 5 Heatmap analysis of person gender by the fusion model under special conditions

 Fig. 6 Curves of loss, accuracy, and F1

[1] Graves A, Mohamed AR, Hinton G. Speech recognition with deep recurrent neural networks. Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver, BC, Canada. 2013. 6645-6649.
[2] Liu JW, Liu Y, Luo XL. Research and development on deep learning. Application Research of Computers, 2014, 31(7): 1921-1930, 1942. DOI:10.3969/j.issn.1001-3695.2014.07.001
[3] Najafabadi MM, Villanustre F, Khoshgoftaar TM, et al. Deep learning applications and challenges in big data analytics. Journal of Big Data, 2015, 2(1): 1. DOI:10.1186/s40537-014-0007-7
[4] Tian Q, Arbel T, Clark J. Deep LDA-pruned nets for efficient facial gender classification. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu, HI, USA. 2017. 512-521.
[5] Eidinger E, Enbar R, Hassner T. Age and gender estimation of unfiltered faces. IEEE Transactions on Information Forensics and Security, 2014, 9(12): 2170-2179. DOI:10.1109/TIFS.2014.2359646
[6] Brunelli R, Poggio T. HyberBF networks for gender classification. Proceedings of DARPA Image Understanding Workshop. Detroit, MI, USA. 1995. 311-314.
[7] Tamura S, Kawai H, Mitsumoto H. Male/female identification from 8×6 very low resolution face images by neural network. Pattern Recognition, 1996, 29(2): 331-335. DOI:10.1016/0031-3203(95)00073-9
[8] Jiao YB, Yang JC, Fang ZJ, et al. Comparing studies of learning methods for human face gender recognition. Proceedings of the 7th Chinese Conference on Biometric Recognition. Guangzhou, China. 2012. 67-74. DOI:10.1007/978-3-642-35136-5_9
[9] Zhang T, Li YJ, Hu HH, et al. Gender classification model based on cross-connected convolutional neural networks. Acta Automatica Sinica, 2016, 42(6): 858-865. DOI:10.16383/j.aas.2016.c150658
[10] Levi G, Hassncer T. Age and gender classification using convolutional neural networks. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Boston, MA, USA. 2015. 34-42.
[11] Lang B, Huang J, Wei H. Method of using a multilayer visual network model to represent local features of images. Journal of Computer-Aided Design & Computer Graphics, 2015, 27(4): 703-712.
[12] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, NV, USA. 2012. 1097-1105.
[13] Deng J, Dong W, Socher R, et al. ImageNet: A large-scale hierarchical image database. Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL, USA. 2009. 248-255.
[14] Hinton GE, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580, 2012.