Computer Systems & Applications, 2019, Vol. 28(1): 38–46

Staged Residual Binarization Algorithm for Binary Networks
REN Hong-Ping, CHEN Min-Jie, WANG Zi-Hao, YANG Chun, YIN Xu-Cheng
School of Computer & Communication Engineering, University of Science & Technology Beijing, Beijing 100083, China
Abstract: Binary networks have clear advantages in speed, energy consumption, and memory footprint, but binarization causes a large accuracy loss in deep network models. To address this problem, this study proposes a staged residual binarization optimization algorithm for binary networks, in order to obtain a better binary neural network model. We combine the random quantization method with XNOR-net and propose two improved algorithms, namely weight binarization with approximation factors and deterministic quantization networks, together with a new staged residual binarization training optimization algorithm for BNNs, aiming to approach the recognition accuracy of full-precision neural networks. Experimental results show that the staged residual binarization algorithm effectively improves the training accuracy of binary models without increasing the computational complexity of the network at test time, thus preserving the advantages of high speed, low memory usage, and low energy consumption.
Key words: deep learning; binary networks; random quantization; high-order residual quantization; staged residual binarization

1 Overview of Related Work

1.1 Randomly Quantized Networks

Figure 1. Flowchart of random quantization

1) Pre-training. Set the binarization ratio to 0 and keep all weights at full precision, so as to obtain a pre-trained model with high accuracy.

2) Randomly select weights to binarize. Raise the binarization ratio so that part of the weights are binarized during training. How are these weights chosen? First, measure the error that binarization introduces for each weight; then, following the principle "the smaller the error, the higher the probability of being binarized", use roulette-wheel selection to randomly pick a subset of weights for binarized training. Training then proceeds exactly as in an ordinary BWN.

3) Fine-tune the network with the raised binarization ratio from the pre-trained model, until the training accuracy approaches that of the previous full-precision model.

4) Starting from the successfully fine-tuned model, raise the binarization ratio further and repeat steps 2) and 3), until 100% of the model weights in the network are binarized and the fine-tuning result is within an acceptable range.
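As a minimal sketch (not the authors' code), the four steps above can be written as a NumPy loop over stages. The fine-tuning between stages is elided, and the inverse-error mapping in `roulette_select` is one plausible reading of the "smaller error, higher probability" rule, which the text does not pin down exactly:

```python
import numpy as np

def binarize_channel(w):
    """First-order BWN-style binarization of one channel: sign(w) scaled by mean |w|."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

def roulette_select(errors, k, rng):
    """Pick k channel indices; smaller binarization error -> higher probability."""
    inv = 1.0 / (errors + 1e-12)          # invert so small error gets large weight
    probs = inv / inv.sum()
    return rng.choice(len(errors), size=k, replace=False, p=probs)

def staged_random_binarization(weights, ratios, rng):
    """Sketch of steps 1)-4): raise the binarization ratio stage by stage.

    `weights` is an (n_channels, channel_size) matrix; a real run would
    fine-tune the network between stages, which is omitted here.
    """
    n = weights.shape[0]
    binarized = np.zeros(n, dtype=bool)
    for ratio in ratios:                   # e.g. [0.25, 0.5, 0.75, 1.0]
        target = int(round(ratio * n))
        todo = target - binarized.sum()
        if todo <= 0:
            continue
        # relative L1 binarization error of each not-yet-binarized channel
        cand = np.where(~binarized)[0]
        errors = np.array([np.abs(weights[i] - binarize_channel(weights[i])).sum()
                           / np.abs(weights[i]).sum() for i in cand])
        chosen = cand[roulette_select(errors, todo, rng)]
        binarized[chosen] = True
        # ... fine-tune the network here until accuracy recovers ...
    out = weights.copy()
    for i in np.where(binarized)[0]:
        out[i] = binarize_channel(out[i])
    return out, binarized
```

Once the final stage reaches a ratio of 1.0, every channel of the returned matrix holds only the two values ±α for its per-channel scale α.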

1.2 Random Weight Binarization and Approximation Factors

XNOR-net is an effective algorithm for improving binary training accuracy. Here we apply the idea of random quantization to the weight binarization of XNOR-net, to further improve the training accuracy of binary networks. A randomly quantized network does not operate on individual weights but on whole channels, and the XNOR-net algorithm also works at the channel level, which makes it convenient to study how to combine the two training optimization algorithms.

Figure 2. Random selection of $\alpha$ over the weight matrix

 $e_{\mathrm{XNOR},i} = \frac{\| w_i - Q_{\mathrm{XNOR},i} \|_1}{\| w_i \|_1} = \frac{\| w_i - \alpha_i B_i \|_1}{\| w_i \|_1}$ (1)

1) Compute the approximation factor ${\alpha _i}$ for each group of channel weights; ${\alpha _i}$ is obtained in the same way as in XNOR-net;

2) Compute the quantization (binarization) error of the weights on each output channel after applying the approximation factor; the quantitative expression is given in Eq. (1);

3) From the quantization errors, compute the probability that the weights of each channel are binarized;

4) Using these probabilities, select the channels to binarize by roulette-wheel selection. During training, the weights of the selected channels are binarized and multiplied by the corresponding approximation factor ${\alpha _i}$; the unselected channels remain at full precision, and their approximation factors are not used in the computation.
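Steps 1)–3) can be sketched as follows for a convolutional weight tensor: each output channel gets its XNOR-net factor $\alpha_i$ and the relative error of Eq. (1). The error-to-probability mapping is an assumption, since the paper only states that smaller errors get higher probability:

```python
import numpy as np

def xnor_channel_error(W):
    """Per-output-channel XNOR-net approximation and its relative L1 error, Eq. (1).

    W: conv weights of shape (out_channels, in_channels, kH, kW).
    Returns (alpha, error) arrays of length out_channels.
    """
    n_out = W.shape[0]
    flat = W.reshape(n_out, -1)
    alpha = np.abs(flat).mean(axis=1)        # alpha_i = mean |w_i|, as in XNOR-net
    B = np.sign(flat)                        # binary matrix B_i
    err = (np.abs(flat - alpha[:, None] * B).sum(axis=1)
           / np.abs(flat).sum(axis=1))       # Eq. (1)
    return alpha, err

def selection_probs(err, eps=1e-12):
    """One simple 'smaller error -> larger probability' mapping (an assumption)."""
    inv = 1.0 / (err + eps)
    return inv / inv.sum()
```

The probabilities from `selection_probs` feed directly into roulette-wheel sampling of channels in step 4).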

1.3 Deterministic Weight Binarization

"Staged quantization", by contrast, is a purely linear training scheme: the binarization error of each channel is computed only once at initialization, each training stage binarizes a fixed set of channels, and no other channels are binarized, so the training difficulty of each iteration depends only on the binarization ratio. Although every increase of the binarization ratio then requires adjusting all newly binarized weights, which makes convergence in the later stages appear slower, the deterministic selection of binarized channels makes convergence in the early stages comparatively easier, so overall the training difficulty does not increase. The improved staged quantization scheme is illustrated in Figure 3.

Figure 3. The improved staged quantization scheme

1.4 High-Order Residual Quantization Networks

The HORQ method can serve as a basic binary quantization method for binarizing network inputs. It uses binary quantization to speed up network computation while preserving model accuracy, and the residual order can be adjusted to match actual hardware requirements.

HORQ can be regarded as an improved version of XNOR-net. To apply a high-order residual approximation to the weights and inputs, first perform a first-order binary approximation of both following XNOR-net. Taking the input as an example, the high-order residual approximation proceeds as follows:

 $X \approx {\beta _1}{H_1}$ (2)

 ${R_1}(X) = X - {\beta _1}{H_1}$ (3)

 ${R_1}(X) \approx {\beta _2}{H_2}$ (4)

 $X \approx {\beta _1}{H_1} + {\beta _2}{H_2}$ (5)

 $X \approx \sum\nolimits_{i = 1}^K {{\beta _i}{H_i}}$ (6)
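Eqs. (2)–(6) amount to a simple recursion: binarize the current residual, subtract the approximation, and repeat. The sketch below assumes the scaling factor $\beta_i$ is the mean absolute value of the current residual, as in XNOR-net's first-order approximation:

```python
import numpy as np

def horq_decompose(X, K):
    """K-order residual binarization of an input tensor, Eqs. (2)-(6):
    R_0 = X;  beta_i = mean|R_{i-1}|,  H_i = sign(R_{i-1}),  R_i = R_{i-1} - beta_i * H_i.
    Returns scalars beta_i and binary tensors H_i with X ≈ sum_i beta_i * H_i.
    """
    betas, Hs = [], []
    R = X.astype(float)
    for _ in range(K):
        beta = np.abs(R).mean()   # optimal L2 scale for sign(R)
        H = np.sign(R)
        betas.append(beta)
        Hs.append(H)
        R = R - beta * H          # next residual, Eq. (3)
    return betas, Hs

def horq_reconstruct(betas, Hs):
    """Recombine the expansion of Eq. (6)."""
    return sum(b * H for b, H in zip(betas, Hs))
```

Each extra order strictly reduces the L2 approximation error, which is why the residual order can be traded against hardware cost as described above.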

2 The "Staged Residual Binarization" Algorithm

2.1 Implementation of the "Staged Binarization" Algorithm

2.1.1 Derivation of the "Staged Binarization" Algorithm

Figure 4. "Staged binarization" of the intermediate-value matrix

 ${X^{\rm T}} W \approx (\beta {H^{\rm T}})(\alpha B)$ (7)
 $\beta = \frac{1}{n}\| X \|_{\ell_1}$ (8)

2.1.2 Implementation of the "Staged Binarization" Algorithm

1) First, apply staged binarization to the weights, training the model within the BWN structure until it reaches an accuracy close to that of the single-precision floating-point model. According to the "binarization ratio" parameter ratio set in the network, staged binarization binarizes the first ratio fraction of the rows of the weight matrix, i.e., the first ratio·n of the n groups of channel weights, where n is the number of output features of that network layer.

2) Then, once the weight binarization ratio reaches 100% and the model has been fine-tuned to an accuracy close to the full-precision model, start raising the binarization ratio of the intermediate values. The intermediate values to binarize are chosen similarly to the weights, deterministically by columns of the intermediate-value matrix. Mapped back to the shape of the feature map, this can be understood as starting from the top-left corner of the feature map and gradually covering the whole map rightwards and downwards in a sliding-window fashion.

3) Finally, once the binarization ratio of the intermediate values also reaches 100%, continue fine-tuning until the model accuracy no longer improves.
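The deterministic selection in step 1) can be sketched in a few lines; rounding the "first ratio fraction" up to whole channels is an assumption, and the per-channel scaling follows the XNOR-net-style factor $\alpha_i$:

```python
import numpy as np

def staged_binarize_weights(W, ratio):
    """Deterministically binarize the first ceil(ratio * n) of the n
    output-channel weight groups; the remaining channels stay full precision.

    W: (n, channel_size) weight matrix, ratio in [0, 1].
    """
    n = W.shape[0]
    k = int(np.ceil(ratio * n))
    out = W.astype(float).copy()
    for i in range(k):                      # only the first k rows are binarized
        alpha = np.abs(W[i]).mean()         # per-channel approximation factor
        out[i] = alpha * np.sign(W[i])
    return out
```

With ratio = 0 the weights are untouched (the pre-training stage); with ratio = 1 every channel carries only the values ±α, after which the same column-wise scheme is applied to the intermediate values in step 2).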

2.2 Implementation of the "Staged Residual Binarization" Algorithm

 ${X^{\rm T}} W \approx \beta \, ({H^{\rm T}} W)$ (9)

 $\beta = X - H$ (10)

 ${X^{\rm T}} W = ({H} + \beta)^{\rm T} W = {H^{\rm T}} W + {\beta^{\rm T}} W$ (11)
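The split in Eq. (11) is exact whenever the binary part and the residual sum back to X, which a few lines of NumPy confirm. Here `resid` plays the role of $\beta$ in Eq. (10), and the shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))   # input matrix
W = rng.standard_normal((6, 3))   # weight matrix

scale = np.abs(X).mean()          # first-order scaling factor
H = scale * np.sign(X)            # scaled binary approximation of X
resid = X - H                     # the residual, written beta in Eq. (10)

lhs = X.T @ W                     # exact product
rhs = H.T @ W + resid.T @ W       # split product of Eq. (11)
assert np.allclose(lhs, rhs)      # decomposition is exact up to float rounding
```

The point of the staging is that the cheap binary term H.T @ W dominates the computation, while the residual term is phased in (and eventually binarized itself) stage by stage.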

3 Experimental Results and Analysis

3.1 Dataset and Experimental Setup

3.1.1 Dataset and Evaluation Criteria

The CITYSCAPES benchmark contains more than 5000 high-quality street-scene images with pixel-level annotations. It is mainly used for image semantic segmentation tasks[15], to evaluate how well vision algorithms understand urban scenes semantically. It covers street scenes from 50 cities under varying conditions, with annotations for 30 object classes. CITYSCAPES focuses on pixel-level segmentation and recognition; although its images are relatively clean compared with real driving scenes, pixel-level segmentation and recognition impose higher demands. CITYSCAPES uses the standard PASCAL VOC IoU (intersection-over-union) score to evaluate how well predictions match the ground truth, so the prediction at every pixel directly affects the final score.
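For reference, the per-class PASCAL VOC IoU score mentioned above reduces to a few lines (a generic sketch, not the CITYSCAPES evaluation code):

```python
import numpy as np

def class_iou(pred, gt, cls):
    """Standard IoU for one class: |pred ∩ gt| / |pred ∪ gt| over all pixels.

    pred, gt: integer label maps of the same shape; cls: class id.
    Returns NaN when the class appears in neither map.
    """
    p = (pred == cls)
    g = (gt == cls)
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else float("nan")
```

The benchmark score is then the mean of this quantity over classes (and, for CITYSCAPES, over the evaluation images).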

3.1.2 Experimental Setup

3.2 Experimental Results

3.2.1 Performance of the "Staged Residual Binarization" Algorithm on the Test Set

Figure 5. Comparison of high-order residual quantization, "staged residual binarization", and the original full-precision segmentation model

3.2.2 Performance of the "Staged Residual Binarization" Algorithm on the Training Set

Figure 6. Training results of staged residual binarization

3.2.3 Implementation of the "Staged Residual Binarization" Algorithm on Special Layers

1) Implementation on the DECONV layer

2) Speed comparison

4 Conclusion

References
[1] Hou L, Yao QM, Kwok JT. Loss-aware binarization of deep networks. arXiv: 1611.01600, 2016.
[2] Coates A, Huval B, Wang T, et al. Deep learning with COTS HPC systems. Proceedings of the 30th International Conference on Machine Learning. Atlanta, GA, USA. 2013. III-1337–III-1345.
[3] Vanhoucke V, Mao MZ. Improving the speed of neural networks on CPUs. Proceedings of Deep Learning and Unsupervised Feature Learning (NIPS Workshop 2011). Granada, Spain. 2011. 1–4.
[4] Gong YC, Liu L, Yang M, et al. Compressing deep convolutional networks using vector quantization. arXiv: 1412.6115, 2014.
[5] Romero A, Ballas N, Kahou SE, et al. FitNets: Hints for thin deep nets. arXiv: 1412.6550, 2014.
[6] Farabet C, LeCun Y, Kavukcuoglu K, et al. Large-scale FPGA-based convolutional networks. In: Bekkerman R, Bilenko M, Langford J, eds. Machine Learning on Very Large Data Sets. Cambridge: Cambridge University Press, 2011.
[7] Pham PH, Jelaca D, Farabet C, et al. NeuFlow: Dataflow vision processing system-on-a-chip. Proceedings of the 2012 IEEE 55th International Midwest Symposium on Circuits and Systems. Boise, ID, USA. 2012. 1044–1047.
[8] Chen TS, Du ZD, Sun NH, et al. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. ACM SIGPLAN Notices, 2014, 49(4): 269–284.
[9] Esser SK, Appuswamy R, Merolla PA, et al. Backpropagation for energy-efficient neuromorphic computing. Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada. 2015. 1117–1125.
[10] Kamiya R, Yamashita T, Ambai M, et al. Binary-decomposed DCNN for accelerating computation and compressing model without retraining. Proceedings of 2017 IEEE International Conference on Computer Vision Workshops. Venice, Italy. 2017. 1095–1102.
[11] Umuroglu Y, Fraser NJ, Gambardella G, et al. FINN: A framework for fast, scalable binarized neural network inference. arXiv: 1612.07119, 2016.
[12] Courbariaux M, Bengio Y, David JP. BinaryConnect: Training deep neural networks with binary weights during propagations. Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal, Canada. 2015. 3123–3131.
[13] Courbariaux M, Bengio Y, David JP. Training deep neural networks with low precision multiplications. arXiv: 1412.7024, 2015.
[14] Rastegari M, Ordonez V, Redmon J, et al. XNOR-Net: ImageNet classification using binary convolutional neural networks. Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands. 2016. 525–542.
[15] Paszke A, Chaurasia A, Kim S, et al. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv: 1606.02147, 2016.