Computer Systems Applications (English version): 2024, 33(12): 161-169
Post-training Mixed-accuracy Quantization Algorithm Based on Pyramid-pooled Weight Imprinting
(School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China)
Received: May 29, 2024    Revised: June 26, 2024
Abstract: Model quantization is widely used for fast inference and deployment of deep neural network models. Post-training quantization has attracted considerable attention because it needs little retraining time and incurs only a small performance loss. However, most existing post-training quantization methods either rely on theoretical assumptions or assign fixed bit-widths to network layers, which causes significant performance loss in the quantized network, especially at low bit-widths. To improve the accuracy of post-training quantized models, this study proposes a post-training mixed-precision quantization method (MSQ). The method inserts a task predictor module, which combines a pyramid pooling module with weight imprinting, after each layer of the network to estimate the accuracy attainable at that layer. These estimates are used to assess the importance of each layer, and the quantization bit-width of each layer is then determined from that importance. Experiments show that MSQ outperforms some existing mixed-precision quantization methods on several popular network architectures, and that the quantized models achieve better performance and lower latency when tested on edge hardware devices.
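The per-layer task predictor described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pooling levels `(1, 2)`, and the use of cosine similarity against class prototypes are assumptions made for illustration, and the abstract does not specify how the resulting accuracy estimates are mapped to bit-widths, so that step is left out.

```python
import math

def pyramid_pool(fmap, levels=(1, 2)):
    """Average-pool a C x H x W feature map into n x n bins per level, then concatenate."""
    feats = []
    for channel in fmap:
        h, w = len(channel), len(channel[0])
        for n in levels:
            for i in range(n):
                r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
                for j in range(n):
                    c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                    cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                    feats.append(sum(cells) / len(cells))
    return feats

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def imprint(embeddings, labels):
    """Weight imprinting: each class weight is the normalized mean of its embeddings.
    Normalizing the per-class sum gives the same direction as normalizing the mean."""
    sums = {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for k, x in enumerate(emb):
            acc[k] += x
    return {label: l2_normalize(total) for label, total in sums.items()}

def predict(prototypes, emb):
    """Return the class whose imprinted prototype has the highest cosine similarity."""
    return max(prototypes, key=lambda y: sum(a * b for a, b in zip(prototypes[y], emb)))

def layer_accuracy(imprint_set, eval_set):
    """Estimate how well one layer's features separate the classes.
    Both arguments are lists of (feature_map, label) pairs taken at that layer."""
    embed = lambda fmap: l2_normalize(pyramid_pool(fmap))
    prototypes = imprint([embed(f) for f, _ in imprint_set], [y for _, y in imprint_set])
    hits = sum(predict(prototypes, embed(f)) == y for f, y in eval_set)
    return hits / len(eval_set)

# Toy single-channel 2 x 2 feature maps for two classes (hypothetical data).
imprint_set = [([[[1, 0], [0, 0]]], 0), ([[[0, 0], [0, 1]]], 1)]
eval_set = [([[[2, 0], [0, 0]]], 0), ([[[0, 0], [0, 3]]], 1)]
accuracy = layer_accuracy(imprint_set, eval_set)
```

Because imprinting sets the classifier weights directly from feature statistics, a predictor of this kind needs no gradient training, which is consistent with the post-training setting. Under the assumptions above, running `layer_accuracy` on the features of each layer would give one score per layer, which could then be ranked to decide which layers keep higher bit-widths.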
Funding: Regional Innovation Capability Guidance Program of the Shaanxi Provincial Department of Science and Technology (2022QFY01-14)
Citation:
ZHANG Rui-Xuan, ZHAO Yu-Feng, XU Fei, YU Ting-Ting, ZHANG Le-Yi. Post-training Mixed-accuracy Quantization Algorithm Based on Pyramid-pooled Weight Imprinting. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(12): 161-169