Knowledge Distillation Based on Energy and Entropy Balanced Transfer

doi:10.15888/j.cnki.csa.009719

Computer Systems & Applications, 2025, Vol. 34, No. 1, pp. 171-178

CSTR: 32024.14.csa.009719

Authors: SHENG Zi-Qiang, ZHU Zi-Qi
Affiliation: School of Computer Science & Technology, Wuhan University of Science and Technology, Wuhan 430065, China

Fund project: Science and Technology Program of the Ministry of Public Security (2022JSM08)


Abstract:

In most previous work on knowledge distillation (KD), the temperature is fixed throughout the distillation process. Revisiting the temperature, however, shows that a fixed value limits how well the knowledge inherent in each sample is used. This study divides the dataset into low-energy and high-energy samples according to their energy scores. Experiments confirm that low-energy samples have high confidence scores, indicating certain predictions, while high-energy samples have low confidence scores, indicating uncertain predictions. To extract the best knowledge by adjusting the non-target class predictions, the method applies a higher temperature to low-energy samples to produce a smoother distribution and a lower temperature to high-energy samples to obtain a sharper distribution. In addition, to address the student's imbalanced reliance on prominent features and its neglect of dark knowledge, the study introduces entropy-reweighted knowledge distillation, which uses the entropy of the teacher's predictions to reweight the energy distillation loss sample by sample. The method can be easily applied to other logit-based knowledge distillation methods and improves their performance, approaching or even surpassing feature-based methods. Extensive experiments on image classification datasets (CIFAR-100, ImageNet) demonstrate its effectiveness.

Key words: knowledge distillation; energy; entropy; dark knowledge; distillation temperature
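
The abstract does not give formulas, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' implementation. It assumes the energy score is the negative log-sum-exp of the teacher logits, that a single hypothetical threshold `energy_threshold` separates low-energy from high-energy samples, and that the per-sample weight is the entropy of the teacher's prediction at temperature 1. The names `t_high`, `t_low`, and `energy_entropy_kd_loss` are illustrative and do not come from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over one sample's logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def energy_score(logits):
    """Assumed definition: E(x) = -log(sum_i exp(z_i))."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q); eps guards against log of zero in q."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def energy_entropy_kd_loss(teacher_logits, student_logits,
                           energy_threshold, t_high=6.0, t_low=2.0):
    """Mean entropy-weighted KD loss with a per-sample temperature.

    Low-energy (confident) samples get the higher temperature t_high,
    high-energy (uncertain) samples get the lower temperature t_low.
    The threshold rule and the entropy weight are assumptions.
    """
    total = 0.0
    for t_logits, s_logits in zip(teacher_logits, student_logits):
        is_low_energy = energy_score(t_logits) < energy_threshold
        temp = t_high if is_low_energy else t_low
        p_teacher = softmax(t_logits, temp)
        p_student = softmax(s_logits, temp)
        weight = entropy(softmax(t_logits))  # hypothetical per-sample weight
        # temp**2 is the usual gradient-scale correction in KD losses
        total += weight * temp ** 2 * kl_divergence(p_teacher, p_student)
    return total / len(teacher_logits)

if __name__ == "__main__":
    teacher = [[10.0, 0.0, 0.0], [1.0, 1.2, 0.9]]   # confident, uncertain
    student = [[4.0, 1.0, 0.5], [0.5, 1.0, 1.5]]
    print(energy_entropy_kd_loss(teacher, student, energy_threshold=-5.0))
```

With this definition a confident prediction (one dominant logit) has a lower energy than a flat one, which matches the abstract's statement that low-energy samples have high confidence. A real training setup would compute the same quantities on batched tensors and would choose the threshold and temperatures from the data.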

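Cite this article

SHENG Zi-Qiang, ZHU Zi-Qi. Knowledge distillation based on energy and entropy balanced transfer. Computer Systems & Applications, 2025, 34(1): 171-178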

History

Received: 2024-05-29
Last revised: 2024-06-26
Published online: 2024-11-15
