Received: May 29, 2024    Revised: June 26, 2024
Abstract: In most previous work on knowledge distillation (KD), the temperature is set to a fixed value throughout the distillation process. However, on reexamining the temperature, we find that a fixed temperature limits the utilization of the knowledge inherent in each sample. This study divides the dataset into low-energy and high-energy samples according to their energy scores. Experiments confirm that low-energy samples have high confidence scores, indicating that their predictions are deterministic, whereas high-energy samples have low confidence scores, indicating that their predictions are uncertain. To extract the most useful knowledge by adjusting the non-target class predictions, this study applies a higher temperature to low-energy samples to produce smoother distributions, and a lower temperature to high-energy samples to obtain sharper distributions. In addition, to address the student's imbalanced reliance on prominent features and its neglect of dark knowledge, this study introduces entropy-reweighted knowledge distillation, which uses the entropy of the teacher's predictions to reweight the energy-based distillation loss on a per-sample basis. The proposed method can be easily applied to other logit-based knowledge distillation methods and achieves better performance, approaching or even surpassing feature-based methods. Extensive experiments on image classification datasets (CIFAR-100, ImageNet) demonstrate the effectiveness of the method.
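The abstract describes two core operations: an energy-based dynamic temperature and an entropy-reweighted distillation loss. As a concrete illustration only, the following is a minimal PyTorch sketch of these two ideas; the median split, the temperature values t_smooth and t_sharp, the mean-normalized entropy weight, and all function names are illustrative assumptions, not the paper's published implementation.

```python
import torch
import torch.nn.functional as F

def energy_score(logits: torch.Tensor, t: float = 1.0) -> torch.Tensor:
    # Energy score from the energy-based OOD literature:
    # E(x) = -t * logsumexp(z / t). Lower energy corresponds to a more
    # confident (deterministic) prediction.
    return -t * torch.logsumexp(logits / t, dim=1)

def energy_entropy_kd_loss(student_logits, teacher_logits,
                           t_smooth=6.0, t_sharp=2.0):
    # Hypothetical sketch: per-sample temperature chosen from the teacher's
    # energy score, with the loss reweighted by the teacher's entropy.
    with torch.no_grad():
        energy = energy_score(teacher_logits)                    # (B,)
        # Median split is an assumption: low-energy (confident) samples get
        # the higher temperature (smoother targets); high-energy (uncertain)
        # samples get the lower temperature (sharper targets).
        low_energy = energy <= energy.median()
        tau = torch.where(low_energy,
                          torch.full_like(energy, t_smooth),
                          torch.full_like(energy, t_sharp)).unsqueeze(1)
        # Entropy of the teacher's prediction, used to reweight the loss per
        # sample so that samples rich in dark knowledge count more.
        p = F.softmax(teacher_logits, dim=1)
        entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1)
        weight = entropy / entropy.mean()                        # (B,)
    log_q = F.log_softmax(student_logits / tau, dim=1)
    p_t = F.softmax(teacher_logits / tau, dim=1)
    # Per-sample KL divergence, scaled by tau^2 as in standard KD.
    kl = (p_t * (p_t.clamp_min(1e-8).log() - log_q)).sum(dim=1)
    return (weight * kl * tau.squeeze(1) ** 2).mean()
```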
Foundation item: Science and Technology Program of the Ministry of Public Security of China (2022JSM08)
Citation:
SHENG Zi-Qiang, ZHU Zi-Qi. Knowledge Distillation Based on Energy and Entropy Balanced Transfer. COMPUTER SYSTEMS APPLICATIONS, , (): 1-8