Knowledge Distillation Based on Energy and Entropy Balanced Transfer

Fund Project:

Science and Technology Program of the Ministry of Public Security (2022JSM08)

    Abstract:

    Most previous work on knowledge distillation (KD) keeps the temperature fixed throughout the distillation process. Reexamining the role of temperature, however, this study finds that a fixed temperature limits how fully the knowledge inherent in each sample can be exploited. This study divides the dataset into low-energy and high-energy samples according to their energy scores. Experiments confirm that low-energy samples have high confidence scores, indicating certain predictions, whereas high-energy samples have low confidence scores, indicating uncertain predictions. To extract the most useful knowledge by adjusting the non-target-class predictions, this study applies a higher temperature to low-energy samples to produce smoother distributions and a lower temperature to high-energy samples to produce sharper distributions. In addition, to counter the student's imbalanced reliance on prominent features and its neglect of dark knowledge, this study introduces entropy-reweighted knowledge distillation, which uses the entropy of the teacher's predictions to reweight the energy-based distillation loss on a per-sample basis. The method can be easily applied to other logit-based knowledge distillation methods and improves their performance, bringing them close to, or even beyond, feature-based methods. Extensive experiments on image classification datasets (CIFAR-100 and ImageNet) demonstrate the effectiveness of the method.
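The per-sample mechanism described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the energy score uses the common definition E(x) = -logsumexp(z) over the teacher logits z at T = 1, the low/high-energy split uses the batch median energy, the two temperatures (6.0 and 2.0) are hypothetical values, and the entropy weights (higher teacher entropy gives a larger weight, normalized to a batch mean of 1) are an assumed form of the reweighting. The function name `energy_entropy_kd_loss` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def energy_score(logits: Sequence[float]) -> float:
    """Energy E(x) = -logsumexp(z) at T = 1; lower energy means a more confident prediction."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def energy_entropy_kd_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    t_low_energy: float = 6.0,   # hypothetical: higher T for confident (low-energy) samples
    t_high_energy: float = 2.0,  # hypothetical: lower T for uncertain (high-energy) samples
) -> Tuple[float, List[float], List[float]]:
    """Return (batch loss, per-sample temperatures, per-sample entropy weights)."""
    energies = [energy_score(t) for t in teacher_logits]
    # Assumption: split at the batch median energy (upper median for even batch sizes).
    threshold = sorted(energies)[len(energies) // 2]

    temps, kl_terms, entropies = [], [], []
    for t_row, s_row, e in zip(teacher_logits, student_logits, energies):
        temp = t_low_energy if e < threshold else t_high_energy
        log_p_t = log_softmax(t_row, temp)
        log_p_s = log_softmax(s_row, temp)
        # KL(p_t || p_s) at the sample's own temperature, scaled by T^2 as in standard KD.
        kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
        kl_terms.append(temp ** 2 * kl)
        temps.append(temp)
        # Assumption: the entropy of the teacher's T = 1 prediction drives the weight.
        entropies.append(entropy([math.exp(lp) for lp in log_softmax(t_row)]))

    # Assumption: higher-entropy samples (more dark knowledge) get larger weights; mean weight = 1.
    mean_h = sum(entropies) / len(entropies)
    weights = [h / mean_h if mean_h > 0.0 else 1.0 for h in entropies]
    loss = sum(w * k for w, k in zip(weights, kl_terms)) / len(kl_terms)
    return loss, temps, weights


if __name__ == "__main__":
    teacher = [[10.0, 0.0, 0.0], [1.0, 0.9, 0.8]]  # confident vs. uncertain teacher
    student = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]]
    loss, temps, weights = energy_entropy_kd_loss(teacher, student)
    print(temps)  # → [6.0, 2.0]
    print(weights, loss)
```

Because only the per-sample temperature and weight change, the same two ingredients could in principle wrap the KL term of another logit-based KD loss. A fixed energy threshold estimated on held-out data would be an equally plausible alternative to the batch median used in this sketch.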

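Cite this article

Sheng Ziqiang, Zhu Ziqi. Knowledge distillation based on energy and entropy balanced transfer. Computer Systems & Applications, ,():1-8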

History
  • Received: 2024-05-29
  • Last revised: 2024-06-26
  • Published online: 2024-11-15
Copyright: Institute of Software, Chinese Academy of Sciences