Decoupled Knowledge Distillation Based on Perception Reconstruction
Author:
Affiliation:
Corresponding author:
CLC number:
Fund project: Science and Technology Program of the Ministry of Public Security (2022JSM08)


    Abstract:

    In knowledge distillation (KD), feature-based methods can effectively mine the rich knowledge embedded in the teacher model, whereas logit-based methods often suffer from insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) performs distillation by splitting the logits produced by the teacher and student models into target-class and non-target-class components. Although this improves distillation accuracy, its single-instance formulation fails to capture the dynamic relationships among samples within a batch; in particular, when the output distributions of the teacher and student differ significantly, decoupled distillation alone cannot bridge the gap. To address these issues, this study proposes a perception reconstruction method. The method introduces a perception matrix that exploits the model's representational capacity to recalibrate the logits, analyze intra-class dynamics in detail, and reconstruct finer-grained inter-class relationships. Since the student's objective is to minimize representational disparity, the method is extended to decoupled knowledge distillation: the outputs of the teacher and student models are mapped onto the perception matrix, enabling the student to learn richer knowledge from the teacher. Experiments on the CIFAR-100 and ImageNet-1K datasets show that a student model trained with this method reaches a classification accuracy of 74.98% on CIFAR-100, 0.87 percentage points above the baseline, improving the student's image classification performance. Comparative experiments against a variety of methods further confirm the superiority of the approach.
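To make the decoupling described above concrete, the standard DKD loss (target-class vs. non-target-class KL terms) can be sketched in PyTorch. The `perception_loss` term below is a hypothetical reading of the batch-level "perception matrix" as a pairwise sample-similarity matching term; the abstract does not give the paper's exact formulation, so treat it as an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def dkd_loss(logits_s, logits_t, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: KL on the binary target/non-target split (TCKD)
    plus KL on the renormalized non-target-class distribution (NCKD)."""
    gt_mask = F.one_hot(target, logits_s.size(1)).bool()
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    # Binary distributions over {target, non-target} probability mass
    b_s = torch.stack([(p_s * gt_mask).sum(1), (p_s * ~gt_mask).sum(1)], dim=1)
    b_t = torch.stack([(p_t * gt_mask).sum(1), (p_t * ~gt_mask).sum(1)], dim=1)
    tckd = F.kl_div(torch.log(b_s + 1e-8), b_t, reduction="batchmean") * T * T
    # Suppress the target logit, then renormalize over non-target classes
    s_nt = logits_s.masked_fill(gt_mask, -1000.0)
    t_nt = logits_t.masked_fill(gt_mask, -1000.0)
    nckd = F.kl_div(F.log_softmax(s_nt / T, dim=1),
                    F.softmax(t_nt / T, dim=1),
                    reduction="batchmean") * T * T
    return alpha * tckd + beta * nckd

def perception_loss(logits_s, logits_t):
    """Hypothetical perception-matrix term: match the pairwise
    sample-similarity matrices of student and teacher logits,
    so batch-level inter-sample relations are also transferred."""
    z_s = F.normalize(logits_s, dim=1)
    z_t = F.normalize(logits_t, dim=1)
    return F.mse_loss(z_s @ z_s.t(), z_t @ z_t.t())
```

Combining the two terms gives a loss in the spirit of the abstract: the DKD part transfers per-instance class knowledge, while the similarity-matrix part captures the within-batch dynamics that single-instance distillation misses.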

Cite this article:

Zhu YC, Zhu ZQ. Decoupled knowledge distillation based on perception reconstruction. 计算机系统应用 (Computer Systems & Applications), , (): 1–8

History
  • Received: 2024-07-29
  • Revised: 2024-08-20
  • Accepted:
  • Online: 2024-12-19
  • Published: