Decoupled Knowledge Distillation Based on Perception Reconstruction
Authors: 祝英策, 朱子奇
Funding: Science and Technology Program of the Ministry of Public Security (2022JSM08)
Abstract:

In the field of knowledge distillation (KD), feature-based methods can effectively mine the rich knowledge embedded in the teacher model, whereas logit-based methods often suffer from insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) performs distillation by splitting the logits output by the teacher and student models into a target class and non-target classes. Although this improves distillation accuracy, its single-instance formulation cannot capture the dynamic relationships among samples within a batch; in particular, when the output distributions of the teacher and student differ significantly, decoupled distillation alone cannot effectively bridge the gap. To address these issues, this study proposes a perception reconstruction method. The method introduces a perception matrix that exploits the model's representational capability to recalibrate logits, analyze intra-class dynamics in detail, and reconstruct finer-grained inter-class relationships. Since the student's objective is to minimize the representational disparity, the method is integrated into decoupled knowledge distillation by mapping the outputs of the teacher and student models onto the perception matrix, enabling the student to learn richer knowledge from the teacher. Experiments on the CIFAR-100 and ImageNet-1K datasets show that a student trained with this method reaches a classification accuracy of 74.98% on CIFAR-100, 0.87 percentage points higher than the baseline, improving the student model's image classification performance. Comparative experiments against a variety of methods further confirm the superiority of the proposed approach.
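The abstract describes the loss only at a high level. As a rough, non-authoritative sketch of the ingredients it names, the PyTorch snippet below implements the standard DKD split into a target-class term (TCKD) and a non-target-class term (NCKD), and pairs it with a hypothetical perception_alignment_loss that matches batch-wise similarity matrices of teacher and student logits. The function name perception_alignment_loss, the weights alpha and beta, the temperature T, and the entire construction of the perception matrix are assumptions made for illustration; the paper's actual formulation is not given in this abstract.

```python
# Minimal sketch, assuming a standard DKD formulation plus a hypothetical
# perception-matrix alignment term. Not the authors' implementation.
import torch
import torch.nn.functional as F


def dkd_loss(logits_s, logits_t, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: a target-class term (TCKD) plus a non-target-class term (NCKD)."""
    num_classes = logits_s.size(1)
    gt_mask = F.one_hot(target, num_classes).float()   # (B, C), 1 at the ground-truth class
    nt_mask = 1.0 - gt_mask

    # TCKD: KL divergence between binary {target, non-target} probability masses.
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    p_s_bin = torch.stack([(p_s * gt_mask).sum(1), (p_s * nt_mask).sum(1)], dim=1)
    p_t_bin = torch.stack([(p_t * gt_mask).sum(1), (p_t * nt_mask).sum(1)], dim=1)
    tckd = F.kl_div(torch.log(p_s_bin + 1e-8), p_t_bin, reduction="batchmean") * T ** 2

    # NCKD: KL divergence over the non-target classes only (target logit suppressed).
    log_p_s_nt = F.log_softmax(logits_s / T - 1000.0 * gt_mask, dim=1)
    p_t_nt = F.softmax(logits_t / T - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_p_s_nt, p_t_nt, reduction="batchmean") * T ** 2

    return alpha * tckd + beta * nckd


def perception_alignment_loss(logits_s, logits_t, T=4.0):
    """Hypothetical perception-matrix term: match the batch-wise similarity
    structure (sample-to-sample relations) of student and teacher outputs."""
    z_s = F.normalize(logits_s, dim=1)
    z_t = F.normalize(logits_t, dim=1)
    # "Perception matrices" here: row-softmaxed cosine-similarity matrices over the batch.
    m_s = F.log_softmax(z_s @ z_s.t() / T, dim=1)
    m_t = F.softmax(z_t @ z_t.t() / T, dim=1)
    return F.kl_div(m_s, m_t, reduction="batchmean") * T ** 2


if __name__ == "__main__":
    B, C = 64, 100                      # batch size, number of classes (e.g. CIFAR-100)
    logits_t, logits_s = torch.randn(B, C), torch.randn(B, C)
    target = torch.randint(0, C, (B,))
    loss = dkd_loss(logits_s, logits_t, target) + perception_alignment_loss(logits_s, logits_t)
    print(loss.item())
```

In a full training loop, this distillation term would be added to the ordinary cross-entropy loss on the ground-truth labels; how the perception term is weighted against the DKD and cross-entropy terms is not specified in the abstract and is left as a hyperparameter here.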

Cite this article: 祝英策, 朱子奇. 基于感知重构的解耦知识蒸馏 (Decoupled knowledge distillation based on perception reconstruction). 计算机系统应用 (Computer Systems & Applications), 2025, 34(2): 11–18.

Article history
  • Received: 2024-07-29
  • Revised: 2024-08-20
  • Published online: 2024-12-19