Decoupled Knowledge Distillation Based on Perception Reconstruction
Authors: 祝英策, 朱子奇
Funding: Science and Technology Program of the Ministry of Public Security (2022JSM08)
Abstract:

In the field of knowledge distillation (KD), feature-based methods can effectively mine the rich knowledge embedded in the teacher model, whereas logit-based methods often suffer from insufficient knowledge transfer and low efficiency. Decoupled knowledge distillation (DKD) performs distillation by splitting the logits output by the teacher and student models into a target class and non-target classes. Although this improves distillation accuracy, its single-instance formulation cannot capture the dynamic relationships among samples within a batch; in particular, when the output distributions of the teacher and student differ significantly, decoupled distillation alone cannot effectively bridge the gap. To address these issues, this study proposes a perception reconstruction method. The method introduces a perception matrix that exploits the model's representational capability to recalibrate logits, analyze intra-class dynamics in detail, and reconstruct finer-grained inter-class relationships. Since the student's objective is to minimize the representational disparity, the method is integrated into decoupled knowledge distillation by mapping the outputs of the teacher and student models onto the perception matrix, enabling the student to learn richer knowledge from the teacher. Experiments on the CIFAR-100 and ImageNet-1K datasets show that a student trained with this method reaches a classification accuracy of 74.98% on CIFAR-100, 0.87 percentage points higher than the baseline, improving the student model's image classification performance. Comparative experiments against a variety of methods further confirm the superiority of the proposed approach.
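The abstract describes the loss only at a high level. As a rough, non-authoritative sketch of the ingredients it names, the PyTorch snippet below implements the standard DKD split into a target-class term (TCKD) and a non-target-class term (NCKD), and pairs it with a hypothetical perception_alignment_loss that matches batch-wise similarity matrices of teacher and student logits. The function name perception_alignment_loss, the weights alpha and beta, the temperature T, and the entire construction of the perception matrix are assumptions made for illustration; the paper's actual formulation is not given in this abstract.

```python
# Minimal sketch, assuming a standard DKD formulation plus a hypothetical
# perception-matrix alignment term. Not the authors' implementation.
import torch
import torch.nn.functional as F


def dkd_loss(logits_s, logits_t, target, alpha=1.0, beta=8.0, T=4.0):
    """Decoupled KD: a target-class term (TCKD) plus a non-target-class term (NCKD)."""
    num_classes = logits_s.size(1)
    gt_mask = F.one_hot(target, num_classes).float()   # (B, C), 1 at the ground-truth class
    nt_mask = 1.0 - gt_mask

    # TCKD: KL divergence between binary {target, non-target} probability masses.
    p_s = F.softmax(logits_s / T, dim=1)
    p_t = F.softmax(logits_t / T, dim=1)
    p_s_bin = torch.stack([(p_s * gt_mask).sum(1), (p_s * nt_mask).sum(1)], dim=1)
    p_t_bin = torch.stack([(p_t * gt_mask).sum(1), (p_t * nt_mask).sum(1)], dim=1)
    tckd = F.kl_div(torch.log(p_s_bin + 1e-8), p_t_bin, reduction="batchmean") * T ** 2

    # NCKD: KL divergence over the non-target classes only (target logit suppressed).
    log_p_s_nt = F.log_softmax(logits_s / T - 1000.0 * gt_mask, dim=1)
    p_t_nt = F.softmax(logits_t / T - 1000.0 * gt_mask, dim=1)
    nckd = F.kl_div(log_p_s_nt, p_t_nt, reduction="batchmean") * T ** 2

    return alpha * tckd + beta * nckd


def perception_alignment_loss(logits_s, logits_t, T=4.0):
    """Hypothetical perception-matrix term: match the batch-wise similarity
    structure (sample-to-sample relations) of student and teacher outputs."""
    z_s = F.normalize(logits_s, dim=1)
    z_t = F.normalize(logits_t, dim=1)
    # "Perception matrices" here: row-softmaxed cosine-similarity matrices over the batch.
    m_s = F.log_softmax(z_s @ z_s.t() / T, dim=1)
    m_t = F.softmax(z_t @ z_t.t() / T, dim=1)
    return F.kl_div(m_s, m_t, reduction="batchmean") * T ** 2


if __name__ == "__main__":
    B, C = 64, 100                      # batch size, number of classes (e.g. CIFAR-100)
    logits_t, logits_s = torch.randn(B, C), torch.randn(B, C)
    target = torch.randint(0, C, (B,))
    loss = dkd_loss(logits_s, logits_t, target) + perception_alignment_loss(logits_s, logits_t)
    print(loss.item())
```

In a full training loop, this distillation term would be added to the ordinary cross-entropy loss on the ground-truth labels; how the perception term is weighted against the DKD and cross-entropy terms is not specified in the abstract and is left as a hyperparameter here.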

Cite this article: 祝英策, 朱子奇. 基于感知重构的解耦知识蒸馏 (Decoupled knowledge distillation based on perception reconstruction). 计算机系统应用 (Computer Systems & Applications), 2025, 34(2): 11–18.

Article history
  • Received: 2024-07-29
  • Revised: 2024-08-20
  • Published online: 2024-12-19