Received: March 13, 2024; Revised: April 10, 2024
Abstract: Knowledge distillation (KD) is a technique that transfers the knowledge of a complex model (the teacher) to a simpler model (the student). Most popular distillation methods operate on intermediate feature layers, but since decoupled knowledge distillation (DKD) was proposed, response-based distillation has returned to the SOTA ranks: by imposing strong consistency constraints, DKD splits the classical KD loss into two parts and thereby resolves its high coupling. However, this approach overlooks the large representation gap that arises when the teacher and student architectures differ greatly, so that the smaller student model cannot effectively learn the teacher's knowledge. To address this problem, this study uses a diffusion model to narrow the representation gap between teacher and student: teacher features are fed into a diffusion model for training, and a lightweight diffusion model then denoises the student features, reducing the gap between the two representations. Extensive experiments show that, compared with baseline methods, this approach yields substantial improvements on the CIFAR-100 and ImageNet datasets and maintains good performance even when the teacher-student architecture gap is large.
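For context, the "two parts" referred to above are the target-class and non-target-class terms of the DKD reformulation (Zhao et al., CVPR 2022), in which the classical KD loss is rewritten so that the weight on the non-target term is exposed and can be decoupled into a free hyperparameter:

```latex
% DKD reformulation of the classical KD loss (Zhao et al., CVPR 2022):
% b = binary (target vs. non-target) probabilities, \hat{p} = probabilities
% over the non-target classes only, p_t^T = teacher's target-class probability.
\mathrm{KD}
  = \underbrace{\mathrm{KL}\bigl(\mathbf{b}^{T}\,\|\,\mathbf{b}^{S}\bigr)}_{\mathrm{TCKD}}
  + \bigl(1 - p_{t}^{T}\bigr)\,
    \underbrace{\mathrm{KL}\bigl(\hat{\mathbf{p}}^{T}\,\|\,\hat{\mathbf{p}}^{S}\bigr)}_{\mathrm{NCKD}}
% DKD replaces the coupled weight (1 - p_t^T) with independent hyperparameters:
\mathcal{L}_{\mathrm{DKD}} = \alpha\,\mathrm{TCKD} + \beta\,\mathrm{NCKD}
```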
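The denoising step is only sketched in the abstract; below is a minimal PyTorch illustration of the general idea, not the paper's implementation. All names (LightDenoiser, diffuse, denoise_student), the linear noise schedule, and the fixed denoising step t_s are assumptions made for this sketch: a small denoiser is trained on teacher features with a noise-prediction objective, and the student's features are then treated as noised teacher features and denoised toward the teacher's representation.

```python
# Minimal sketch, assuming pooled feature vectors and a linear noise schedule;
# the paper's actual architecture, schedule, and training recipe may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightDenoiser(nn.Module):
    """Hypothetical lightweight noise-prediction network over feature vectors."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden),  # +1 for the diffusion-step scalar
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        # x: (B, dim) features; t: (B,) diffusion step in [0, 1]
        return self.net(torch.cat([x, t[:, None]], dim=1))

def diffuse(x0, t, noise):
    # Forward process under a simple linear schedule:
    # x_t = sqrt(1 - t) * x0 + sqrt(t) * eps
    a = (1 - t)[:, None].sqrt()
    s = t[:, None].sqrt()
    return a * x0 + s * noise

def denoiser_loss(denoiser, teacher_feat):
    # Train the denoiser on teacher features with an epsilon-prediction objective.
    t = torch.rand(teacher_feat.size(0), device=teacher_feat.device)
    eps = torch.randn_like(teacher_feat)
    x_t = diffuse(teacher_feat, t, eps)
    return F.mse_loss(denoiser(x_t, t), eps)

def denoise_student(denoiser, student_feat, t_s=0.5):
    # At distillation time, treat the student feature as a noised teacher
    # feature at a fixed step t_s and invert one step of the forward process.
    t = torch.full((student_feat.size(0),), t_s, device=student_feat.device)
    eps_hat = denoiser(student_feat, t)
    a = (1 - t)[:, None].sqrt()
    s = t[:, None].sqrt()
    return (student_feat - s * eps_hat) / a  # estimate of the clean feature
```

In a full pipeline, the denoised student feature could then be aligned with the teacher feature (e.g., by an MSE term) alongside the DKD loss; the weighting between the two terms is another assumption left open here.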
Keywords: knowledge distillation (KD); decoupled knowledge distillation; diffusion model; representation gap; teacher-student network
Foundation item: Science and Technology Program of the Ministry of Public Security (2022JSM08)
Citation:
WANG Peng-Yu, ZHU Zi-Qi. Decoupled Knowledge Distillation Based on Diffusion Model. Computer Systems Applications, 2024, 33(9): 58-64