Decoupled Knowledge Distillation Based on Diffusion Model

Fund project: Science and Technology Program of the Ministry of Public Security (2022JSM08)


    Abstract:

    Knowledge distillation (KD) transfers knowledge from a complex model (the teacher) to a simpler model (the student). Most popular distillation methods currently operate on intermediate feature layers, but response-based knowledge distillation returned to the state of the art (SOTA) after decoupled knowledge distillation (DKD) was introduced. DKD applies a strong consistency constraint and splits classic knowledge distillation into two parts, which resolves the problem of high coupling. However, it overlooks the large representation gap that arises when the teacher and student architectures differ substantially, so a small student cannot learn the teacher's knowledge effectively. To address this problem, this study uses a diffusion model to narrow the representation gap between teacher and student: teacher features are used to train the diffusion model, and a lightweight diffusion model then denoises the student model's representation. Extensive experiments show that the method improves considerably over baseline methods on the CIFAR-100 and ImageNet datasets and maintains good performance even when the teacher and student architectures differ greatly.
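The two-part split of classic knowledge distillation mentioned in the abstract can be illustrated with a small numerical sketch. It assumes the decomposition commonly associated with DKD, in which the KL divergence between teacher and student predictions separates into a target-class term (TCKD) and a non-target-class term (NCKD). The function names, the example logits, and the weights `alpha` and `beta` are hypothetical and are not taken from this article. The sketch also does not cover the diffusion-based denoising step.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dkd_terms(teacher_logits, student_logits, target, temperature=1.0):
    """Return (TCKD, NCKD, teacher probability of the target class).

    Assumes neither model puts all of its probability mass on the target.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)

    # TCKD: binary distribution "target class vs. all other classes".
    tckd = kl_divergence(
        [p_t[target], 1.0 - p_t[target]],
        [p_s[target], 1.0 - p_s[target]],
    )

    # NCKD: distribution over the non-target classes only, renormalised.
    rest_t = [p / (1.0 - p_t[target]) for i, p in enumerate(p_t) if i != target]
    rest_s = [p / (1.0 - p_s[target]) for i, p in enumerate(p_s) if i != target]
    nckd = kl_divergence(rest_t, rest_s)
    return tckd, nckd, p_t[target]


def dkd_loss(teacher_logits, student_logits, target,
             alpha=1.0, beta=8.0, temperature=1.0):
    """Decoupled loss: each term gets its own weight (illustrative values)."""
    tckd, nckd, _ = dkd_terms(teacher_logits, student_logits, target, temperature)
    return alpha * tckd + beta * nckd


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1, -1.0]   # made-up logits for four classes
    student = [1.5, 0.3, 0.8, -0.5]
    tckd, nckd, p_target = dkd_terms(teacher, student, target=0, temperature=4.0)
    classic = kl_divergence(softmax(teacher, 4.0), softmax(student, 4.0))
    # Classic KD couples the two terms through the teacher's confidence.
    print("classic KD:", classic)
    print("TCKD + (1 - p_target) * NCKD:", tckd + (1.0 - p_target) * nckd)
    print("decoupled loss:", dkd_loss(teacher, student, 0, temperature=4.0))
```

In the classic loss, the NCKD term is scaled by `1 - p_target`, so a confident teacher suppresses it. Giving the two terms independent weights removes that coupling, which is the sense in which the loss is "decoupled".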

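Cite this article

Wang Pengyu, Zhu Ziqi. Decoupled knowledge distillation based on diffusion model. Computer Systems & Applications (计算机系统应用), 2024, 33(9): 58-64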

History
  • Received: 2024-03-13
  • Last revised: 2024-04-10
  • Published online: 2024-07-24