Received: September 19, 2022  Revised: October 19, 2022
Abstract: This study proposes a new unsupervised image-to-image translation model, termed MAGAN, that combines generative adversarial networks (GAN) with multi-angle attention. The multi-angle attention guides the translation model to focus on the most discriminative regions between the source and target domains. Unlike existing attention-based methods, spatial activation mapping (SAM) not only captures the dependencies among channels to reduce feature distortion in the translated image but also determines how strongly the network attends to the spatial locations of the most discriminative regions, so that the translated image better matches the style of the target domain. On the basis of SAM, class activation mapping (CAM) is combined to obtain the global semantic information of the image. In addition, according to the influence of the spatial activation degree on image feature information, different attention structures are designed to train the generator and the discriminator, respectively. Experimental results show that the proposed model achieves kernel inception distance (KID) scores of 9.48, 6.32, 6.42, and 4.28 on the selfie2anime, cat2dog, horse2zebra, and vangogh2photo datasets, respectively, outperforming most mainstream models. Moreover, compared with the baseline model UGATIT (unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation), the proposed model lowers the KID scores on selfie2anime, cat2dog, and horse2zebra by 2.13, 0.75, and 0.64, respectively, showing a clear performance advantage.
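The abstract describes the multi-angle attention only at a high level (a SAM branch that models inter-channel dependencies and spatial locations, plus a CAM branch for global semantics). The following is a minimal PyTorch sketch of how such a combined channel-and-spatial attention block could be wired up; the module name MultiAngleAttention, the reduction ratio, the pooling choices, and the fusion order are assumptions for illustration, not the published MAGAN architecture.

```python
# Minimal sketch of a combined channel (CAM-style) and spatial (SAM-style)
# attention block, under the assumptions stated above.
import torch
import torch.nn as nn


class MultiAngleAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # CAM-style branch: global pooling + shared MLP produces per-channel
        # weights that encode global semantic importance.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # SAM-style branch: a convolution over channel-pooled features yields
        # a spatial map that highlights the most discriminative regions.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        channel_att = self.sigmoid(avg + mx).view(b, c, 1, 1)
        x = x * channel_att
        # Spatial attention from per-pixel channel statistics.
        spatial_in = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        spatial_att = self.sigmoid(self.spatial_conv(spatial_in))
        return x * spatial_att


if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)       # dummy generator feature map
    out = MultiAngleAttention(64)(feat)     # attention-reweighted features
    print(out.shape)                        # torch.Size([2, 64, 32, 32])
```

In a full model, such a block would typically sit inside both the generator and the discriminator, with the two networks using differently configured attention structures, as the abstract indicates.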
Keywords: generative adversarial network (GAN); image-to-image translation; image style transfer; multi-angle attention (MA); unsupervised network; image generation
Funding: National Natural Science Foundation of China (61872153, 61972288)
Citation:
YANG Bai-Bing, CHEN Min-Rong, YE Yong-Sen. Image-to-image Translation Model Combining GAN and Multi-angle Attention. COMPUTER SYSTEMS APPLICATIONS, 2023, 32(4): 283-292