Abstract: This study proposes a new unsupervised image-to-image translation model that combines a generative adversarial network (GAN) with multi-angle attention, abbreviated MAGAN. The multi-angle attention guides the translation model to focus on the most discriminative regions between domains. Unlike existing attention-based methods, the proposed spatial activation mapping (SAM) not only captures dependencies among channels to reduce feature distortion in the translated image, but also determines how strongly the network attends to the spatial locations of the most discriminative regions, so that the translated image better matches the style of the target domain. Building on SAM, class activation mapping (CAM) is used to obtain the global semantic information of the image. In addition, different attention structures are designed for the generator and the discriminator according to the influence of the degree of spatial activation on the feature information of the image. Experimental results show that the proposed model outperforms most mainstream models, achieving kernel inception distance (KID) scores of 9.48, 6.32, 6.42, and 4.28 on the selfie2anime, cat2dog, horse2zebra, and vangogh2photo datasets, respectively. Moreover, compared with the baseline model, unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation (UGATIT), the proposed model shows significant performance advantages, reducing the KID scores on the selfie2anime, cat2dog, and horse2zebra datasets by 2.13, 0.75, and 0.64, respectively.
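As a rough illustration of the attention components named in the abstract, the following PyTorch sketch shows how a channel-plus-spatial attention block (in the spirit of SAM) and a CAM-style global semantic head could be wired together. The class names, reduction ratio, kernel sizes, and layer choices are assumptions made for illustration only; they are not the authors' released implementation.

```python
# Minimal sketch (an assumption based on the abstract, not MAGAN's actual code):
# a SAM-like block that models channel dependencies and spatial attention,
# followed by a CAM-like head that yields a global semantic map.
import torch
import torch.nn as nn


class SpatialActivationMapping(nn.Module):
    """Hypothetical SAM block: channel attention followed by spatial attention."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        # Channel attention: squeeze spatial dims, model inter-channel dependencies.
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: a single-channel map highlighting discriminative locations.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_fc(x)                 # reweight channels
        avg_map = x.mean(dim=1, keepdim=True)      # per-location average descriptor
        max_map, _ = x.max(dim=1, keepdim=True)    # per-location max descriptor
        attn = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * attn                            # emphasize discriminative regions


class ClassActivationMapping(nn.Module):
    """Hypothetical CAM head: global pooling plus a linear layer, with the
    layer's weights projected back onto the feature map as a semantic map."""

    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Linear(channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        pooled = x.mean(dim=(2, 3))                # global average pooling
        logit = self.fc(pooled)                    # domain/class logit
        cam = (x * self.fc.weight.view(1, c, 1, 1)).sum(dim=1, keepdim=True)
        return logit, cam


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    sam = SpatialActivationMapping(64)
    cam_head = ClassActivationMapping(64)
    logit, cam = cam_head(sam(feats))
    print(logit.shape, cam.shape)  # torch.Size([2, 1]) torch.Size([2, 1, 32, 32])
```

In such a design, the generator and the discriminator could each attach their own variant of these blocks, which is consistent with the abstract's statement that different attention structures are used for the two networks; the exact variants are not specified in the abstract.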