Received: September 3, 2021; Revised: September 26, 2021
Abstract: Mapping between the text and image modalities in high-dimensional space is difficult. To address this, this study proposes a stacked generative adversarial network (GAN) for text-to-image generation that takes a global sentence vector as input. The network incorporates a dual attention mechanism to strengthen feature fusion along both the spatial and channel dimensions, and adds a fidelity-loss discriminator as a constraint. The method is evaluated on the Caltech-UCSD Birds (CUB) dataset, with Inception Score and SSIM as the evaluation metrics. The results show that the generated images have more realistic detail textures and are visually closer to real images.
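The abstract names Inception Score and SSIM as metrics but does not say how they were computed. As a rough illustration only, the sketch below implements the textbook definitions in plain Python. It makes two simplifying assumptions that are not taken from the paper: the per-image class probabilities are supplied directly (normally they come from a pretrained Inception classifier), and SSIM is computed over a single global window on flattened grayscale pixels rather than with the sliding window usually used in practice. The function names `inception_score` and `global_ssim` are hypothetical.

```python
import math


def inception_score(probs):
    """Inception Score from per-image class probabilities p(y|x).

    probs: list of lists, one probability vector per generated image.
    Returns exp(mean over images of KL(p(y|x) || p(y))).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[j] for p in probs) / n for j in range(k)]
    kl_sum = 0.0
    for p in probs:
        # Terms with p_j == 0 contribute nothing to the KL divergence.
        kl_sum += sum(pj * math.log(pj / mj)
                      for pj, mj in zip(p, marginal) if pj > 0)
    return math.exp(kl_sum / n)


def global_ssim(x, y, data_range=255.0):
    """SSIM over one global window (assumption: no sliding window).

    x, y: equal-length flat lists of grayscale pixel values.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Standard stabilising constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

Under these definitions, confident predictions spread evenly over k classes give an Inception Score near k, and an image compared with itself gives an SSIM of 1. Published numbers are typically obtained with a library implementation, so values from this sketch are not directly comparable to those reported in the paper.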
Keywords: text-to-image generation; stacked GAN; dual attention mechanism; fidelity loss; text detection
Citation:
HU Cheng, HU Ying-Hui, LIU Xing-Yun. Text-to-image Generation Focusing on Global Fidelity. Computer Systems Applications, 2022, 31(6): 388-393. (Chinese title: 关注全局真实度的文本到图像生成; journal: 计算机系统应用)