Received: April 06, 2024    Revised: May 06, 2024
Abstract: Face image generation places high demands on the realism and controllability of the generated faces. This paper proposes a face image generation algorithm jointly controlled by text and facial key points. The text constrains the generated face at the semantic level, while the facial key points let the model control attributes such as face shape, expression, and fine details according to the given positional information. The algorithm builds on an existing diffusion model and additionally introduces a text processing module (CM), a keypoint control network (KCN), and an autoencoder network (ACN). The diffusion model is a noise inference algorithm based on diffusion theory; the CM, designed around an attention mechanism, encodes and stores the text information; the KCN receives the positional information of the key points, enhancing the controllability of the generated faces; the ACN relieves the generation burden of the diffusion model and reduces the time required to generate samples. In addition, to suit the task of face image generation, a dataset of 30,000 face images is constructed. Given a piece of prerequisite text and a facial keypoint map, the model extracts the feature information from the text and the positional information from the key points, and generates a highly realistic and controllable target face image. Compared with current mainstream methods, the proposed algorithm improves the FID metric by about 5%–23% and the IS metric by about 3%–14%, demonstrating its advancement and superiority.
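To make the described pipeline concrete, the following is a minimal sketch, assuming a PyTorch latent-diffusion setup, of how a text condition (the CM role), a keypoint condition (the KCN role), and a denoiser operating in an autoencoder's latent space (the ACN role) could be combined in one noise-prediction training step. All module names, layer sizes, and the simplified noise schedule are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of the conditioning pipeline described in the abstract.
    # Module names, shapes, and layer sizes are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class TextModule(nn.Module):
        """Attention-based text encoder standing in for the CM component."""
        def __init__(self, vocab_size=10000, dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

        def forward(self, token_ids):
            x = self.embed(token_ids)           # (B, T, dim)
            ctx, _ = self.attn(x, x, x)         # self-attention over caption tokens
            return ctx.mean(dim=1)              # pooled text condition, (B, dim)

    class KeypointControlNet(nn.Module):
        """Small CNN standing in for the KCN keypoint-control branch."""
        def __init__(self, dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
            )

        def forward(self, keypoint_map):        # (B, 1, H, W) keypoint heatmap
            return self.net(keypoint_map)       # (B, dim) keypoint condition

    class LatentDenoiser(nn.Module):
        """Toy denoiser working in the autoencoder's latent space (ACN role)."""
        def __init__(self, latent_dim=64, cond_dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(latent_dim + 2 * cond_dim + 1, 512), nn.SiLU(),
                nn.Linear(512, latent_dim),
            )

        def forward(self, z_noisy, t, text_cond, kp_cond):
            t_feat = t.float().unsqueeze(-1) / 1000.0   # crude timestep embedding
            h = torch.cat([z_noisy, text_cond, kp_cond, t_feat], dim=-1)
            return self.net(h)                           # predicted noise epsilon

    # One illustrative denoising training step (epsilon-prediction objective).
    text_enc, kcn, denoiser = TextModule(), KeypointControlNet(), LatentDenoiser()
    tokens = torch.randint(0, 10000, (4, 16))   # dummy caption tokens
    kp_map = torch.rand(4, 1, 64, 64)           # dummy keypoint heatmap
    z0 = torch.randn(4, 64)                     # latents from the autoencoder
    t = torch.randint(0, 1000, (4,))
    noise = torch.randn_like(z0)
    alpha = 1.0 - t.float().unsqueeze(-1) / 1000.0      # simplified noise schedule
    z_noisy = alpha.sqrt() * z0 + (1 - alpha).sqrt() * noise
    pred = denoiser(z_noisy, t, text_enc(tokens), kcn(kp_map))
    loss = nn.functional.mse_loss(pred, noise)
    loss.backward()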
keywords: facial generation; diffusion model; generative artificial intelligence; text encoding; autoencoder
Funding: National Natural Science Foundation of China (62276018)
Citation:
LIU Yu-Tong, WANG Yi-Ding. Facial Image Generation Based on Collaborative Control of Text and Key Points. COMPUTER SYSTEMS APPLICATIONS, 2024, 33(10): 174-182