Computer Systems Applications, 2024, 33(2): 276-283
Text Audio Driven Facial Animation Generation Based on Improved Wav2Lip
(School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China)
Received: August 17, 2023    Revised: September 26, 2023
Abstract: To improve the realism of Chinese lip-synchronized facial animation videos, this study proposes a text- and audio-driven facial animation generation method based on an improved Wav2Lip model. First, a Chinese lip-synchronization dataset is constructed and used to pre-train the lip discriminator, so that it judges Chinese lip-synchronized facial animation more accurately. Then, text features are introduced into the Wav2Lip model to improve the temporal synchronization between lips and audio, and thereby the realism of the generated video. The model combines the extracted text, audio, and speaker facial information and, under the supervision of a pre-trained lip discriminator and a video quality discriminator, generates highly realistic lip-synchronized facial animation videos. Comparative experiments against the ATVGnet and Wav2Lip models show that the videos generated by the proposed model have better synchronization between lip shape and audio and greater overall realism. This work offers one solution to the current demand for facial animation generation.
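The abstract does not state how the lip discriminator scores synchronization. As a rough illustration only, the sketch below assumes the kind of objective commonly described for the original Wav2Lip: the cosine similarity between a video embedding and an audio embedding is read as an in-sync probability, and its negative log is the penalty. The function names, the plain-list embeddings, and the clamping constant are hypothetical and not taken from this paper.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity of two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def sync_loss(video_emb, audio_emb, eps=1e-8):
    """Negative log of an assumed in-sync probability.

    Assumption: embeddings are non-negative (e.g. taken after a ReLU),
    so the cosine similarity already lies in [0, 1]. The value is
    clamped to [eps, 1] to keep the logarithm finite.
    """
    p = cosine_similarity(video_emb, audio_emb, eps)
    p = min(max(p, eps), 1.0)
    return -math.log(p)


if __name__ == "__main__":
    # Identical embeddings give zero loss; orthogonal ones a large loss.
    print(sync_loss([1.0, 0.0], [1.0, 0.0]))
    print(sync_loss([1.0, 0.0], [0.0, 1.0]))
```

Under this assumed formulation, a generator is penalized more the further the lip-region embedding drifts from the audio embedding, which is one way the supervision mentioned in the abstract could be realized.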
Funding: National Key Research and Development Program of China (2019YFC1521400)
Cite as:
SUN Yu, ZHU Xin-Juan. Text Audio Driven Facial Animation Generation Based on Improved Wav2Lip. Computer Systems Applications, 2024, 33(2): 276-283