Text Audio Driven Facial Animation Generation Based on Improved Wav2Lip

doi:10.15888/j.cnki.csa.009405


Volume 33, Issue 2, 2024, pp. 276-283

Author:
SUN Yu
School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China
ZHU Xin-Juan
School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China


Abstract:

To improve the realism of Chinese lip-synchronized facial animation videos, this study proposes a text- and audio-driven facial animation generation method based on an improved Wav2Lip model. First, a Chinese lip-synchronization dataset is constructed and used to pre-train the lip discriminator so that it judges Chinese lip-synchronized facial animation more accurately. Then, text features are introduced into the Wav2Lip model to improve the temporal synchronization of the lips and thereby the realism of the generated videos. The model fuses the extracted text, audio, and speaker face information and generates highly realistic lip-synchronized facial animation videos under the supervision of the pre-trained lip discriminator and a video quality discriminator. Comparative experiments with the ATVGnet and Wav2Lip models show that the videos generated by the proposed model improve the synchronization between lip shape and audio and enhance the overall realism of the facial animation. The paper thus provides a solution for facial animation generation.
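
The supervision described in the abstract can be illustrated with a minimal sketch. It assumes a Wav2Lip-style setup in which a pre-trained lip discriminator scores synchronization as the cosine similarity between a speech embedding and a lip embedding, and the generator objective is a weighted sum of reconstruction, synchronization, and video quality (adversarial) terms. The abstract does not say how text and audio features are combined, so `fuse_speech_features` uses plain concatenation as a hypothetical stand-in. All function names and the default weights are illustrative placeholders, not values reported in this paper.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def fuse_speech_features(audio_emb, text_emb):
    """Hypothetical fusion step: concatenate audio and text embeddings.

    The paper's actual fusion method is not given in the abstract.
    """
    return list(audio_emb) + list(text_emb)


def sync_loss(speech_emb, lip_emb, eps=1e-8):
    """Negative log of the sync probability from the lip discriminator.

    The similarity is clamped to [eps, 1] so the logarithm is defined.
    """
    p_sync = min(max(cosine_similarity(speech_emb, lip_emb), eps), 1.0)
    return -math.log(p_sync)


def generator_loss(recon, sync, adv, sync_weight=0.03, adv_weight=0.07):
    """Weighted sum of reconstruction, sync, and video quality terms.

    The weights are placeholders (assumption), chosen only so that the
    three coefficients sum to one.
    """
    recon_weight = 1.0 - sync_weight - adv_weight
    return recon_weight * recon + sync_weight * sync + adv_weight * adv


if __name__ == "__main__":
    # Toy embeddings; the lip embedding must match the fused dimension.
    speech = fuse_speech_features([0.5, 0.5], [1.0])
    lips = [0.5, 0.5, 1.0]
    total = generator_loss(recon=0.2, sync=sync_loss(speech, lips), adv=0.7)
    print(f"total generator loss: {total:.4f}")
```

In this sketch a perfectly aligned pair of embeddings gives a synchronization loss of zero, while orthogonal embeddings are penalized heavily, which is the behavior a lip discriminator is meant to enforce on the generator.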

Key words: text audio drive; facial animation; Wav2Lip model; animation generation

Citation:

SUN Yu, ZHU Xin-Juan. Text Audio Driven Facial Animation Generation Based on Improved Wav2Lip. Computer Systems & Applications, 2024, 33(2): 276-283


History

Received: August 17, 2023
Revised: September 26, 2023
Online: December 18, 2023
Published: February 05, 2024
