Extraction Scheme of Target Speaker's Speech Under Multi-Speaker Environment

Computer Systems & Applications, Volume 25, Issue 4, 2016, pp. 8-15

Authors: YE Yu-Lin, MO Jian-Hua, LIU Xia

Affiliation: 78438 Troops of the Chinese People's Liberation Army, Chengdu 610066, China

Abstract:

Aimed at multi-speaker communication scenarios in real life, this paper designs and proposes an effective solution that can extract the speech of any target speaker at any azimuth. It builds on research into underdetermined blind speech separation that uses the target sound source's azimuth information and nonlinear time-frequency masking, together with BP (back-propagation) neural network speaker recognition. The solution has two stages: target speech search and target speech extraction. The search stage uses BP speaker recognition. The extraction stage uses underdetermined blind speech separation based on sound source azimuth information, combining an improved potential function clustering method with nonlinear time-frequency masking. The results show that the solution is feasible: it can effectively extract the target speaker's speech at any position from the mixed speech stream. The average SNR gain (SNRG) is 8.68 dB, the similarity coefficient is 85%, the recognition rate is 61%, and the running time is 20.6 s.
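
The abstract reports an average SNRG and a similarity coefficient without defining them. The sketch below assumes definitions often used when evaluating blind source separation: SNRG as the SNR of the extracted speech minus the SNR of the input mixture (both measured against the clean target), and the similarity coefficient as the normalized cross-correlation between the clean target and the extracted signal. The function names and the synthetic test signals are hypothetical, and the paper's exact formulas may differ.

```python
import numpy as np


def snr_db(signal: np.ndarray, target: np.ndarray) -> float:
    """SNR of `signal` against the clean `target`, in dB (assumed definition)."""
    noise = signal - target
    return float(10.0 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2)))


def snr_gain_db(mixture: np.ndarray, estimate: np.ndarray, target: np.ndarray) -> float:
    """SNRG: SNR of the extracted speech minus SNR of the input mixture (assumed definition)."""
    return snr_db(estimate, target) - snr_db(mixture, target)


def similarity_coefficient(estimate: np.ndarray, target: np.ndarray) -> float:
    """Normalized cross-correlation |<target, estimate>| / (||target|| * ||estimate||), in [0, 1]."""
    num = abs(np.dot(target, estimate))
    den = np.sqrt(np.dot(target, target) * np.dot(estimate, estimate))
    return float(num / den)


if __name__ == "__main__":
    t = np.arange(16000) / 16000.0                  # 1 s at a hypothetical 16 kHz rate
    target = np.sin(2 * np.pi * 220 * t)            # stand-in for the target speaker
    interferer = 0.8 * np.sin(2 * np.pi * 330 * t)  # stand-in for a competing speaker
    mixture = target + interferer
    estimate = target + 0.1 * interferer            # imperfect extraction: 10% leakage
    print(f"SNRG = {snr_gain_db(mixture, estimate, target):.2f} dB")
    print(f"similarity = {similarity_coefficient(estimate, target):.3f}")
```

With 10% of the interferer leaking into the estimate, the SNRG is exactly 10·log10(1/0.1²) = 20 dB. Both metrics need the clean target signal, so under these definitions they can only be computed in simulation, where the mixtures are built from known source recordings.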

Key words: underdetermined blind source separation; potential function clustering; nonlinear time-frequency masking; BP speaker recognition
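
The extraction stage pairs potential function clustering of azimuth estimates with nonlinear time-frequency masking. The following is a minimal sketch, not the paper's improved algorithm. It assumes a per-bin azimuth (in degrees) has already been estimated for every time-frequency bin, for example from the inter-microphone phase difference, and applies the classic potential-function (subtractive) clustering to find one azimuth per speaker. The softmax-style mask, the radii `radius_a`/`radius_b`, `stop_ratio`, and `sigma` are hypothetical choices for illustration.

```python
import numpy as np


def potential_function_clustering(azimuths, radius_a=10.0, radius_b=15.0,
                                  stop_ratio=0.3, max_centers=8):
    """Classic potential-function (subtractive) clustering of 1-D azimuths in degrees.

    radius_a, radius_b and stop_ratio are hypothetical defaults, not the paper's values.
    """
    x = np.asarray(azimuths, dtype=float)
    alpha, beta = 4.0 / radius_a ** 2, 4.0 / radius_b ** 2
    # Initial potential: each point's accumulated attraction from all points.
    potential = np.exp(-alpha * (x[:, None] - x[None, :]) ** 2).sum(axis=1)
    first_peak = potential.max()
    centers = []
    while len(centers) < max_centers:
        k = int(np.argmax(potential))
        if potential[k] < stop_ratio * first_peak:
            break  # remaining peaks are too weak to be another speaker
        centers.append(x[k])
        # Subtract the new center's influence so its neighbors are not re-selected.
        potential = potential - potential[k] * np.exp(-beta * (x - x[k]) ** 2)
    return np.sort(np.array(centers))


def soft_azimuth_mask(azimuths, centers, target_index, sigma=5.0):
    """Nonlinear (softmax) time-frequency mask for the speaker at centers[target_index].

    Each bin's weight for a center decays with its squared azimuth distance; the mask
    is the target's share of the total weight, so it lies in (0, 1]. sigma is hypothetical.
    """
    theta = np.asarray(azimuths, dtype=float)[..., None]
    logits = -((theta - np.asarray(centers)) ** 2) / (2.0 * sigma ** 2)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights[..., target_index] / weights.sum(axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic per-bin azimuth estimates: three speakers plus scattered outliers.
    az = np.concatenate([rng.normal(m, 3.0, 200) for m in (-40.0, 10.0, 55.0)]
                        + [rng.uniform(-90.0, 90.0, 60)])
    centers = potential_function_clustering(az)
    print("estimated speaker azimuths:", np.round(centers, 1))
    mask = soft_azimuth_mask(az, centers, target_index=1)
    print("mean mask on the 10-degree speaker's bins:", round(float(mask[200:400].mean()), 3))
```

To extract speech, such a mask would multiply the reference microphone's STFT bin by bin before an inverse STFT. In the described scheme, deciding which cluster belongs to the target is presumably the job of the search stage (BP speaker recognition); here `target_index=1` is simply chosen by hand. The pairwise potential computation is quadratic in the number of points, so a real spectrogram would need subsampling or an azimuth histogram first.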

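Citation:

YE Yu-Lin, MO Jian-Hua, LIU Xia. Extraction Scheme of Target Speaker's Speech Under Multi-Speaker Environment. Computer Systems & Applications (计算机系统应用), 2016, 25(4): 8-15.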

History

Received: July 10, 2015
Revised: August 12, 2015
Online: April 19, 2016

Copyright: Institute of Software, Chinese Academy of Sciences