Sentence Level Text-speech Alignment for Imperfect Transcriptions

doi:10.15888/j.cnki.csa.009043

Author:

XU Kai, TAO Ye, LI Hui

Affiliation:

School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China

Fund Project:

National Key Research and Development Program of China (2018YFB1702902); Youth Innovation Science and Technology Support Program of Shandong Provincial Higher Education Institutions (2019KJN047)

Abstract:

Automatic text-speech alignment is widely used in speech recognition and synthesis, content production, and other fields. It aims to align speech with its reference text at the sentence, word, or phoneme level and to obtain the timing information that links the two. Most recent state-of-the-art alignment methods are based on automatic speech recognition (ASR). On the one hand, their accuracy is limited by recognition quality: alignment precision drops markedly when the word error rate (WER) of the ASR output is high. On the other hand, such methods cannot effectively align long recordings with imperfect transcriptions, where speech and text only partially match. This study proposes a text-speech alignment method based on anchors and prosodic information. Fragment annotation based on boundary anchor weighting divides the corpus into aligned and unaligned fragments. For unaligned fragments, a dual-threshold endpoint detection method extracts prosodic information and detects sentence boundaries, which reduces the dependence of ASR-based alignment on recognition quality. Experimental results show that, compared with current advanced ASR-based text-speech alignment methods, the proposed method improves alignment accuracy by more than 45% even when the WER is 0.52, and by at least 3% when the degree of mismatch between audio and text is 0.5.

Key words: text-speech alignment; prosodic information; anchor; automatic speech recognition (ASR); endpoint detection
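
The abstract names dual-threshold endpoint detection as the step that finds sentence boundaries in unaligned fragments, but this page does not give its details. The following is a minimal sketch of the classic energy-based form of that technique, not the authors' implementation: a speech segment is seeded wherever short-time energy exceeds a high threshold and is then widened while energy stays above a low threshold, and sufficiently long silent gaps between segments are reported as candidate sentence boundaries. The function names, the use of energy alone (without zero-crossing rate or other prosodic features), the threshold values, and the synthetic test signal are all assumptions made for illustration.

```python
import math


def frame_energies(samples, frame_len, hop):
    """Short-time energy (mean squared amplitude) of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def dual_threshold_segments(energies, high, low):
    """Return speech segments as (start_frame, end_frame), end exclusive.

    A segment is seeded at a frame whose energy exceeds `high`, then
    extended backward and forward while energy stays above `low`.
    """
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            end = i + 1
            while end < n and energies[end] > low:
                end += 1
            segments.append((start, end))
            i = end  # resume the scan after this segment
        else:
            i += 1
    return segments


def pause_midpoints(segments, hop, sample_rate, min_pause_frames):
    """Midpoints, in seconds, of silent gaps at least `min_pause_frames` long."""
    times = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        if next_start - prev_end >= min_pause_frames:
            mid_frame = (prev_end + next_start) / 2
            times.append(mid_frame * hop / sample_rate)
    return times


# Hypothetical test signal: 0.5 s tone, 0.3 s silence, 0.5 s tone at 8 kHz.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 2)]
signal = tone + [0.0] * 2400 + tone

energies = frame_energies(signal, frame_len=200, hop=100)
segments = dual_threshold_segments(energies, high=0.05, low=0.01)
boundaries = pause_midpoints(segments, hop=100, sample_rate=sr, min_pause_frames=10)
print(segments, boundaries)
```

On real recordings the two thresholds would normally be derived from the noise floor of the audio rather than fixed by hand, and the minimum pause length would be tuned to the speaker's pace; the constants above only suit the synthetic signal.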

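Cite this article

XU Kai, TAO Ye, LI Hui. Sentence Level Text-speech Alignment for Imperfect Transcriptions. 计算机系统应用 (Computer Systems & Applications), 2023, 32(4): 300-307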

History

Received: 2022-09-07
Revised: 2022-10-21
Published online: 2022-12-23
