Chinese Notional Word Discrimination Based on RoBERTa-ND

doi:10.15888/j.cnki.csa.009099


Computer Systems & Applications (计算机系统应用), 2023, Vol. 32, No. 5, pp. 157-163


Author:

SUN Chen-Yu, WANG Zhen-Qi, ZHANG Bao-Yu, ZHANG Wei-Shan, HOU Zhao-Xiang, CHEN Tao

Affiliation:

College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China (all authors)

Fund Project:

National Natural Science Foundation of China (62072469); 2021 Open Project of the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (20210114)


Abstract:

Chinese notional words are combinatorial and metaphorical in nature, and datasets for Chinese notional word discrimination are lacking. As a result, traditional methods remain limited in how well they understand and discriminate Chinese notional words in machine reading comprehension tasks. To address this, a large-scale (600k) Chinese notional word discrimination cloze dataset (CND) is constructed. In the dataset, one notional word in a sentence is replaced with a blank placeholder, and the correct answer must be selected from two candidate notional words. A baseline model, the RoBERTa-based notional word discrimination model (RoBERTa-ND), is designed to select among the candidates. The model first uses a pre-trained language model to extract semantic information from the context. Second, it fuses the semantics of the candidate notional words and computes a score for each candidate through a classification task. Finally, it strengthens its perception of position and direction information, which further improves its ability to discriminate Chinese notional words. Experiments show that the model achieves an accuracy of 90.21% on CND, outperforming mainstream cloze models such as DUMA (87.59%) and GNN-QA (84.23%). This work fills a gap in research on Chinese metaphorical semantic understanding and offers practical value for applications such as improving the cognitive ability of Chinese dialogue robots. The CND dataset and the RoBERTa-ND code are open source: https://github.com/2572926348/CND-Large-scale-Chinese-National-word-discrimination-dataset.

Key words: metaphorical semantic understanding; Chinese notional word discrimination; machine reading comprehension
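
The task format described in the abstract (a sentence with one blank, two candidate words, one correct answer, accuracy as the metric) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' RoBERTa-ND implementation: `ClozeItem`, `PLACEHOLDER`, `toy_score`, the collocation table, and the English example sentences are all hypothetical, and the hand-made scorer merely stands in for the pre-trained language model that the paper uses to score candidates.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

PLACEHOLDER = "[BLANK]"  # hypothetical blank marker; the real dataset may use another token


@dataclass(frozen=True)
class ClozeItem:
    """One cloze example: a sentence with a blank, two candidates, and the gold index."""
    sentence: str
    candidates: Tuple[str, str]
    answer: int  # index (0 or 1) of the correct candidate


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw candidate scores into probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(item: ClozeItem, score_fn: Callable[[str, str], float]) -> int:
    """Score each candidate against the context and return the index of the best one."""
    probs = softmax([score_fn(item.sentence, c) for c in item.candidates])
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(items: Sequence[ClozeItem], score_fn: Callable[[str, str], float]) -> float:
    """Fraction of items whose predicted candidate matches the gold answer."""
    correct = sum(predict(item, score_fn) == item.answer for item in items)
    return correct / len(items)


# Hypothetical stand-in for a learned scorer: counts context words that are
# listed as typical neighbours of the candidate in a tiny hand-made table.
TOY_COLLOCATIONS = {"raise": {"question", "hand"}, "rise": {"sun", "prices"}}


def toy_score(sentence: str, candidate: str) -> float:
    words = set(sentence.lower().replace(".", "").split())
    return float(len(words & TOY_COLLOCATIONS.get(candidate, set())))


ITEMS = [
    ClozeItem(f"Please {PLACEHOLDER} your hand.", ("raise", "rise"), 0),
    ClozeItem(f"The sun will {PLACEHOLDER} at six.", ("raise", "rise"), 1),
]

if __name__ == "__main__":
    print(accuracy(ITEMS, toy_score))
```

In a setup closer to the one the abstract describes, `toy_score` would be replaced by a classifier on top of a pre-trained encoder; the surrounding selection and accuracy logic would stay the same.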

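Cite this article

SUN Chen-Yu, WANG Zhen-Qi, ZHANG Bao-Yu, ZHANG Wei-Shan, HOU Zhao-Xiang, CHEN Tao. Chinese notional word discrimination based on RoBERTa-ND. Computer Systems & Applications (计算机系统应用), 2023, 32(5): 157-163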


History

Received: 2022-11-03
Revised: 2022-12-10
Published online: 2023-03-17
