Computer Systems Applications, 2017, 26(10): 29-35
Research on Morph Normalization Based on Joint Learning of Character and Word
SHI Zhen-Hui1,2, SHA Ying1,2, LIANG Qi1,2, LI Rui1,2, QIU Yong-Qin1,2, WANG Bin1,2
(1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)
Received: January 10, 2017
Abstract: Text in social networks is casual and informal, and one common phenomenon is the heavy use of morphs. To evade censorship or to express emotion, people replace an original word with a morph; the original word is called the target word. This paper studies morph normalization, the task of finding the original target word that a morph stands for. Using the time and semantics of the texts in which a morph appears, together with the morph's part of speech (POS), we propose a combined temporal and semantic method for collecting candidate target words. We then propose a word-embedding method based on joint character-word training to rank the candidates. The method requires no additional annotated data. Experimental results show some improvement in accuracy over the current state-of-the-art method, and performance is better still for morphs that share a character with their target words.
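The ranking step can be illustrated with a minimal sketch. The abstract does not give the model's details, so everything here is an assumption made for illustration: the composition rule (the average of a word's own embedding and the mean of its character embeddings), the cosine scoring, and the names `compose`, `cosine`, and `rank_candidates`, which are hypothetical. The embeddings are taken as already trained and passed in as plain dictionaries.

```python
import math


def compose(word, word_vecs, char_vecs, dim):
    """Assumed joint vector: average of the word embedding and the mean
    of the embeddings of the characters in the word."""
    w = word_vecs.get(word, [0.0] * dim)
    chars = [char_vecs[c] for c in word if c in char_vecs]
    if chars:
        c_mean = [sum(v[i] for v in chars) / len(chars) for i in range(dim)]
    else:
        c_mean = [0.0] * dim
    return [(a + b) / 2 for a, b in zip(w, c_mean)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_candidates(morph, candidates, word_vecs, char_vecs, dim):
    """Sort candidate target words by similarity to the morph, best first."""
    m = compose(morph, word_vecs, char_vecs, dim)
    scored = [
        (c, cosine(m, compose(c, word_vecs, char_vecs, dim)))
        for c in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the character embeddings contribute to every composed vector, a candidate that shares a character with the morph is pulled toward it in this sketch, which is consistent with the abstract's remark that morphs sharing a character with their targets are handled better.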
Funding: National Key Research and Development Program of China (2016YFB0801003); Young Scientists Fund (61402466)
Cite this article:
SHI Zhen-Hui, SHA Ying, LIANG Qi, LI Rui, QIU Yong-Qin, WANG Bin. Research on Morph Normalization Based on Joint Learning of Character and Word. Computer Systems Applications, 2017, 26(10): 29-35