Received: January 10, 2017
Abstract: Text in social networks is casual and informal, and one common phenomenon is the heavy use of morphs: people replace a word with a variant form to avoid censorship or to express sentiment, and the replaced word is called the target. This paper studies morph normalization, that is, finding the original target word that a morph stands for. We use the time and semantics of the text in which a morph appears, together with the morph's part of speech (POS), to collect candidate targets, and then rank the candidates with a word-embedding method based on joint character-word training. The method requires no additional annotated data. Experimental results show some improvement in accuracy over the current state-of-the-art method, and performance is better still for morphs that share a character with their targets.
Funding: National Key Research and Development Program of China (2016YFB0801003); Young Scientists Fund (61402466)
Citation:
施振辉,沙灜,梁棋,李锐,邱泳钦,王斌.基于字词联合的变体词规范化研究.计算机系统应用,2017,26(10):29-35
SHI Zhen-Hui, SHA Ying, LIANG Qi, LI Rui, QIU Yong-Qin, WANG Bin. Research on Morph Normalization Based on Joint Learning of Character and Word. COMPUTER SYSTEMS APPLICATIONS, 2017, 26(10): 29-35