Received: January 17, 2018; Revised: February 9, 2018
Abstract: Many natural language applications need to represent input text as a fixed-length vector. Existing techniques such as word embeddings and document representations provide feature representations for natural language tasks, but they do not account for differences in the importance of each word within a sentence, and they also ignore differences in the importance of each sentence within a document. This study proposes a document representation model based on a hierarchical attention mechanism (HADR) that takes into account both the important sentences in a document and the important words in a sentence. Experimental results show that document representations that account for word importance and sentence importance perform better. On sentiment classification of documents (IMDB), the model achieves higher accuracy than the Doc2Vec and Word2Vec models.
Keywords: document representation; word embeddings; attention; hierarchical; unsupervised learning; document classification
Funding: National Natural Science Foundation of China (61673364)
Cite this article:
OUYANG Wen-Jun, XU Lin-Li. Unsupervised Document Representation Learning Based on Hierarchical Attention Model. COMPUTER SYSTEMS APPLICATIONS (计算机系统应用), 2018, 27(9): 40-46