Computer Systems Applications (计算机系统应用), 2018, 27(9): 40-46

Unsupervised Document Representation Learning Based on Hierarchical Attention Model

OUYANG Wen-Jun, XU Lin-Li

(School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China)

Received: January 17, 2018; Revised: February 09, 2018

Abstract: Many natural language applications need to represent input text as a fixed-length vector. Existing techniques such as word embeddings and document representations provide feature representations for natural language tasks, but they do not account for differences in importance among the words in a sentence, nor among the sentences in a document. This study proposes a document representation model based on a hierarchical attention mechanism (HADR) that takes into account both the important sentences in a document and the important words in a sentence. Experimental results show that document representations that account for word and sentence importance perform better: on sentiment classification of IMDB documents, the model achieves higher accuracy than Doc2Vec and Word2Vec.

Keywords: document representation; word embeddings; attention; hierarchical; unsupervised learning; document classification
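
The two-level weighting described in the abstract can be illustrated with a minimal sketch. It is not the paper's HADR formulation: the dot-product scoring, the fixed random query vectors, and every name below (`attention_pool`, `document_vector`, `word_query`, `sent_query`) are assumptions chosen for illustration. In a trained model the word vectors and attention parameters would be learned by an unsupervised objective that the abstract does not specify.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Attention-weighted average of `vectors`.

    Scoring by dot product with a single query vector is an assumption
    made for this sketch, not the scoring function from the paper.
    """
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(query)
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def document_vector(doc, word_query, sent_query):
    """Word-level attention per sentence, then sentence-level attention."""
    sentence_vectors = [attention_pool(words, word_query)[0] for words in doc]
    return attention_pool(sentence_vectors, sent_query)


# Toy input: 3 sentences with 5, 3 and 7 words, each word an 8-dim vector.
# Random values stand in for learned embeddings and attention parameters.
random.seed(0)
DIM = 8


def random_vector():
    return [random.gauss(0, 1) for _ in range(DIM)]


doc = [[random_vector() for _ in range(n)] for n in (5, 3, 7)]
word_query = random_vector()
sent_query = random_vector()

doc_vec, sent_weights = document_vector(doc, word_query, sent_query)
print(len(doc_vec), len(sent_weights))
```

Whatever the document length, the result is a fixed-length vector, and `sent_weights` shows how much each sentence contributes to it.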

Funding: National Natural Science Foundation of China (61673364)

Author Name	Affiliation	E-mail
OUYANG Wen-Jun	School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China	oy01@mail.ustc.edu.cn
XU Lin-Li	School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China

Cite this article:
OUYANG Wen-Jun, XU Lin-Li. Unsupervised Document Representation Learning Based on Hierarchical Attention Model. COMPUTER SYSTEMS APPLICATIONS, 2018, 27(9): 40-46