Short Text Classification Model of GM-FastText Multi-channel Word Vector

Fund Project: General Program of the National Natural Science Foundation of China (61977021)

    Abstract:

    To address two problems in short text classification, namely sparse text features that are difficult to extract and out-of-vocabulary (OOV) words caused by non-standard word usage, this study proposes GM-FastText, a short text classification model that combines FastText multi-channel word embeddings with a GRU-MLP hybrid network architecture (GM) built from a gated recurrent unit (GRU) and a multi-layer perceptron (MLP). The model uses FastText to generate several different word embeddings from N-grams and feeds them into the GRU layer and the MLP layer to obtain short text features. The GRU extracts features from the text, the MLP layer mixes the features extracted from the different channels, and the result is finally mapped to the target classes. Comparative experiments show that, relative to TextCNN and TextRNN, GM-FastText improves the F1 score by 0.021 and 0.023 and accuracy by 1.96 and 2.08 percentage points, respectively. Relative to FastText, FastText-CNN, and FastText-RNN, it improves the F1 score by 0.006, 0.014, and 0.016 and accuracy by 0.42, 1.06, and 1.41 percentage points, respectively. These comparisons indicate that the FastText multi-channel word vectors give a better word representation for short text classification and that the GM network structure performs better at multi-parameter feature extraction.
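The multi-channel N-gram input described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the character-level tokenization, the bucket count `BUCKETS`, the CRC32 hash (a stand-in, not the hash used by the FastText library), and the function names are all assumptions made for illustration. The point it shows is that hashing each N-gram into a fixed number of buckets gives every N-gram an id, including ones never seen in training, which is how FastText-style models sidestep the OOV problem. Each channel of ids would then index its own embedding table before the GRU and MLP layers.

```python
import zlib

BUCKETS = 1000  # hypothetical size of each n-gram embedding table


def char_tokens(text):
    """Split a short text into character tokens, skipping whitespace (an assumed tokenization)."""
    return [ch for ch in text if not ch.isspace()]


def ngram_ids(tokens, n, buckets=BUCKETS):
    """Hash every contiguous n-gram of tokens into one of `buckets` ids.

    Unseen n-grams still receive an id, so no token sequence is out of vocabulary.
    CRC32 is only a deterministic stand-in hash for this sketch.
    """
    ids = []
    for i in range(len(tokens) - n + 1):
        # A separator keeps ("ab", "c") and ("a", "bc") distinct for multi-character tokens.
        gram = "\x1f".join(tokens[i:i + n])
        ids.append(zlib.crc32(gram.encode("utf-8")) % buckets)
    return ids


def build_channels(text, orders=(1, 2, 3)):
    """Return one id sequence per n-gram order; each is one input channel."""
    tokens = char_tokens(text)
    return {n: ngram_ids(tokens, n) for n in orders}


channels = build_channels("短文本分类")
for order, ids in channels.items():
    print(order, ids)
```

A text of five characters yields five unigram ids, four bigram ids, and three trigram ids, so the channels have different lengths and would need padding or pooling before being combined in a network.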

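Cite this article

Bai Zicheng (白子诚), Zhou Yanling (周艳玲), Zhang Yan (张龑). Short Text Classification Model of GM-FastText Multi-channel Word Vector. Computer Systems & Applications (计算机系统应用), 2022, 31(9): 403-408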

History
  • Received: 2021-11-23
  • Last revised: 2021-12-20
  • Published online: 2022-05-30