Computer Systems & Applications, 2022, 31(9): 403-408
Short Text Classification Model of GM-FastText Multi-channel Word Vector
(School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China)
Received: November 23, 2021    Revised: December 20, 2021
Abstract: Short text classification suffers from sparse text features that are hard to extract and from out-of-vocabulary (OOV) words caused by non-standard usage. To address these problems, this study proposes GM-FastText, a short text classification model that combines FastText multi-channel word embeddings with a hybrid network architecture (GM) built from a gated recurrent unit (GRU) and a multi-layer perceptron (MLP). The model uses FastText to generate separate word embeddings in N-gram mode and feeds them into the GRU layer and the MLP layer to obtain short text features. The GRU extracts text features, the MLP layer mixes the features from the different channels, and the result is mapped to the target classes. Comparative experiments show that, relative to TextCNN and TextRNN, GM-FastText improves F1 by 0.021 and 0.023 and accuracy by 1.96 and 2.08 percentage points, respectively. Relative to FastText, FastText-CNN, and FastText-RNN, it improves F1 by 0.006, 0.014, and 0.016 and accuracy by 0.42, 1.06, and 1.41 percentage points, respectively. These comparisons indicate that the multi-channel word vectors give a better word representation for short text classification and that the GM network structure performs better at multi-parameter feature extraction.
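The abstract does not spell out how the N-gram channels are built, so the following is only a rough sketch under stated assumptions: each token position yields one id per channel (unigram, bigram, trigram), with the bigram and trigram ids obtained by hashing the token together with its predecessors into a fixed number of buckets. The function names, the `UNK_ID` and `PAD` constants, the bucket count, and the use of `zlib.crc32` as the hash are all illustrative choices, not taken from the paper. The point it illustrates is that an OOV token, which collapses to a single unknown id in the unigram channel, still receives context-dependent ids in the hashed channels.

```python
import zlib
from typing import Dict, List

UNK_ID = 0      # hypothetical id reserved for out-of-vocabulary tokens
PAD = "<pad>"   # hypothetical placeholder for missing left context


def stable_hash(text: str, buckets: int) -> int:
    """Map a string to a bucket index with a hash that is stable across runs."""
    return zlib.crc32(text.encode("utf-8")) % buckets


def ngram_channels(
    tokens: List[str], vocab: Dict[str, int], buckets: int = 1000
) -> Dict[str, List[int]]:
    """Return one id sequence per channel, each as long as `tokens`."""
    # Unigram channel: plain vocabulary lookup, OOV tokens fall back to UNK_ID.
    unigram = [vocab.get(tok, UNK_ID) for tok in tokens]
    bigram: List[int] = []
    trigram: List[int] = []
    for i, tok in enumerate(tokens):
        prev1 = tokens[i - 1] if i >= 1 else PAD
        prev2 = tokens[i - 2] if i >= 2 else PAD
        # Hashed channels: the id depends on the token and its left context,
        # so no explicit n-gram vocabulary is needed.
        bigram.append(stable_hash("\x1f".join((prev1, tok)), buckets))
        trigram.append(stable_hash("\x1f".join((prev2, prev1, tok)), buckets))
    return {"unigram": unigram, "bigram": bigram, "trigram": trigram}


if __name__ == "__main__":
    vocab = {"short": 1, "text": 2}
    channels = ngram_channels(["short", "text", "clasification"], vocab, buckets=50)
    for name, ids in channels.items():
        print(name, ids)
```

In a full model, each id sequence would index its own embedding table, and the resulting embeddings would be passed to the GRU and MLP layers described in the abstract. That part is omitted here because the abstract gives no layer sizes or wiring details.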
Funding: General Program of the National Natural Science Foundation of China (61977021)
Citation:
BAI Zi-Cheng, ZHOU Yan-Ling, ZHANG Yan. Short Text Classification Model of GM-FastText Multi-channel Word Vector. Computer Systems & Applications, 2022, 31(9): 403-408