Text Classification Model Based on Width and Word Vector Feature
CSTR:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    To resolve the issues of weak memory ability and no global word feature information in the word-vector-based text classification model, we propose a text classification model (WideText) based on the width and word vector features. Firstly, text cleaning, word segmentation, unit encoding and dictionary definitions are carried out. Secondly, the Term Frequency-Inverse Document Frequency (TF-IDF) index of the global word units is calculated and each text is vectorized. Furthermore, the words in the input text are mapped to the word embedding matrix through encoding. After the word vector features are embedded and averagely superimposed, they are spliced with the text vector features based on TF-IDF and transmitted to the output layer. Finally, the probability of the features belonging to each category is calculated. The proposed model combines the expressive ability of text vector features on the basis of low-dimensional word vectors and has excellent generalization and memory abilities. The experimental results show that after the introduction of the width feature, the WideText classification performance is significantly improved in comparison with that in the word-vector-based text classification model and also slightly better than that in the feedforward neural network classifiers.

    Reference
    Related
    Cited by
Get Citation

李雪松.基于宽度和词向量特征的文本分类模型.计算机系统应用,2021,30(3):177-183

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:June 15,2020
  • Revised:July 14,2020
  • Adopted:
  • Online: March 06,2021
  • Published:
Article QR Code
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3
Address:4# South Fourth Street, Zhongguancun,Haidian, Beijing,Postal Code:100190
Phone:010-62661041 Fax: Email:csa (a) iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063