Text Classification Model Based on Width and Word Vector Feature

doi:10.15888/j.cnki.csa.007827


Volume 30, Issue 3, 2021, pp. 177-183

Author: LI Xue-Song
Affiliation: Digital Personal Banking Department, Bank of China, Beijing 100818, China


Abstract:

Text classification models based on word vectors have weak memorization ability and lack global word-feature information. To address these issues, we propose WideText, a text classification model based on width and word vector features. First, the text is cleaned and segmented into words, the word units are encoded, and a dictionary is defined. Second, the Term Frequency-Inverse Document Frequency (TF-IDF) of the global word units is calculated, and each text is vectorized. Third, the words of the input text are mapped to the word embedding matrix through their encodings. The embedded word vectors are averaged, concatenated with the TF-IDF-based text vector features, and passed to the output layer. Finally, the probability that the text belongs to each category is calculated. The proposed model adds the expressive power of text vector features to low-dimensional word vectors, which gives it strong generalization and memorization abilities. The experimental results show that, once the width feature is introduced, WideText classifies significantly better than the word-vector-based text classification model and slightly better than feedforward neural network classifiers.
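
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the function names, the toy corpus, the random (untrained) weights, and the TF-IDF variant (tf = count / document length, idf = log(N / df)) are all assumptions made for illustration, since the abstract does not specify them. The sketch only shows the forward pass, in which averaged word embeddings are concatenated with a TF-IDF text vector and passed to a softmax output layer.

```python
import numpy as np

def build_vocab(docs):
    """Map each distinct token to an integer id (the 'dictionary')."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab

def tfidf_matrix(docs, vocab):
    """Return an (n_docs, vocab_size) TF-IDF matrix (the 'width' features).

    Assumed variant: tf = count / doc length, idf = log(N / df).
    """
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc:
            counts[i, vocab[tok]] += 1
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    df = (counts > 0).sum(axis=0)           # documents containing each token
    idf = np.log(len(docs) / np.maximum(df, 1))
    return tf * idf

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def widetext_forward(docs, vocab, wide, emb, W, b):
    """Average each document's word embeddings, concatenate them with the
    TF-IDF features, and apply a linear layer followed by softmax."""
    deep = np.stack([emb[[vocab[t] for t in doc]].mean(axis=0) for doc in docs])
    features = np.concatenate([deep, wide], axis=1)
    return softmax(features @ W + b)

# Toy, already-segmented corpus (hypothetical data; documents must be non-empty).
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot", "good"]]
vocab = build_vocab(docs)
wide = tfidf_matrix(docs, vocab)

# Random, untrained parameters, used only to show the shapes involved.
rng = np.random.default_rng(0)
embed_dim, n_classes = 4, 2
emb = rng.normal(size=(len(vocab), embed_dim))
W = rng.normal(size=(embed_dim + len(vocab), n_classes))
b = np.zeros(n_classes)

probs = widetext_forward(docs, vocab, wide, emb, W, b)
print(probs.shape)  # one probability row per document
```

In a trained model the embedding matrix and output weights would be learned jointly; the TF-IDF vector gives the output layer direct access to per-word evidence, while the averaged embeddings supply a dense, low-dimensional summary of the text.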

Key words: Word2Vec; FastText; WideText; text classification

Citation

LI Xue-Song. Text Classification Model Based on Width and Word Vector Feature. Computer Systems & Applications, 2021, 30(3): 177-183


History

Received: June 15, 2020
Revised: July 14, 2020
Online: March 06, 2021
