Research on Chinese Short Text Classification Based on Word2Vec

doi:10.15888/j.cnki.csa.006325

Volume 27, Issue 5, 2018, pp. 209-215

Authors: WANG Jing, LUO Lang, WANG De-Qiang
Affiliation: School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China

Abstract:

Short texts are inherently sparse, and traditional classification models suffer from a "lexical gap". Using the Word2Vec model to map words to low-dimensional real-valued vectors according to their contextual semantic relations can effectively ease the feature sparsity of short texts. However, further study found that using Word2Vec alone ignores the influence of different parts of speech on the short text. Therefore, we introduce part of speech to improve the feature weighting approach: the contribution of part of speech is embedded into the traditional TF-IDF algorithm to calculate the weight of each word in the short text, and the short-text vector is generated by combining these weights with the Word2Vec word vectors. Finally, we use an SVM to classify the short texts. Experimental results on the Fudan University Chinese text classification corpus validate the effectiveness of the proposed method.
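
The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption: the part-of-speech factors in `POS_WEIGHT` are made-up values, the smoothed IDF is one common variant, the two-dimensional `embeddings` are hand-written stand-ins for trained Word2Vec vectors, and the function names `pos_tfidf` and `text_vector` are hypothetical. The SVM step is omitted.

```python
import math
from collections import Counter

# Assumed part-of-speech contribution factors (illustrative values only,
# not taken from the paper): nouns count most, then verbs, then adjectives.
POS_WEIGHT = {"n": 1.0, "v": 0.8, "a": 0.5}
DEFAULT_POS_WEIGHT = 0.2


def pos_tfidf(doc, corpus):
    """Return {word: weight} for one document.

    doc is a list of (word, pos_tag) pairs; corpus is a list of such documents.
    The weight is TF * smoothed IDF * part-of-speech factor.
    """
    counts = Counter(word for word, _ in doc)
    pos_of = {word: pos for word, pos in doc}
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        tf = count / len(doc)
        # Document frequency: number of documents that contain the word.
        df = sum(1 for d in corpus if any(w == word for w, _ in d))
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        factor = POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
        weights[word] = tf * idf * factor
    return weights


def text_vector(doc, corpus, embeddings):
    """Build a short-text vector as the weighted sum of its word vectors."""
    weights = pos_tfidf(doc, corpus)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    for word, weight in weights.items():
        if word in embeddings:  # skip out-of-vocabulary words
            for i, x in enumerate(embeddings[word]):
                vec[i] += weight * x
    return vec


# Toy, pre-segmented and POS-tagged corpus (hypothetical data).
corpus = [
    [("football", "n"), ("match", "n"), ("exciting", "a")],
    [("stock", "n"), ("market", "n"), ("fell", "v")],
]
# Hand-written 2-d vectors standing in for trained Word2Vec embeddings.
embeddings = {
    "football": [1.0, 0.0], "match": [0.8, 0.2], "exciting": [0.5, 0.5],
    "stock": [0.0, 1.0], "market": [0.1, 0.9], "fell": [0.2, 0.7],
}

weights = pos_tfidf(corpus[0], corpus)
vec = text_vector(corpus[0], corpus, embeddings)
print(weights)
print(vec)
```

In this sketch the noun "football" receives twice the weight of the adjective "exciting" because their term and document frequencies are equal and only the assumed part-of-speech factor differs. The resulting vectors could then be passed to any SVM implementation as feature rows.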

Key words: Word2Vec; TF-IDF; text representation; short text classification

Citation:

WANG Jing, LUO Lang, WANG De-Qiang. Research on Chinese Short Text Classification Based on Word2Vec. Computer Systems & Applications, 2018, 27(5): 209-215.

History

Received: August 18, 2017
Revised: September 5, 2017
Online: March 12, 2018
