Key Feature Selection Method for Weibo Information Based on BIG-WFCHI
    Abstract:

    Feature selection, which builds on feature extraction, is a key step in improving the accuracy and efficiency of retweet prediction with machine learning methods. Commonly used feature selection approaches include Information Gain (IG), mutual information, and the CHI-square test (CHI). In traditional feature selection, IG and CHI suffer from negative correlation and from interference in the calculation caused by low-frequency words, both of which lower classification accuracy. To address these problems, we introduce a balance factor and a word frequency factor to improve the accuracy of the two algorithms. Then, based on the spread characteristics of Weibo information and the improved IG and CHI algorithms, we propose a feature selection method based on the Balance Information Gain-Word Frequency CHI-square test (BIG-WFCHI). We evaluate the proposed method on two heterogeneous data sets with five classifiers: maximum entropy model, support vector machine, naive Bayes classifier, K-nearest neighbor, and multi-layer perceptron. The results show that the method effectively eliminates both irrelevant and redundant features, increases classification accuracy, and reduces running time.
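
    For orientation, the two baseline scores named in the abstract can be sketched as follows. This is a minimal illustration of the standard (unmodified) IG and CHI formulas for one term and one binary class, not the BIG-WFCHI method itself: the abstract does not give the form of the balance factor or the word frequency factor, so neither is implemented here. The function names and the counts `a`, `b`, `c`, `d` are assumptions introduced for this example.

```python
import math


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def chi_square(a, b, c, d):
    """Standard chi-square score of a term for one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    # The difference (a*d - c*b) is squared, so its sign is lost.
    return n * (a * d - c * b) ** 2 / denom


def information_gain(a, b, c, d):
    """Standard information gain of a term for a binary class split."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_class = entropy([a + c, b + d])
    # Class entropy after splitting documents by term presence.
    h_with_term = ((a + b) / n) * entropy([a, b])
    h_without_term = ((c + d) / n) * entropy([c, d])
    return h_class - (h_with_term + h_without_term)


# A term positively associated with the class ...
print(chi_square(40, 10, 10, 40))  # 36.0
# ... and a term negatively associated with it get the same score.
print(chi_square(10, 40, 40, 10))  # 36.0
```

    In this sketch the squared term in `chi_square` discards the direction of the association, so a term that mostly appears outside the class scores as high as one that mostly appears inside it. This is one common reading of the negative correlation problem the abstract mentions; how the paper corrects for it is not stated in the abstract.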

Get Citation

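Yin Shigang, An Yang, Cai Xinhua, Qu Xiao'e. Key Feature Selection Method for Weibo Information Based on BIG-WFCHI. Computer Systems & Applications, 2021, 30(2): 188-193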

History
  • Received: June 15, 2020
  • Revised: July 14, 2020
  • Online: January 29, 2021
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-3