Feature Weight Analysis and Improvement of TF-IDF Based on Category Information

    Abstract:

    The classical TF-IDF algorithm considers only statistics such as term frequency and inverse document frequency, and overlooks how feature terms are distributed between and within categories. In this study, we compute feature-term weights with TF-IDF on corpora of different sizes and analyze how category information affects those weights. On this basis, we propose a new method for measuring the inter-category and intra-category distribution of feature terms. We then propose TF-IDF-DI, an improved TF-IDF algorithm based on category information, which adds two new weights, an inter-category and an intra-category dispersion factor, to the classic TF-IDF algorithm. A Naive Bayes classifier is used to validate the classification performance of the improved algorithm. Experiments show that TF-IDF-DI outperforms classic TF-IDF in precision, recall, and F1 score.
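    The abstract does not give the TF-IDF-DI formulas, so the Python sketch below only illustrates the general idea under stated assumptions. `tf_idf` implements classic weighting, tf(t, d) · log(N / df(t)), with length-normalized term frequency (one common variant). `category_factors` and the combination `tfidf * (1 + inter) * intra` are hypothetical stand-ins invented for illustration, not the paper's definitions: the between-category factor is the coefficient of variation of a term's per-class document rate, and the within-category factor is its highest per-class document rate.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Classic TF-IDF: (count / doc length) * log(N / df(t)).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def category_factors(docs, labels):
    """HYPOTHETICAL dispersion factors (illustrative, not the paper's formulas).

    For term t and class c, r[c] = fraction of class-c documents containing t.
    inter[t]: coefficient of variation of r across classes (population std / mean);
              large when t is concentrated in few classes.
    intra[t]: max over classes of r[c]; large when t occurs consistently in a class.
    """
    classes = sorted(set(labels))
    class_size = Counter(labels)
    df_by_class = defaultdict(Counter)  # term -> Counter(class -> doc count)
    for doc, label in zip(docs, labels):
        for t in set(doc):
            df_by_class[t][label] += 1
    inter, intra = {}, {}
    for t, per_class in df_by_class.items():
        rates = [per_class[c] / class_size[c] for c in classes]
        mean = sum(rates) / len(rates)  # > 0: t occurs in at least one document
        std = math.sqrt(sum((r - mean) ** 2 for r in rates) / len(rates))
        inter[t] = std / mean
        intra[t] = max(rates)
    return inter, intra


def tf_idf_with_category_info(docs, labels):
    """Assumed combination for illustration: tfidf * (1 + inter) * intra."""
    base = tf_idf(docs)
    inter, intra = category_factors(docs, labels)
    return [
        {t: w * (1 + inter[t]) * intra[t] for t, w in doc_weights.items()}
        for doc_weights in base
    ]


if __name__ == "__main__":
    # Toy labeled corpus (made up for this example).
    docs = [
        ["goal", "match", "team"],
        ["goal", "team", "coach"],
        ["stock", "market", "team"],
        ["stock", "price", "market"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    plain = tf_idf(docs)[0]
    aware = tf_idf_with_category_info(docs, labels)[0]
    print("plain TF-IDF   goal/team:", round(plain["goal"] / plain["team"], 2))
    print("category-aware goal/team:", round(aware["goal"] / aware["team"], 2))
```

    On the toy corpus, "goal" (seen only in sport documents) gains weight relative to "team" (seen in both classes), which is the kind of effect category information is meant to capture. The resulting weight dictionaries could then be vectorized and passed to a Naive Bayes classifier, for example scikit-learn's `MultinomialNB`, to compare precision, recall, and F1 against plain TF-IDF.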

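Citation:

Yao Yanzhi (姚严志), Li Jianliang (李建良). 基于类信息的TF-IDF权重分析与改进 [Feature Weight Analysis and Improvement of TF-IDF Based on Category Information]. 计算机系统应用 (Computer Systems & Applications), 2021, 30(9): 237-241.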

History
  • Received: December 4, 2020
  • Revised: January 8, 2021
  • Online: September 4, 2021
Copyright: Institute of Software, Chinese Academy of Sciences