Received: November 23, 2017 Revised: December 15, 2017
Abstract: Keyword extraction underpins corpus construction, text analysis, and information retrieval. The traditional TFIDF algorithm weights candidate terms mainly by word frequency and ignores other text features, and this over-reliance on frequency limits its extraction accuracy. To address this problem, we introduce word position and part of speech as influence factors, recalculate and re-rank the TFIDF weights accordingly, and implement the improved algorithm in Python. Experimental results show that the improved method clearly outperforms the traditional one in recall, precision, and F-measure.
Keywords: multi-feature; TFIDF; keyword extraction; Python
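The idea summarized in the abstract, scaling a TFIDF weight by position and part-of-speech factors before ranking, can be illustrated with a minimal sketch. The factor values, the multiplicative combination, the tag names, and the function names below are all assumptions made for illustration; the abstract does not state the weights or the exact formula used in the paper.

```python
import math
from collections import Counter

# Hypothetical factor values: the abstract does not give the paper's weights.
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}
POS_WEIGHT = {"n": 1.5, "v": 1.0, "a": 0.8}
DEFAULT_POS_WEIGHT = 0.5


def tfidf_scores(doc_tokens, corpus):
    """Plain TF-IDF over (word, pos_tag, position) tokens."""
    counts = Counter(word for word, _, _ in doc_tokens)
    total = sum(counts.values())
    scores = {}
    for word, count in counts.items():
        # Number of documents in the corpus that contain the word.
        df = sum(1 for other in corpus if any(w == word for w, _, _ in other))
        # Smoothed IDF; the +1.0 keeps the value positive.
        idf = math.log(len(corpus) / (1 + df)) + 1.0
        scores[word] = (count / total) * idf
    return scores


def multi_feature_scores(doc_tokens, corpus):
    """Scale each TF-IDF weight by assumed position and part-of-speech factors."""
    base = tfidf_scores(doc_tokens, corpus)
    factor = {}
    for word, pos, position in doc_tokens:
        f = POSITION_WEIGHT.get(position, 1.0) * POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
        # A word seen several times keeps its most favourable factor.
        factor[word] = max(factor.get(word, 0.0), f)
    return {word: base[word] * factor[word] for word in base}


def top_keywords(doc_tokens, corpus, k=3):
    """Return the k highest-scoring words under the multi-feature weighting."""
    scores = multi_feature_scores(doc_tokens, corpus)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Each document is assumed to arrive already segmented and tagged as `(word, pos_tag, position)` tuples, so the sketch stays independent of any particular word segmenter or tagger. Under these assumed factors, a noun that appears in the title can outrank a more frequent body-text verb, which is the kind of re-ranking the abstract describes.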
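Funding: Industrialization Support Project of the Yunnan Provincial Department of Education (2016CYH03); Yunnan Province Science and Technology Innovation Strong Province Program (2014AB021); Yunnan Province Innovation Team Project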
Cite this article:
WANG Jie (王洁), WANG Li-Qing (王丽清). Research on Multi-Feature Keyword Extraction Algorithm. COMPUTER SYSTEMS APPLICATIONS (计算机系统应用), 2018, 27(7): 162-166