
Computer Systems Applications, 2018, 27(7): 162-166

Research on Multi-Feature Keyword Extraction Algorithm

WANG Jie, WANG Li-Qing

(School of Information Science & Engineering, Yunnan University, Kunming 650223, China)

Received: November 23, 2017; Revised: December 15, 2017

Abstract: Keyword extraction is the foundation of corpus construction, text analysis and processing, and information retrieval. The traditional TFIDF algorithm weights candidate words mainly by word frequency and does not consider other features of the text; this excessive reliance on word frequency leads to low accuracy in the extracted keywords. To address this problem, and based on the characteristics of keywords, this paper introduces word position and part of speech as influence factors, recalculates and re-ranks the TFIDF weights to improve the algorithm, and implements it in Python. Experimental results show that, compared with the traditional method, the improved method clearly increases recall, precision, and F-measure.

Keywords: multi-feature; TFIDF; keyword extraction; Python
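The idea described in the abstract, rescaling a word's TFIDF weight by factors for its position and part of speech, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives neither the factor values nor the exact formula, so the multiplicative combination, the `POSITION_FACTOR` and `POS_FACTOR` tables, the smoothed IDF, and the function names below are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical factor values; the abstract does not state the ones used in the paper.
POSITION_FACTOR = {"title": 2.0, "body": 1.0}
POS_FACTOR = {"noun": 1.5, "verb": 1.0, "other": 0.5}


def tfidf(doc_tokens, corpus):
    """Plain TF-IDF for each distinct word of one document.

    corpus is a list of token collections, one per document.
    """
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for d in corpus if word in d)
        # Smoothed IDF (an assumption): always positive, so the
        # multiplicative factors below cannot flip the ranking.
        idf = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1
        scores[word] = (count / total) * idf
    return scores


def multi_feature_keywords(doc, corpus, top_k=3):
    """Rank words by TF-IDF scaled by position and part-of-speech factors.

    doc is a list of (word, pos_tag, position) tuples.
    """
    base = tfidf([word for word, _, _ in doc], corpus)
    pos_w, loc_w = {}, {}
    for word, tag, position in doc:
        # A word seen in several places keeps its largest factor.
        pos_w[word] = max(pos_w.get(word, 0.0),
                          POS_FACTOR.get(tag, POS_FACTOR["other"]))
        loc_w[word] = max(loc_w.get(word, 0.0),
                          POSITION_FACTOR.get(position, 1.0))
    weighted = {w: base[w] * pos_w[w] * loc_w[w] for w in base}
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(weighted, key=lambda w: (-weighted[w], w))[:top_k]


# Toy document: "extract" is the most frequent word, but "keyword"
# is a noun that also appears in the title.
example_doc = [
    ("keyword", "noun", "title"),
    ("extract", "verb", "body"),
    ("extract", "verb", "body"),
    ("extract", "verb", "body"),
    ("keyword", "noun", "body"),
    ("the", "other", "body"),
]
example_corpus = [
    {"keyword", "extract", "the"},
    {"the", "weather"},
    {"the", "report"},
]

plain_scores = tfidf([w for w, _, _ in example_doc], example_corpus)
print(max(plain_scores, key=plain_scores.get))
print(multi_feature_keywords(example_doc, example_corpus, top_k=2))
```

In this toy input, frequency alone favors "extract", while the assumed position and part-of-speech factors move "keyword" to the top. In practice the `(word, pos_tag, position)` tuples would come from a word segmenter and part-of-speech tagger, which the abstract does not name.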

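Foundation items: Industrialization Support Project of the Education Department of Yunnan Province (2016CYH03); Yunnan Province Science and Technology Innovation Strong Province Plan Project (2014AB021); Yunnan Province Innovation Team Project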

Author Name	Affiliation	E-mail
WANG Jie	School of Information Science & Engineering, Yunnan University, Kunming 650223, China
WANG Li-Qing	School of Information Science & Engineering, Yunnan University, Kunming 650223, China	wlq@ynu.edu.cn

Cite this article:
WANG Jie, WANG Li-Qing. Research on Multi-Feature Keyword Extraction Algorithm. Computer Systems Applications, 2018, 27(7): 162-166