Computer Systems Applications, 2019, 28(1): 275-278

Chinese Text Information Extraction Based on NLTK

LI Chen, LIU Wei-Guo

(School of Information Science and Engineering, Central South University, Changsha 410083, China)

Received: May 28, 2018; Revised: June 19, 2018

Abstract: NLTK is a third-party Python module for natural language processing, but it has certain limitations when processing Chinese text. This study uses NLTK to extract and mine the information content of Chinese texts, applying common-context word extraction, bigram collocation extraction, probability statistics, and discourse analysis. The result is an NLTK-based text content extraction framework suited to Chinese text, together with a concrete implementation method. Empirical analysis shows that the extraction results contain corpus content that reflects the characteristics of the text, and that the extraction results are strongly correlated with the topic of the text.

Keywords: natural language processing; Chinese texts; Natural Language Toolkit (NLTK)
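Three of the methods named in the abstract (probability statistics, bigram collocation extraction, and common-context word extraction) can be illustrated with standard NLTK classes. The following is a minimal sketch and not the authors' implementation: the function name `extract_features` and the toy sentence are hypothetical, and it assumes the Chinese text has already been segmented into words by some external tool, since the abstract does not say how segmentation is done.

```python
from nltk import FreqDist
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from nltk.text import ContextIndex


def extract_features(tokens, min_bigram_freq=2, top_n=3):
    """Illustrative helper (hypothetical name): basic statistics over segmented tokens."""
    # Probability statistics: token counts, from which relative frequencies follow.
    freq = FreqDist(tokens)

    # Bigram collocation extraction: drop rare pairs, then rank by raw frequency.
    finder = BigramCollocationFinder.from_words(tokens)
    finder.apply_freq_filter(min_bigram_freq)
    bigrams = finder.nbest(BigramAssocMeasures().raw_freq, top_n)

    # Common-context words: words that occur between the same left and right neighbours.
    index = ContextIndex(tokens)

    return freq, bigrams, index


# Assumption: a toy, pre-segmented sample (whitespace-separated); not data from the paper.
sample = "自然 语言 处理 是 人工 智能 的 分支 自然 语言 处理 研究 文本 自然 语言 理解 研究 文本"
tokens = sample.split()

freq, bigrams, index = extract_features(tokens)
print(freq.most_common(3))             # most frequent tokens
print(bigrams)                         # most frequent adjacent word pairs
print(index.similar_words("处理", 5))   # words sharing a context with "处理"
```

Raw frequency is used for ranking only to keep the sketch simple; `BigramAssocMeasures` also offers association scores such as `pmi` and `likelihood_ratio`, which may be better suited to larger corpora.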

Author Name	Affiliation	E-mail
LI Chen	School of Information Science and Engineering, Central South University, Changsha 410083, China
LIU Wei-Guo	School of Information Science and Engineering, Central South University, Changsha 410083, China	liuwg@csu.edu.cn

Cite this article:
LI Chen, LIU Wei-Guo. Chinese Text Information Extraction Based on NLTK. Computer Systems Applications, 2019, 28(1): 275-278