Chinese Text Information Extraction Based on NLTK

doi:10.15888/j.cnki.csa.006700


Computer Systems & Applications, 2019, Vol. 28, No. 1, pp. 275-278



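Abstract:

NLTK is a third-party Python module for natural language processing, but it has certain limitations when processing Chinese text. This study uses NLTK to extract and mine the information content of Chinese text, applying common-context word extraction, bigram collocation extraction, probability statistics and discourse analysis. The result is an NLTK text content extraction framework suited to Chinese text, together with a concrete implementation. Empirical analysis shows that the extraction results contain corpus content reflecting the characteristics of the text, supporting the conclusion that the extraction results are strongly correlated with the text's topic.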
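The first two techniques named in the abstract, common-context word extraction and bigram collocation extraction, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. It assumes the Chinese text has already been segmented into a list of words (segmentation is not shown), treats a context as the pair of words immediately left and right of an occurrence (the idea behind NLTK's `Text.common_contexts`), and ranks adjacent word pairs by pointwise mutual information (PMI). The function names `common_contexts` and `top_bigrams_by_pmi`, the `min_freq` threshold and the sample sentence are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def common_contexts(tokens, targets):
    """Return the (left, right) contexts shared by every word in `targets`.

    A context is the pair of words immediately before and after one
    occurrence of a word, the same notion nltk.Text.common_contexts uses.
    """
    if not targets:
        return []
    contexts = defaultdict(set)
    for i in range(1, len(tokens) - 1):
        contexts[tokens[i]].add((tokens[i - 1], tokens[i + 1]))
    # keep only contexts in which *all* target words have appeared
    shared = set.intersection(*(contexts[w] for w in targets))
    return sorted(shared)


def top_bigrams_by_pmi(tokens, n=5, min_freq=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_freq:
            continue  # skip pairs below min_freq: PMI overrates rare pairs
        # PMI = log2( P(w1 w2) / (P(w1) P(w2)) ), estimated from counts
        scores[(w1, w2)] = math.log2(count * total / (unigrams[w1] * unigrams[w2]))
    # highest PMI first; ties broken by code point for a stable result
    return sorted(scores, key=lambda pair: (-scores[pair], pair))[:n]


# A pre-segmented sample sentence (segmentation itself is not shown)
tokens = "我 喜欢 自然 语言 ， 我 喜欢 音乐 ， 我 研究 自然 语言".split()
print(common_contexts(tokens, ["喜欢", "研究"]))  # → [('我', '自然')]
print(top_bigrams_by_pmi(tokens, n=1))             # → [('自然', '语言')]
```

Here 喜欢 and 研究 share the context 我 _ 自然, and 自然 语言 ranks first because those two words only ever occur together. With NLTK installed, comparable results should come from `nltk.Text(tokens).common_contexts(["喜欢", "研究"])`, which prints the shared contexts, and from `nltk.collocations.BigramCollocationFinder.from_words(tokens)` followed by `apply_freq_filter(2)` and `nbest(BigramAssocMeasures.pmi, 5)`.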

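For the probability-statistics step, one simple reading is to estimate each word's relative frequency P(w) = count(w) / N after dropping punctuation and stop words, which is what `nltk.FreqDist.freq` returns for a single word. The sketch below assumes that reading rather than the authors' exact procedure; the function name `word_probabilities`, the stop-word set and the sample sentence are hypothetical, and the input is again a pre-segmented word list.

```python
from collections import Counter


def word_probabilities(tokens, stopwords=frozenset(), top=5):
    """Return the `top` most frequent words with P(w) = count(w) / N.

    N counts only the tokens left after stop-word filtering, so the
    probabilities describe the content words of the text.
    """
    content = [t for t in tokens if t not in stopwords]
    counts = Counter(content)
    total = len(content)
    # most_common keeps first-seen order among words with equal counts
    return [(word, count / total) for word, count in counts.most_common(top)]


tokens = "我 喜欢 自然 语言 ， 我 喜欢 音乐 ， 我 研究 自然 语言".split()
print(word_probabilities(tokens, stopwords={"，", "我"}, top=3))
# → [('喜欢', 0.25), ('自然', 0.25), ('语言', 0.25)]
```

Filtering stop words before normalising keeps function words and punctuation from dominating the ranking, so the highest-probability words are more likely to relate to the text's subject.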


Cite this article:

Li Chen, Liu Weiguo. Chinese Text Information Extraction Based on NLTK. Computer Systems & Applications, 2019, 28(1): 275-278.


History

Received: 2018-05-28
Last revised: 2018-06-19
Published online: 2018-12-27
