Computer Systems Applications, 2019, 28(1): 275-278
Chinese Text Information Extraction Based on NLTK
(School of Information Science and Engineering, Central South University, Changsha 410083, China)
Received: May 28, 2018; Revised: June 19, 2018
Abstract: NLTK is a third-party Python module for natural language processing, but it has certain limitations when processing Chinese text. This study uses NLTK to extract and mine information from Chinese text by combining several methods: common-context word extraction, bigram collocation extraction, probability statistics, and discourse analysis. The result is an NLTK-based text content extraction framework suited to Chinese text, together with a concrete implementation. Empirical analysis shows that the extraction results contain corpus content reflecting the characteristics of the text, and that they are strongly correlated with the text's topic.
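The abstract names the extraction methods without showing how they operate. As a rough illustration only, the Python sketch below implements three of them (common-context words, bigram collocations ranked by pointwise mutual information, and relative word frequencies) using just the standard library, as simplified analogues of NLTK's `Text.common_contexts`, `BigramCollocationFinder` scored with `BigramAssocMeasures.pmi`, and `FreqDist.freq`. It is not the authors' framework, and discourse analysis is not covered. It assumes the Chinese text has already been segmented into space-separated words, because NLTK's default tokenizers are built around whitespace-delimited languages; the sample text, the `min_freq` threshold, the punctuation `skip` set and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample: a short Chinese text already segmented into words
# (segmented by hand here; in practice a separate segmenter is assumed).
SAMPLE_TEXT = (
    "自然 语言 处理 是 人工 智能 的 重要 方向 。 "
    "中文 文本 处理 需要 先 分词 。 "
    "自然 语言 处理 工具 可以 抽取 文本 信息 。 "
    "中文 文本 分析 需要 语料 。"
)
SAMPLE_TOKENS = SAMPLE_TEXT.split()


def common_contexts(tokens, targets):
    """Return the (left, right) neighbour pairs shared by all target words,
    a simplified analogue of NLTK's Text.common_contexts."""
    contexts = {}
    for i in range(1, len(tokens) - 1):
        if tokens[i] in targets:
            contexts.setdefault(tokens[i], set()).add((tokens[i - 1], tokens[i + 1]))
    if any(word not in contexts for word in targets):
        return set()  # some target never occurs with both a left and right neighbour
    return set.intersection(*(contexts[word] for word in targets))


def top_bigrams_by_pmi(tokens, n=3, min_freq=2):
    """Rank adjacent word pairs by pointwise mutual information,
    PMI = log2(c(w1, w2) * N / (c(w1) * c(w2))), dropping pairs seen < min_freq times."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    scored = [
        ((w1, w2), math.log2(count * total / (unigrams[w1] * unigrams[w2])))
        for (w1, w2), count in bigrams.items()
        if count >= min_freq
    ]
    # Highest PMI first; ties broken by comparing the word pairs, for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in scored[:n]]


def relative_frequencies(tokens, top=5, skip=frozenset({"。", "，"})):
    """Return the `top` most frequent words with their relative frequency,
    ignoring punctuation listed in `skip` (hypothetical stop set)."""
    words = [t for t in tokens if t not in skip]
    freq = Counter(words)
    return [(word, count / len(words)) for word, count in freq.most_common(top)]


if __name__ == "__main__":
    print(common_contexts(SAMPLE_TOKENS, ["处理", "分析"]))
    print(top_bigrams_by_pmi(SAMPLE_TOKENS))
    print(relative_frequencies(SAMPLE_TOKENS, top=3))
```

Run as a script, the first call prints `{('文本', '需要')}`, the only context shared by 处理 and 分析 in the sample, and the second ranks (`自然`, `语言`) first with a PMI of 4. Using NLTK itself would mean passing the same segmented token list to `nltk.Text`, `BigramCollocationFinder.from_words` and `nltk.FreqDist`; that wiring is likewise an assumption, not code from the paper.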
Citation:
LI Chen, LIU Wei-Guo. Chinese Text Information Extraction Based on NLTK. Computer Systems Applications, 2019, 28(1): 275-278