Abstract:NLTK is a module for processing natural language text in Python, but it has limitations when processing Chinese text. To extracted information in the text by using NLTK, the means created in this study included a group of methods, such as common context words extraction, bigrams words extraction, probability statistics, and discourse analysis. Both of NLTK text content extraction framework suitable for Chinese texts and implementation method are obtained. In the results of empirical, it finds the content of the corpus which reflects the characteristics of the text, and gets the conclusion that a strong correlation between the results of extraction and text topic.