Abstract: Massive-scale Chinese information processing is a branch of big data processing, and applying big data technology to Chinese text is inseparable from Chinese word segmentation; word segmentation is therefore a foundational technology for big data Chinese information processing. Since the turn of this century, Chinese word segmentation has steadily improved in both performance and accuracy. Performance gains have come mainly from better segmentation scanning algorithms, better lexicon storage structures, and faster lookup methods. Accuracy gains have come mainly from better handling of out-of-vocabulary (unregistered) words and ambiguous segmentations. This paper abandons the approach of searching the lexicon through an index and instead proposes a lexicon storage structure based on a character tree. Its segmentation speed is 35 times faster than the conventional binary-search method, while it uses only one fifth of the memory that method requires. This is a significant step forward in the performance of big data technology for Chinese information processing.
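To illustrate why a character-tree lexicon can outperform binary search, the sketch below stores words in a trie and segments text with forward maximum matching. Both the scanning strategy and all names (`CharTrieLexicon`, `longest_match`, `segment`) are assumptions made for illustration; the abstract does not specify the paper's scanning algorithm or node layout, so this is not the authors' implementation. The key point it shows: a single walk down the tree from one start position finds every lexicon word beginning there, whereas binary search over a sorted word list must run a separate O(log N) search for each candidate word length.

```python
from typing import Dict, List


class CharTrieNode:
    """One node of a character tree: children are keyed by a single character."""
    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: Dict[str, "CharTrieNode"] = {}
        self.is_word = False  # True if the path from the root to here spells a lexicon word


class CharTrieLexicon:
    """Hypothetical character-tree lexicon (illustrative sketch only)."""

    def __init__(self, words: List[str]) -> None:
        self.root = CharTrieNode()
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        # Walk the word character by character, creating nodes that are missing.
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, CharTrieNode())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Return the length of the longest lexicon word starting at text[start], or 0."""
        node = self.root
        best = 0
        i = start
        # Descend while the next character continues some lexicon word.
        while i < len(text) and text[i] in node.children:
            node = node.children[text[i]]
            i += 1
            if node.is_word:
                best = i - start
        return best


def segment(text: str, lexicon: CharTrieLexicon) -> List[str]:
    """Forward maximum matching; characters that start no lexicon word become single-character tokens."""
    tokens: List[str] = []
    pos = 0
    while pos < len(text):
        length = lexicon.longest_match(text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens
```

For example, with a toy lexicon `["中文", "分词", "中文分词", "技术"]`, `segment("中文分词技术", lexicon)` yields `["中文分词", "技术"]`. In this sketch, the cost of each lookup grows with the length of the match rather than with the size of the lexicon. It does not attempt to reproduce the paper's reported 35x speedup or 1/5 memory figures, which depend on the authors' actual storage layout.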