School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China; Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Policy terms are timely, low-frequency, sparse, and often take the form of compound phrases. Because traditional term extraction methods struggle to meet these demands, we design and implement a semantically enhanced, multi-strategy system for policy term extraction. The system models the features of policy texts along two dimensions: frequent item mining and semantic similarity. Feature seed words are selected by integrating multiple frequent pattern mining strategies. Low-frequency and sparse policy terms are then recalled by pre-training a language model and using it to enhance semantic matching. By moving from a cold start without a thesaurus to a hot start with a thesaurus, the system achieves semi-automatic extraction of policy terms. The proposed system can improve the quality of policy text analysis and provide technical support for the construction of a smart government service platform.
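The two dimensions described above can be illustrated with a minimal sketch. It is not the system's implementation: it assumes whitespace-tokenized English text, uses simple n-gram counting as one stand-in for the frequent pattern mining strategies, and uses cosine similarity over character trigrams as a crude substitute for the pre-trained language model. All function names, parameters, and thresholds are hypothetical.

```python
import math
from collections import Counter


def frequent_ngrams(docs, n_max=3, min_count=2):
    """Return word n-grams (2 <= n <= n_max) seen at least min_count times.

    Assumption: these frequent phrases play the role of feature seed words.
    """
    counts = Counter()
    for tokens in docs:
        for n in range(2, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {" ".join(gram) for gram, c in counts.items() if c >= min_count}


def char_ngram_vector(text, n=3):
    """Bag of character n-grams; a stand-in for a language-model embedding."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recall_by_similarity(candidates, seeds, threshold=0.5):
    """Recall candidates that are not seeds but resemble at least one seed."""
    seed_vecs = [char_ngram_vector(s) for s in seeds]
    recalled = set()
    for cand in candidates:
        if cand in seeds:
            continue
        vec = char_ngram_vector(cand)
        if any(cosine(vec, sv) >= threshold for sv in seed_vecs):
            recalled.add(cand)
    return recalled


if __name__ == "__main__":
    docs = [
        ["smart", "government", "service", "platform"],
        ["smart", "government", "service", "portal"],
    ]
    seeds = frequent_ngrams(docs)
    rare = ["smart government services", "tax reduction"]
    print(sorted(seeds))
    print(recall_by_similarity(rare, seeds))
```

In this sketch the seed set built in the first step acts as the thesaurus for the second step, which mirrors the cold-start to hot-start transition: terms too rare to pass the frequency threshold can still be recalled if they are close enough to an existing seed.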