Chinese Word Segmentation of Self-attention Mechanisms Guided by Syntactic Dependence
    Abstract:

    Building on previous work, this study proposes a self-attention mechanism guided by syntactic dependency, which integrates syntactic dependency knowledge to improve the performance of Chinese word segmentation. Under this guidance, the self-attention mechanism attends only to those characters that have a syntactic dependency influence on the current character's word segmentation label, and it learns the degree to which each of them influences the current character. In addition, this study applies positional encoding to the self-attention mechanism guided by syntactic dependency trees. The experimental results show that the model outperforms the baseline and that its ability to recognize out-of-vocabulary (unregistered) words is strengthened.
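
The idea of restricting attention to syntactically related positions can be illustrated with a minimal sketch. It is not the author's implementation. It assumes that each position may attend to itself, its syntactic head, and its direct dependents, and it uses the input vectors directly as queries, keys, and values; the paper's actual masking rule, learned projections, and positional encoding are not reproduced here. The names `dependency_mask` and `guided_self_attention` are hypothetical.

```python
import math


def dependency_mask(heads):
    """Build an n x n boolean mask from head indices (assumed rule).

    heads[i] is the index of position i's syntactic head, or -1 for the root.
    mask[i][j] is True when position i is allowed to attend to position j:
    here, itself, its head, and its direct dependents.
    """
    n = len(heads)
    mask = [[i == j for j in range(n)] for i in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            mask[child][head] = True
            mask[head][child] = True
    return mask


def softmax(scores):
    """Numerically stable softmax; -inf scores receive weight 0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_self_attention(x, mask):
    """Scaled dot-product self-attention restricted by a boolean mask.

    x is a list of n vectors of equal length d. For simplicity the same
    vectors serve as queries, keys, and values (no learned projections).
    Returns (outputs, weights).
    """
    n, d = len(x), len(x[0])
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            if mask[i][j]:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
            else:
                # Masked positions are excluded before normalization.
                scores.append(float("-inf"))
        row = softmax(scores)
        weights.append(row)
        outputs.append(
            [sum(row[j] * x[j][k] for j in range(n)) for k in range(d)]
        )
    return outputs, weights


# Toy example: three positions, position 1 is the root and heads the others.
heads = [1, -1, 1]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = guided_self_attention(x, dependency_mask(heads))
```

In this toy example, positions 0 and 2 are not directly linked in the tree, so they receive zero attention weight from each other, while the surviving weights are still learned (here, computed) from the dot-product scores.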

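Citation

Zhou Baotu. Chinese Word Segmentation of Self-attention Mechanisms Guided by Syntactic Dependence. Computer Systems & Applications, 2023, 32(9): 265–271.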

History
  • Received: February 20, 2023
  • Revised: March 22, 2023
  • Online: June 9, 2023