Text Classification with Tensor Graph Convolutional Network Fusing BERT and Self-attention Mechanism
Authors: Shi Wenyi, Zhu Xinjuan
Funding: Shaanxi Provincial Key Research and Development Program (2024GX-YBXM-548)
Abstract:

    TensorGCN model is one of the state-of-the-art (SOTA) models applied by graph neural networks in the field of text classification. However, in terms of processing text semantic information, the long short-term memory (LSTM) used by the model has difficulty in completely extracting the semantic features of short text and performs poorly in handling complex semantic information. At the same time, due to the large number of semantic and syntactic features contained in long texts, feature sharing is incomplete when heterogeneous information is shared among graphs, which affects the accuracy of text classification. To solve these two problems, the TensorGCN model is improved, and a text classification method based on the tensor graph convolutional network fusing BERT and the self-attention mechanism (BTSGCN) is proposed. Firstly, BERT is used to replace the LSTM module in the TensorGCN architecture for semantic feature extraction. It captures the dependencies between words by considering the surrounding words on both sides of a given word, thus extracting the semantic features of short texts more accurately. Then, the self-attention mechanism is added during the propagation among graphs to help the model better capture the features among different graphs and complete the feature fusion. Experimental results on MR, R8, R52, and 20NG datasets show that BTSGCN has higher classification accuracy than other comparison methods.

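The second change, self-attention during inter-graph propagation, might look like the following sketch: each node's three per-graph feature vectors (semantic, syntactic, sequential) are treated as a length-3 sequence and attended over before fusion. The class name, single-head design, and all dimensions are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterGraphSelfAttention(nn.Module):
    """Fuse node features across the three TensorGCN graphs with a
    single-head self-attention over the graph axis (a sketch, not the
    authors' exact layer)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_graphs=3, num_nodes, dim), i.e. per-graph node features
        # produced by intra-graph GCN propagation.
        x = h.permute(1, 0, 2)                     # (num_nodes, 3, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5  # (N, 3, 3)
        fused = F.softmax(scores, dim=-1) @ v      # (N, 3, dim)
        return fused.permute(1, 0, 2)              # back to (3, N, dim)

# Toy usage: 3 graphs, 5 nodes, 16-dimensional features.
h = torch.randn(3, 5, 16)
print(InterGraphSelfAttention(16)(h).shape)        # torch.Size([3, 5, 16])
```

After this fusion step, each graph's node representations carry information attended from the other two graphs, which is one way to realize the heterogeneous information sharing among graphs that the abstract describes.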
    References
    [1] Zheng C, Xiao S. Text classification method combining grammar rules and graph neural networks. Journal of Chinese Computer Systems, 2024, 45(11): 2594–2601. (in Chinese)
    [2] Hu ZK, Hu JQ, Ding WF, et al. Review sentiment analysis based on deep learning. Proceedings of the 12th International Conference on e-Business Engineering. Beijing: IEEE, 2015. 87–94.
    [3] Atanasova P, Simonsen JG, Lioma C, et al. A diagnostic study of explainability techniques for text classification. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. ACL, 2020. 3256–3274.
    [4] Sun XF, Li XY, Li JW, et al. Text classification via large language models. Findings of the Association for Computational Linguistics: EMNLP 2023. Singapore: ACL, 2023. 8990–9005.
    [5] Yu WB, Yin L, Zhang CJ, et al. Application of quantum recurrent neural network in low-resource language text classification. IEEE Transactions on Quantum Engineering, 2024, 5: 2100213.
    [6] Chai YY, Li Z, Liu JH, et al. Compositional generalization for multi-label text classification: A data-augmentation approach. Proceedings of the 38th AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2024. 17727–17735.
    [7] Ding K, Li JD, Bhanushali R, et al. Deep anomaly detection on attributed networks. Proceedings of the 2019 SIAM International Conference on Data Mining. Tempe: SIAM, 2019. 594–602.
    [8] Papageorgiou G, Economou P, Bersimis S. A method for optimizing text preprocessing and text classification using multiple cycles of learning with an application on shipbrokers emails. Journal of Applied Statistics, 2024, 51(13): 2592–2626.
    [9] Roy PK, Singh JP, Banerjee S. Deep learning to filter SMS Spam. Future Generation Computer Systems, 2020, 102: 524–533.
    [10] Kim Y. Convolutional neural networks for sentence classification. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: ACL, 2014. 1746–1751.
    [11] Liu PF, Qiu XP, Huang XJ. Recurrent neural network for text classification with multi-task learning. Proceedings of the 25th International Joint Conference on Artificial Intelligence. New York: AAAI Press, 2016. 2873–2879.
    [12] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation, 1997, 9(8): 1735–1780.
    [13] Zhang Y, Liu Q, Song LF. Sentence-state LSTM for text representation. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne: ACL, 2018. 317–327.
    [14] Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: ACL, 2014. 1724–1734.
    [15] Yang ZC, Yang DY, Dyer C, et al. Hierarchical attention networks for document classification. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego: ACL, 2016. 1480–1489.
    [16] Yao L, Mao CS, Luo Y. Graph convolutional networks for text classification. Proceedings of the 33rd AAAI Conference on Artificial Intelligence and the 31st Innovative Applications of Artificial Intelligence Conference and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence. Honolulu: AAAI Press, 2019. 905.
    [17] Huang LZ, Ma DH, Li SJ, et al. Text level graph neural network for text classification. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: ACL, 2019. 3444–3450.
    [18] Wu F, Souza A, Zhang TY, et al. Simplifying graph convolutional networks. Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019. 6861–6871.
    [19] Zhang YF, Yu XL, Cui ZY, et al. Every document owns its structure: Inductive text classification via graph neural networks. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ACL, 2020. 334–339.
    [20] Wang KZ, Han SC, Poon J. InducT-GCN: Inductive graph convolutional networks for text classification. Proceedings of the 26th International Conference on Pattern Recognition (ICPR). Montreal: IEEE, 2022. 1243–1249.
    [21] Zhu H, Koniusz P. Simple spectral graph convolution. Proceedings of the 9th International Conference on Learning Representations. OpenReview.net, 2021.
    [22] Linmei H, Yang TC, Shi C, et al. Heterogeneous graph attention networks for semi-supervised short text classification. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. Hong Kong: ACL, 2019. 4821–4830.
    [23] Liu XE, You XX, Zhang X, et al. Tensor graph convolutional networks for text classification. Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York: AAAI, 2020. 8409–8416.
    [24] Devika R, Vairavasundaram S, Mahenthar CSJ, et al. A deep learning model based on BERT and sentence Transformer for semantic keyphrase extraction on big social data. IEEE Access, 2021, 9: 165252–165261.
    [25] Shen DH, Zareapoor M, Yang J. Multimodal image fusion based on point-wise mutual information. Image and Vision Computing, 2021, 105: 104047.
    [26] Yan MY, Chen ZD, Deng L, et al. Characterizing and understanding GCNs on GPU. IEEE Computer Architecture Letters, 2020, 19(1): 22–25.
    [27] Widiastuti NI. Convolution neural network for text mining and natural language processing. IOP Conference Series: Materials Science and Engineering, 2019, 662(5): 052010.
    [28] Zhang C, Zhu H, Peng X, et al. Hierarchical information matters: Text classification via tree based graph neural network. Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju, 2022. 950–959.
    [29] Peng H, Li JX, He Y, et al. Large-scale hierarchical text classification with recursively regularized deep graph-CNN. Proceedings of the 2018 World Wide Web Conference. Lyon, 2018. 1063–1072.
    [30] Wang YZ, Wang CX, Zhan JY, et al. Text FCG: Fusing contextual information via graph learning for text classification. Expert Systems with Applications, 2023, 219: 119658.
    [31] Wang N, Guo ZY, Tian SK, et al. Recognition of control chart quality abnormal patterns based on t-SNE dimensionality reduction of fused features. Systems Engineering-Theory & Practice, 2024, 44(7): 2381–2393. (in Chinese)
    [32] Poličar PG, Stražar M, Zupan B. Embedding to reference t-SNE space addresses batch effects in single-cell classification. Machine Learning, 2023, 112(2): 721–740.
    [33] Jiang ZQ, Tao Y, Cui Y, et al. Text topic features and medical crowdfunding performance: A study based on the LDA model. Chinese Journal of Management Science. https://link.cnki.net/urlid/11.2835.g3.20240517.2042.005. (in Chinese)
Cite this article:

Shi WY, Zhu XJ. Text classification with tensor graph convolutional network fusing BERT and self-attention mechanism. Computer Systems & Applications, 2025, 34(3): 152–160. (in Chinese)
History
  • Received: 2024-09-06
  • Last revised: 2024-10-10
  • Published online: 2025-01-21