Incorporating BERT and Graph Attention Network for Multi-label Text Classification
    Abstract:

    Multi-label text classification is an important branch of multi-label classification. Existing methods often ignore the relationships among labels, so label correlations can hardly be exploited, which degrades classification performance. To address this, this study proposes a hybrid BERT and graph attention (HBGA) model that fuses BERT and the graph attention network. First, BERT is employed to obtain a contextual vector representation of the input text, and Bi-LSTM and a capsule network extract the global and local features of the text, respectively; these are then fused into a single text feature vector. Meanwhile, label correlations are modeled as a graph whose nodes carry the word embeddings of the labels, and the graph attention network maps these label vectors to a set of interdependent classifiers. Finally, the classifiers are applied to the text features produced by the feature extraction module for end-to-end training, and the classifier and feature information are combined to yield the final predictions. Comparative experiments on the Reuters-21578 and AAPD datasets indicate that the proposed model achieves effective improvements on multi-label text classification tasks.
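    To make the classifier-generation step concrete, the following is a minimal NumPy sketch, not the authors' implementation: the fused BERT/Bi-LSTM/capsule text feature is stubbed with a random vector, the label graph is assumed fully connected, and a single-head graph attention layer (in the style of Veličković et al.) maps label word embeddings to one classifier vector per label, which is then dotted with the text feature to score each label.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def gat_layer(H, A, W, a):
        """One simplified graph-attention layer.
        H: (L, d_in) label embeddings (graph nodes); A: (L, L) adjacency;
        W: (d_in, d_out) shared projection; a: (2*d_out,) attention vector."""
        Z = H @ W                                   # project node features
        n = Z.shape[0]
        e = np.zeros((n, n))
        for i in range(n):                          # e_ij = LeakyReLU(a^T [z_i || z_j])
            for j in range(n):
                s = np.concatenate([Z[i], Z[j]]) @ a
                e[i, j] = s if s > 0 else 0.2 * s
        e = np.where(A > 0, e, -1e9)                # mask non-edges
        alpha = softmax(e, axis=1)                  # attention over neighbors
        return np.tanh(alpha @ Z)                   # aggregated node features

    rng = np.random.default_rng(0)
    L, d_lbl, d_txt = 4, 8, 8                       # labels, label-embed dim, text-feature dim
    H = rng.standard_normal((L, d_lbl))             # label word embeddings
    A = np.ones((L, L))                             # assumed fully connected label graph
    W = rng.standard_normal((d_lbl, d_txt))
    a = rng.standard_normal(2 * d_txt)

    classifiers = gat_layer(H, A, W, a)             # (L, d_txt): one classifier per label
    text_feat = rng.standard_normal(d_txt)          # stub for the fused text feature
    logits = classifiers @ text_feat                # per-label scores
    probs = 1 / (1 + np.exp(-logits))               # independent sigmoid per label
    ```

    Because every classifier is produced jointly from the label graph, each label's score reflects its neighbors' embeddings, which is how the model injects label correlation into the prediction.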

Citation: Hao C, Qiu HP, Sun Y. Incorporating BERT and graph attention network for multi-label text classification. Computer Systems & Applications, 2022, 31(6): 167–174.
History
  • Received: August 13, 2021
  • Revised: September 13, 2021
  • Online: May 26, 2022