Hierarchical Text Classification for Label Co-occurrence and Long-tail Distribution
    Abstract:

    There are two problems in existing hierarchical text classification models: underutilization of label information shared across hierarchical instances, and a lack of handling for imbalanced label distributions. To address these problems, this study proposes a hierarchical text classification method for label co-occurrence and long-tail distribution (LC-LTD), which learns global text semantics based on shared labels and applies a balanced loss function for long-tail distributions. First, a contrastive learning objective based on shared labels is devised to narrow the semantic distance in feature space between text representations that share more labels, guiding the model to generate discriminative semantic representations. Second, a distribution-balanced loss function is introduced in place of binary cross-entropy loss to alleviate the long-tail distribution problem inherent in hierarchical classification, improving the generalization ability of the model. LC-LTD is compared with various mainstream models on the public WOS and BGC datasets, and the results show that the proposed method achieves better classification performance and is more suitable for hierarchical text classification.
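    The shared-label contrastive objective described above can be illustrated with a minimal sketch: positives are weighted by how many labels a pair of texts shares, so representations with more shared labels are pulled closer. The function name, temperature value, and exact weighting scheme below are illustrative assumptions, not the paper's precise formulation.

    ```python
    import torch
    import torch.nn.functional as F

    def shared_label_contrastive_loss(features, labels, temperature=0.1):
        """Toy contrastive objective weighted by label overlap (illustrative only).

        features: (N, d) text representations from the encoder
        labels:   (N, L) multi-hot label matrix over the hierarchy
        Pairs sharing more labels receive larger positive weights, pulling
        their representations closer in feature space.
        """
        z = F.normalize(features, dim=1)          # unit-norm embeddings
        sim = z @ z.t() / temperature             # pairwise scaled similarities

        # Count of shared labels per pair; zero the diagonal (self pairs)
        shared = labels.float() @ labels.float().t()
        shared.fill_diagonal_(0)
        # Normalize each row so weights over positives sum to 1
        weights = shared / shared.sum(dim=1, keepdim=True).clamp(min=1e-12)

        # Exclude self-similarity from the softmax denominator
        mask = torch.eye(len(z), dtype=torch.bool)
        log_prob = F.log_softmax(sim.masked_fill(mask, float('-inf')), dim=1)
        log_prob = log_prob.masked_fill(mask, 0.0)  # diagonal has zero weight

        # Weighted InfoNCE: each positive contributes by its share of common labels
        return -(weights * log_prob).sum(dim=1).mean()
    ```

    A sample with no labels in common with any other sample simply contributes zero to the loss under this weighting.
    
    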

    References
    [1] Zhang Y, Shen ZH, Dong YX, et al. MATCH: Metadata-aware text classification in a large hierarchy. Proceedings of the Web Conference 2021. Ljubljana: ACM, 2021. 3246–3257.
    [2] Huang W. Research on hierarchical multi-label classification methods and their applications [Ph.D. Thesis]. Hefei: University of Science and Technology of China, 2023 (in Chinese).
    [3] Guo H. Design of a hierarchical multi-label classification method based on label embedding [Master's Thesis]. Chongqing: Chongqing University of Posts and Telecommunications, 2022 (in Chinese).
    [4] Kumar A, Toshniwal D. HLC: Hierarchically-aware label correlation for hierarchical text classification. Applied Intelligence, 2024, 54(2): 1602–1618.
    [5] Zangari A, Marcuzzo M, Rizzo M, et al. Hierarchical text classification and its foundations: A review of current research. Electronics, 2024, 13(7): 1199.
    [6] Cao YK, Wei ZY, Tang YJ, et al. Hierarchical label text classification method with deep-level label-assisted classification. Proceedings of the 12th IEEE Data Driven Control and Learning Systems Conference. Xiangtan: IEEE, 2023. 1467–1474.
    [7] Wang ZH, Wang PY, Huang LZ, et al. Incorporating hierarchy into text encoder: A contrastive learning approach for hierarchical text classification. arXiv:2203.03825v2, 2022.
    [8] Su XA, Wang R, Dai XY. Contrastive learning-enhanced nearest neighbor mechanism for multi-label text classification. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Dublin: ACL, 2022. 672–679.
    [9] Wang Y, Xu T, Wang SL, et al. Hierarchical label semantics guided extreme multi-label text classification strategy. Journal of Chinese Information Processing, 2021, 35(10): 110–118 (in Chinese).
    [10] Zhao HY, Cao J, Chen QK, et al. Hierarchical multi-label text classification method. Journal of Chinese Computer Systems, 2022, 43(4): 673–683 (in Chinese).
    [11] Li SB, Hu J, Cui YX, et al. DeepPatent: Patent classification with convolutional neural networks and word embedding. Scientometrics, 2018, 117(2): 721–744.
    [12] Li YD, Zhang YQ, Zhao Z, et al. CSL: A large-scale Chinese scientific literature dataset. Proceedings of the 29th International Conference on Computational Linguistics. Gyeongju: ACL, 2022. 3917–3923.
    [13] Banerjee S, Akkaya C, Perez-Sorrosal F, et al. Hierarchical transfer learning for multi-label text classification. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence: ACL, 2019. 6295–6300.
    [14] Shimura K, Li JY, Fukumoto F. HFT-CNN: Learning hierarchical category structure for multi-label short text categorization. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels: ACL, 2018. 811–816.
    [15] Teng SJ. Hierarchical text classification based on graph neural networks [Master's Thesis]. Hefei: University of Science and Technology of China, 2022 (in Chinese).
    [16] Zhou J, Ma CP, Long DK, et al. Hierarchy-aware global model for hierarchical text classification. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. ACL, 2020. 1106–1117.
    [17] Chen HB, Ma QL, Lin ZX, et al. Hierarchy-aware label semantics matching network for hierarchical text classification. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). ACL, 2021. 4370–4379.
    [18] Deng ZF, Peng H, He DX, et al. HTCInfoMax: A global model for hierarchical text classification via information maximization. arXiv:2104.05220v1, 2021.
    [19] Devlin J, Chang MW, Lee K, et al. BERT: Pre-training of deep bidirectional Transformers for language understanding. arXiv:1810.04805v2, 2019.
    [20] Ying CX, Cai TL, Luo SJ, et al. Do transformers really perform bad for graph representation? Proceedings of the 35th International Conference on Neural Information Processing Systems. Curran Associates Inc., 2021. 2212.
    [21] Jang E, Gu SX, Poole B. Categorical reparameterization with gumbel-softmax. arXiv:1611.01144v5, 2017.
    [22] Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations. Proceedings of the 37th International Conference on Machine Learning. PMLR, 2020. 1597–1607.
    [23] Wu T, Huang QQ, Liu ZW, et al. Distribution-balanced loss for multi-label classification in long-tailed datasets. Proceedings of the 16th European Conference on Computer Vision. Glasgow: Springer, 2020. 162–178.
    [24] Kowsari K, Brown DE, Heidarysafa M, et al. HDLTex: Hierarchical deep learning for text classification. Proceedings of the 16th IEEE International Conference on Machine Learning and Applications. Cancun: IEEE, 2017. 364–371.
    [25] Aly R, Remus S, Biemann C. Hierarchical multi-label classification of text with capsule networks. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Florence: ACL, 2019. 323–330.
    [26] Peng H, Li JX, He Y, et al. Large-scale hierarchical text classification with recursively regularized deep graph-CNN. Proceedings of the 2018 World Wide Web Conference. Lyon: International World Wide Web Conferences Steering Committee, 2018. 1063–1072.
    [27] Zhu H, Zhang C, Huang JJ, et al. HiTIN: Hierarchy-aware tree isomorphism network for hierarchical text classification. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Toronto: ACL, 2023. 7809–7821.
Get Citation

Zhi Y, Lei HW, Zhang BL. Hierarchical text classification for label co-occurrence and long-tail distribution. Computer Systems & Applications, 2025, 34(2): 174–182 (in Chinese).

History
  • Received: July 29, 2024
  • Revised: August 20, 2024
  • Online: November 28, 2024
Copyright: Institute of Software, Chinese Academy of Sciences