Multimodal Sentiment Analysis Using Interpolation-Optimized Features
Authors: 唐业凯, 冯广, 杨芳捷, 林浩泽
    Abstract:

    Multimodal sentiment analysis currently faces two problems: insufficient feature extraction within individual modalities and unstable data-fusion methods. This study proposes a method that optimizes modal features through interpolation to address both. First, interpolation-optimized BERT and GRU models are applied to extract features, with the two models jointly mining text, audio, and video information. Second, an improved attention mechanism fuses the text, audio, and video information, making modal fusion more stable. The method is evaluated on the MOSI and MOSEI datasets. The experimental results show that interpolation improves the accuracy of multimodal sentiment analysis built on optimized modal features, which verifies the effectiveness of the interpolation approach.
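
    The abstract describes two mechanisms that a short sketch can make concrete: interpolation applied to modality features and attention-based fusion of the three modalities. The PyTorch code below is a minimal, hypothetical illustration, not the authors' implementation: interpolate_features uses a mixup-style blend of samples within a batch as one plausible interpolation scheme, and AttentionFusion queries the stacked text/audio/video vectors with multi-head attention. All dimensions, names, and the Beta parameter are illustrative assumptions; a full training loop would also interpolate the labels with the same mixing weight.

    import torch
    import torch.nn as nn

    def interpolate_features(x: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
        # Mixup-style interpolation (an assumed scheme): blend each sample
        # with a randomly permuted sample from the same batch using a
        # Beta-distributed mixing weight.
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        perm = torch.randperm(x.size(0))
        return lam * x + (1.0 - lam) * x[perm]

    class AttentionFusion(nn.Module):
        # Fuse text/audio/video features with multi-head attention, using
        # the text representation as the query over the stacked modalities.
        def __init__(self, dim: int = 128, heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.head = nn.Linear(dim, 1)  # regression head for a sentiment score

        def forward(self, text, audio, video):
            kv = torch.stack([text, audio, video], dim=1)    # (batch, 3, dim)
            fused, _ = self.attn(text.unsqueeze(1), kv, kv)  # (batch, 1, dim)
            return self.head(fused.squeeze(1))               # (batch, 1)

    # Usage with random stand-in features (batch of 8, 128-dim per modality).
    t, a, v = (torch.randn(8, 128) for _ in range(3))
    t = interpolate_features(t)  # applied during training only
    model = AttentionFusion()
    print(model(t, a, v).shape)  # torch.Size([8, 1])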

Citation
唐业凯, 冯广, 杨芳捷, 林浩泽. Multimodal sentiment analysis using interpolation-optimized features. Computer Systems & Applications, 2024, 33(10): 255–262.

History
  • Received: February 21, 2024
  • Revised: March 19, 2024
  • Online: August 21, 2024