Computer Systems Applications, 2023, 32(12): 74-83
Improved DPCNN Classification Model for Long Texts in Finance
(School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China)
Received: May 24, 2023    Revised: June 28, 2023
Abstract: Text classification algorithms for the financial domain are scarce, and existing algorithms cannot adequately extract word-to-word relations, long-distance dependencies, and deep feature information from text. To address these problems, this study proposes a deep text relation extraction algorithm based on an improved convolutional self-attention model. The algorithm introduces self-attention into a modified deep pyramid convolutional neural network (DPCNN) and combines it with a bidirectional gated recurrent unit (BiGRU) module to build a text classification model. This design targets the extraction of long-distance dependency features and word-to-word relation features in long financial texts, and it jointly extracts deep feature information and contextual semantic information. Experiments on the THUCNews short-text and long-text datasets show that the proposed method significantly improves the evaluation metrics compared with BERT and other methods. Comparison experiments on a self-built dataset of long financial texts show that the model achieves higher accuracy and F1 score than the other models. Taken together, the experiments demonstrate that the model classifies long financial texts more accurately.
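The abstract reports results in terms of accuracy and F1 score. As a minimal sketch of how these two metrics can be computed for a multi-class text classifier, the Python below uses only the standard library. The abstract does not say which F1 averaging was used, so macro-averaging is an assumption here, and the function names and class labels are hypothetical illustrations rather than the authors' code or data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # class p was predicted but is wrong
            fn[t] += 1      # class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only (not from the paper's datasets).
gold = ["finance", "finance", "sports", "tech"]
pred = ["finance", "sports", "sports", "tech"]
print(accuracy(gold, pred), macro_f1(gold, pred))
```

Macro-averaging weights every class equally, which matters when class sizes are imbalanced; a weighted or micro average would give different numbers on the same predictions.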
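Funding: Key Research and Development Program of the Science and Technology Department of Sichuan Province (2022YFG0375, 2023YFG0099, 2023YFG0261, 23ZDYF0473, 23ZDYF0181); Project of Nanchong Biomedical Industry Technology Research Institute (22YYJCYJ0086); Sichuan Province Science and Technology Service Industry Demonstration Project (2021GFW130)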
Cite this article:
WANG Ting, LIANG Jia-Ying, YANG Chuan, HE Song-Ze, XIANG Dong, MA Hong-Jiang. Improved DPCNN Classification Model for Long Texts in Finance. Computer Systems Applications, 2023, 32(12): 74-83