Automatic News Summarization Model Based on Multi-feature TextRank

doi:10.15888/j.cnki.csa.008913


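Authors:
XU Fei, School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China
PENG Jia-Jia, School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China
LIU Jun, Unit 63768, Xi’an 710021, China
YANG Bo, School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China

Fund Project: Fund of the National and Local Joint Engineering Laboratory of New Network and Detection Control (GSYSJ2018006); Special Scientific Research Program of the Shaanxi Provincial Department of Education (18JK0399)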



Abstract:

With the growth of the Internet, quickly extracting the core information from massive volumes of news, and thereby reducing the browsing burden, has become an urgent problem for information departments. Existing TextRank algorithms and their improved variants do not consider text features comprehensively when extracting news summaries, and when selecting summary sentences they account only for redundancy, ignoring the diversity and readability of the summary. To address these problems, this study proposes MF-TextRank (multi-feature TextRank), an automatic text summarization method that fuses multiple features. More comprehensive text features, drawn from the structure, sentences, and words of a news article, are used to improve the weight transfer matrix of the TextRank algorithm so that sentence weights are computed more accurately. The MMR (maximal marginal relevance) algorithm then updates the sentence weights, beam search produces a set of candidate summaries, and, on the basis of the MMR scores, the candidate summary with the highest cohesion is selected as the final output. Experimental results show that MF-TextRank achieves higher ROUGE scores than existing improved TextRank algorithms on the summary extraction task, effectively improving extraction accuracy.

Key words: TextRank; MMR algorithm; Word2Vec; news summary; multi-feature fusion; automatic summary
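The first two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fusion rule (scaling each edge by a per-sentence feature weight), the function names `textrank_scores` and `mmr_select`, the toy similarity matrix, and all parameter values are assumptions made for illustration only. The beam search and cohesion-based selection steps are omitted.

```python
from typing import List


def textrank_scores(sim: List[List[float]], feature_weight: List[float],
                    damping: float = 0.85, iters: int = 100,
                    tol: float = 1e-8) -> List[float]:
    """Power iteration over a feature-biased transition matrix."""
    n = len(sim)
    # Assumed fusion rule: scale edge i -> j by the feature weight of the
    # target sentence j, and drop self-loops.
    biased = [[0.0 if i == j else sim[i][j] * feature_weight[j]
               for j in range(n)] for i in range(n)]
    # Row-normalise so that each row is a probability distribution.
    trans = []
    for row in biased:
        total = sum(row)
        trans.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - damping) / n
               + damping * sum(scores[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def mmr_select(scores: List[float], sim: List[List[float]],
               k: int, lam: float = 0.5) -> List[int]:
    """Greedy maximal marginal relevance: trade relevance off against redundancy."""
    top = max(scores) or 1.0  # rescale relevance to [0, 1], like the similarities
    selected: List[int] = []
    candidates = list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i: int) -> float:
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] / top - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order


# Toy input: sentences 0 and 1 are near-duplicates; sentence 0 gets an
# assumed feature boost (for example, for being the lead sentence).
sim = [[1.0, 0.9, 0.2, 0.1],
       [0.9, 1.0, 0.2, 0.1],
       [0.2, 0.2, 1.0, 0.3],
       [0.1, 0.1, 0.3, 1.0]]
feature_weight = [1.5, 1.0, 1.0, 1.0]

scores = textrank_scores(sim, feature_weight)
summary = mmr_select(scores, sim, k=2, lam=0.5)
print(scores, summary)
```

In this toy input the boosted sentence 0 ranks first, and with `lam=0.5` the redundancy penalty keeps its near-duplicate, sentence 1, out of the two-sentence summary. Setting `lam=1.0` disables the penalty and reduces the selection to a plain top-k by TextRank score.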


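Cite this article

XU Fei, PENG Jia-Jia, LIU Jun, YANG Bo. Automatic News Summarization Model Based on Multi-feature TextRank. Computer Systems & Applications, 2023, 32(2): 242-249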


History

Received: 2022-06-14
Revised: 2022-07-12
Published online: 2022-09-14
