Received: December 12, 2023; Revised: January 10, 2024
Abstract: Abstractive neural summarization models have made substantial progress in text summarization. However, because abstractive summarization is flexible, the generated summaries are prone to poor faithfulness and may even drift from the semantic gist of the source document. To address this problem, this study proposes two methods for improving summary faithfulness. (1) Entities play an important role in summaries and usually come from the source document, so the model is allowed to copy entities directly from the source. This keeps the generated entities consistent with those in the source document and helps prevent inconsistent entities from being generated. (2) To further prevent the generated summary from drifting semantically away from the source text, key entities and key tokens are used as guidance information at two different levels of granularity to steer summary generation. The methods are evaluated with the ROUGE metric on two widely used summarization datasets, CNNDM and XSum. The experimental results show that both methods significantly improve model performance. The experiments further show that the entity copy mechanism can, to some extent, draw on the guidance information to correct introduced semantic noise.
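The entity copy idea in method (1) resembles the copy (pointer) mechanism of pointer-generator networks. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. A generation probability `p_gen` mixes the decoder's vocabulary distribution with a copy distribution, and the copy attention is renormalised over source positions tagged as entities only. The function names, the `is_entity` tags, and the toy numbers are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax; -inf scores receive probability 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_copy_distribution(
    vocab_probs: Dict[str, float],
    source_tokens: Sequence[str],
    attention_scores: Sequence[float],
    is_entity: Sequence[bool],
    p_gen: float,
) -> Dict[str, float]:
    """Hypothetical sketch: mix generation with copying restricted to entities.

    With probability p_gen the token is generated from vocab_probs; with
    probability 1 - p_gen it is copied from the source, where the copy
    attention is renormalised over entity positions only.
    """
    if not any(is_entity):
        # No entity in the source: nothing to copy, use pure generation.
        return dict(vocab_probs)
    # Mask non-entity positions so they get zero copy probability.
    masked = [s if ent else float("-inf") for s, ent in zip(attention_scores, is_entity)]
    copy_attn = softmax(masked)
    final = {tok: p_gen * p for tok, p in vocab_probs.items()}
    for tok, a, ent in zip(source_tokens, copy_attn, is_entity):
        if ent:
            # Entity tokens may be copied even if absent from the vocabulary.
            final[tok] = final.get(tok, 0.0) + (1.0 - p_gen) * a
    return final


if __name__ == "__main__":
    dist = entity_copy_distribution(
        vocab_probs={"the": 0.5, "said": 0.3, "Paris": 0.2},
        source_tokens=["Macron", "visited", "Paris"],
        attention_scores=[2.0, 5.0, 1.0],
        is_entity=[True, False, True],
        p_gen=0.6,
    )
    print(dist)
```

Masking before the softmax is the design choice that matters here. The copy path can only emit tokens that appear in the source as entities, which matches the consistency property attributed to entity copying. In a real model, `p_gen`, the attention scores, and the entity tags would come from the network and an entity tagger; here they are fixed toy values.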
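The evaluation relies on ROUGE, which scores n-gram overlap between a generated summary and a reference summary. The sketch below computes ROUGE-N precision, recall, and F1 as a simplified illustration. It uses whitespace tokenisation and lower-casing with no stemming, so its numbers will not match official ROUGE toolkits exactly. The abstract does not state which ROUGE variants the paper reports.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of n-gram overlap between two strings.

    Simplified: whitespace tokenisation, lower-casing, no stemming.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most min(cand, ref) times.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    print(rouge_n("the cat sat on the mat", "the cat is on the mat", 2))
```

For example, the two sentences above share 3 of their 5 bigrams, so ROUGE-2 precision, recall, and F1 are all 0.6.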
Keywords: abstractive summarization; entity copy; dual-granularity guidance; deep learning; pre-trained model
Funding: Natural Science Foundation of Shandong Province (ZR2021MD115); Science and Technology Commission of Shanghai Municipality (21511100302)
Citation:
ZHOU Zi-Li, GAO Shi-Liang, AN Run-Lu, BAO Xin-Yue. Abstractive Summarization Based on Entity Copy and Dual Granularity Guidance. Computer Systems Applications, 2024, 33(5): 210-217.