Generation Model from Structured Data to Numerical Analysis Text

doi:10.15888/j.cnki.csa.008480

2022, Vol. 31, No. 5, pp. 246-253

Fund Project: Informatization Special Project of the Chinese Academy of Sciences (XXH13510-03)

Authors:

YANG Zi-Cong
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China

JIAO Wen-Bin
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China

LIU Xiao-Dong
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China

WANG Yang
Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China



Abstract:

Text generation from structured data is an important research direction in natural language generation. It converts structured data, whether collected by sensors or produced by computer statistical analysis, into natural language text suited to human reading and comprehension, which makes it a key technology for automatic report generation. Studying structured-data-to-text models that generate analytical text for the various kinds of numerical data in reports therefore has considerable practical value. Targeting the characteristics of numerical data, this paper proposes an encoder-decoder text generation model that combines a coarse-to-fine aligner selection mechanism with a linked-based attention mechanism. The model addresses the problem that analytical text generated from numerical data tends to be too scattered in content and fails to highlight the key points. It also models the relationships between the domains to which the numerical data belong, in order to improve the correctness of word order in the generated text. Experimental results show that the proposed model, which combines both mechanisms, scores better on the evaluation metrics than a model using only the traditional content-based attention mechanism, a model that adds the linked-based attention mechanism to content-based attention, and a GPT2-based model. This indicates that the proposed model is effective for generating analytical text from numerical data.

Keywords: structured data; numerical data; text generation; automatic report generation; deep learning
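The abstract names the attention components without giving their equations, so the following is only an illustrative sketch of how a coarse-to-fine selection step and a linked-based attention step could be combined over a set of encoded records. It is not the authors' implementation. Everything here is an assumption made for illustration: the function name `hybrid_attention`, the dot-product scoring, the pre-selector weights `w_select`, the field-to-field matrix `link`, the fixed scalar `gate`, and the reading of "domain" as an integer field id per record.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def hybrid_attention(records, fields, query, prev_attn, link, w_select, gate=0.5):
    """Hypothetical mix of coarse-to-fine and linked-based attention.

    records:   (n, d) encoded records
    fields:    (n,)   integer field (domain) id of each record
    query:     (d,)   current decoder state
    prev_attn: (n,)   attention distribution from the previous decoding step
    link:      (k, k) field-to-field link scores (learned in a real model)
    w_select:  (d,)   pre-selector weights (learned in a real model)
    """
    # Coarse step: a query-independent salience in (0, 1) for each record.
    salience = sigmoid(records @ w_select)
    # Fine step: content-based attention, re-weighted by salience and renormalised.
    content = softmax(records @ query)
    refined = salience * content
    refined = refined / refined.sum()
    # Link step: score record i by how strongly the fields attended to at the
    # previous step link to the field of record i.
    pairwise = link[fields][:, fields]  # (n, n), entry [j, i] = link[f_j, f_i]
    linked = softmax(prev_attn @ pairwise)
    # Mix the two distributions; the gate is a fixed scalar here for simplicity.
    return gate * refined + (1.0 - gate) * linked


rng = np.random.default_rng(0)
records = rng.normal(size=(4, 3))
fields = np.array([0, 1, 1, 2])
attn = hybrid_attention(
    records,
    fields,
    query=rng.normal(size=3),
    prev_attn=np.full(4, 0.25),
    link=rng.normal(size=(3, 3)),
    w_select=rng.normal(size=3),
)
print(attn)
```

Because both components are probability distributions over the same records, their convex combination is also a distribution, so the result can be used directly to form a context vector for the decoder. In a trained model the gate would typically be computed from the decoder state rather than fixed.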

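Cite this article

YANG Zi-Cong, JIAO Wen-Bin, LIU Xiao-Dong, WANG Yang. Generation Model from Structured Data to Numerical Analysis Text. Computer Systems & Applications, 2022, 31(5): 246-253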


History

Received: 2021-08-07
Revised: 2021-09-13
Published online: 2022-04-11
