Generation Model from Structured Data to Numerical Analysis Text
Authors: 杨子聪, 焦文彬, 刘晓东, 汪洋
Funding: Informatization Special Project of the Chinese Academy of Sciences (XXH13510-03)



Abstract:

Text generation from structured data is an important research direction in natural language generation. It converts structured data, whether collected by sensors or produced by computer-based statistical analysis, into natural language text that people can easily read and understand, and has therefore become a key technology for automatic report generation. Models that generate text from structured data are of considerable practical value for producing analytical text about the many kinds of numerical data found in reports. Targeting the characteristics of numerical data, this paper proposes an encoder-decoder text generation model that incorporates a coarse-to-fine aligner selection mechanism and a linked-based attention mechanism. The model addresses two problems that arise when generating analytical text from numerical data: the content tends to be overly scattered, and the descriptions fail to emphasize the key points. In addition, the model captures the relationships among the fields (domains) to which the numerical values belong, in order to improve the ordering of content in the generated text. Experimental results show that the proposed model, which combines both mechanisms, scores better on the evaluation metrics than three baselines: a model using only the traditional content-based attention mechanism, a model that adds the linked-based attention mechanism on top of content-based attention, and a GPT2-based model. These results indicate that the proposed model is effective for generating analytical text from numerical data.
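The abstract names the attention components but does not give their equations. The sketch below shows one plausible way to fuse them for a single decoder step, under several assumptions that are not taken from the paper: content-based attention uses plain dot-product scoring (a simplification of an MLP scorer); linked-based attention follows the link-based idea of Sha et al. (a field-to-field transition matrix applied to the previous step's attention weights); and the coarse-to-fine aligner follows Mei et al., where a per-record pre-selection probability rescales the fine-grained weights before renormalizing. The fixed scalar `gate`, the fusion order, the function names, and the field labels and link scores in the demo are all hypothetical illustrations, not the authors' published formulation.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
LinkMatrix = Dict[str, Dict[str, float]]  # hypothetical: link[from_field][to_field] -> score


def softmax(scores: Sequence[float]) -> List[float]:
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def content_attention(state: Vector, records: Sequence[Vector]) -> List[float]:
    # Content-based attention: score each record encoding against the decoder state.
    return softmax([dot(state, m) for m in records])


def link_attention(prev_alpha: Sequence[float], fields: Sequence[str],
                   link: LinkMatrix) -> List[float]:
    # Linked-based attention: from the fields attended at the previous step,
    # follow field-to-field transition scores to decide which field comes next.
    scores = [sum(prev_alpha[i] * link[f_i][f_j] for i, f_i in enumerate(fields))
              for f_j in fields]
    return softmax(scores)


def preselect(query: Vector, records: Sequence[Vector]) -> List[float]:
    # Coarse step: an independent selection probability for every record.
    return [sigmoid(dot(query, m)) for m in records]


def fused_attention(state: Vector, prev_alpha: Sequence[float],
                    records: Sequence[Vector], fields: Sequence[str],
                    link: LinkMatrix, query: Vector, gate: float) -> List[float]:
    # Mix content-based and linked-based weights with a gate in [0, 1],
    # then rescale by the coarse pre-selection probabilities and renormalize.
    a_content = content_attention(state, records)
    a_link = link_attention(prev_alpha, fields, link)
    mixed = [gate * c + (1.0 - gate) * l for c, l in zip(a_content, a_link)]
    weighted = [p * a for p, a in zip(preselect(query, records), mixed)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical demo: three numerical fields from a report table.
records = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fields = ["total", "growth", "share"]
link = {
    "total": {"total": 0.0, "growth": 2.0, "share": 0.5},
    "growth": {"total": 0.0, "growth": 0.0, "share": 2.0},
    "share": {"total": 1.0, "growth": 0.0, "share": 0.0},
}
prev_alpha = [1.0, 0.0, 0.0]  # the previous step focused on "total"

alpha = fused_attention(state=[0.2, 0.1], prev_alpha=prev_alpha, records=records,
                        fields=fields, link=link, query=[1.0, 1.0], gate=0.5)
print([round(a, 3) for a in alpha])
```

With `gate=0.0` the weights are driven only by the link scores, so after attending to "total" the step favors "growth"; with `gate=1.0` only content similarity to the decoder state matters. A record whose pre-selection probability is near zero is suppressed regardless of either attention score, which is how a coarse selector can keep the output from spreading across too many values.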

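Cite this article:

杨子聪, 焦文彬, 刘晓东, 汪洋. Generation model from structured data to numerical analysis text. Computer Systems & Applications (计算机系统应用), 2022, 31(5): 246–253.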

History
  • Received: 2021-08-07
  • Last revised: 2021-09-13
  • Published online: 2022-04-11
Copyright: Institute of Software, Chinese Academy of Sciences