Survey on SQL Generation Based on Chinese Natural Language

Funding: Scientific research project of the Software Engineering Institute of Guangzhou (广州软件学院), No. ky202113

    Abstract:

    Research on converting natural language to SQL (NL2SQL) has high application value. As deep learning has matured, a growing number of researchers have applied it to NL2SQL tasks. This paper reviews the state of NL2SQL research for both English and Chinese and summarizes the datasets and models by year of publication. It compares the characteristics of the four major Chinese NL2SQL datasets, describes the basic framework of deep-learning-based NL2SQL, and presents the typical models suited to simple single-table questions and to complex cross-table questions in Chinese. It then introduces the commonly used model evaluation methods and outlines directions for future research.

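Cite this article

Zheng Yaodong (郑耀东), Li Xufeng (李旭峰), Chen Heping (陈和平), He Guijiao (贺桂娇). Survey on SQL Generation Based on Chinese Natural Language. Computer Systems & Applications (计算机系统应用), 2023, 32(12): 32-42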

History
  • Received: 2023-06-12
  • Last revised: 2023-07-19
  • Published online: 2023-10-20