Survey on SQL Generation Based on Chinese Natural Language

doi:10.15888/j.cnki.csa.009356

Computer Systems & Applications, 2023, Vol. 32, No. 12, pp. 32-42.

Authors:

ZHENG Yao-Dong, LI Xu-Feng, CHEN He-Ping, HE Gui-Jiao

Affiliation:

Department of Computer Science, Software Engineering Institute of Guangzhou, Guangzhou 510990, China

Fund Project:

Research Project of Software Engineering Institute of Guangzhou (ky202113)

Abstract:

Research on converting natural language to SQL (NL2SQL) has high application value. As deep learning technology has matured, a growing number of researchers have applied it to NL2SQL tasks. This paper reviews the state of NL2SQL research for both English and Chinese and summarizes the datasets and models by year of publication. It compares the characteristics of the four major Chinese NL2SQL datasets, describes the basic framework of deep-learning-based NL2SQL, and presents typical models suited to simple single-table questions and complex cross-table questions in Chinese. Finally, it introduces commonly used model evaluation methods and outlines directions for future research.

Key words: natural language to SQL (NL2SQL); deep learning; Chinese dataset; natural language processing (NLP)
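
To make the task concrete, the following is a minimal sketch of a single-table NL2SQL example and of execution-based scoring, one metric commonly used in the NL2SQL literature. The abstract does not say which evaluation methods the survey covers, so this choice is an assumption. The `city` table, its rows, the question, and the helper names `build_demo_db` and `execution_match` are all hypothetical and do not come from any of the datasets discussed.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory table (hypothetical schema and values)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE city (name TEXT, province TEXT, population INTEGER)")
    conn.executemany(
        "INSERT INTO city VALUES (?, ?, ?)",
        [("广州", "广东", 1868), ("深圳", "广东", 1756), ("杭州", "浙江", 1194)],
    )
    return conn


def run_query(conn, sql):
    """Run a query and return its rows in a canonical order, so row order does not matter."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries return the same rows; invalid predicted SQL counts as a miss."""
    try:
        predicted_rows = run_query(conn, predicted_sql)
    except sqlite3.Error:
        return False
    return predicted_rows == run_query(conn, gold_sql)


# Hypothetical question: "Which cities in Guangdong have a population over 18 million?"
# (population is stored in units of ten thousand in this toy table)
question = "广东省有哪些城市人口超过1800万?"
gold_sql = "SELECT name FROM city WHERE province = '广东' AND population > 1800"

# Same meaning, different surface form: string comparison would reject it.
predicted_ok = "SELECT name FROM city WHERE population > 1800 AND province = '广东'"
# Missing the population condition: returns an extra row.
predicted_bad = "SELECT name FROM city WHERE province = '广东'"

conn = build_demo_db()
print(execution_match(conn, predicted_ok, gold_sql))   # True
print(execution_match(conn, predicted_bad, gold_sql))  # False
```

Comparing execution results rather than query strings accepts predictions that are worded differently but equivalent, at the cost of occasionally accepting a wrong query that happens to return the same rows on a particular database.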

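Cite this article

ZHENG Yao-Dong, LI Xu-Feng, CHEN He-Ping, HE Gui-Jiao. Survey on SQL Generation Based on Chinese Natural Language. Computer Systems & Applications, 2023, 32(12): 32-42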

History

Received: 2023-06-12
Last revised: 2023-07-19
Published online: 2023-10-20
