Published on Wed Jun 02 2021

Towards Robustness of Text-to-SQL Models against Synonym Substitution

Yujian Gan, Xinyun Chen, Qiuping Huang, Matthew Purver, John R. Woodward, Jinxia Xie, Pengsheng Huang

Existing text-to-SQL models typically rely on the lexical matching between words in natural language (NL) questions and table schemas. This may render the models vulnerable to attacks that break the schema linking mechanism. We introduce Spider-Syn, a human-curated dataset.

Abstract

Recently, there has been significant progress in studying neural networks that translate text descriptions into SQL queries. Despite achieving good performance on some public benchmarks, existing text-to-SQL models typically rely on lexical matching between words in natural language (NL) questions and tokens in table schemas, which may render the models vulnerable to attacks that break the schema linking mechanism. In this work, we investigate the robustness of text-to-SQL models to synonym substitution. In particular, we introduce Spider-Syn, a human-curated dataset based on the Spider benchmark for text-to-SQL translation. NL questions in Spider-Syn are modified from Spider by replacing schema-related words with manually selected synonyms that reflect real-world question paraphrases. We observe that accuracy drops dramatically once this explicit correspondence between NL questions and table schemas is eliminated, even though the synonyms are not adversarially selected to mount worst-case attacks. Finally, we present two categories of approaches to improve model robustness. The first utilizes additional synonym annotations for table schemas by modifying the model input, while the second is based on adversarial training. We demonstrate that both categories significantly outperform their counterparts without the defense, and that the first category is more effective.
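The failure mode the abstract describes can be illustrated with a minimal sketch (not the paper's code; the linker, schema, and example questions below are hypothetical): a naive schema linker that relies purely on lexical matching finds a column when its name appears verbatim in the question, but loses the link as soon as that word is replaced by a synonym.

```python
def lexical_schema_links(question, schema_columns):
    """Return schema columns whose names appear verbatim in the question
    (a deliberately naive, purely lexical schema linker)."""
    tokens = set(question.lower().replace("?", "").split())
    return [col for col in schema_columns if col.lower() in tokens]

# Hypothetical schema and questions for illustration.
schema = ["name", "salary", "department"]

original = "What is the salary of each employee?"
paraphrased = "What is the wage of each employee?"  # 'salary' -> synonym 'wage'

print(lexical_schema_links(original, schema))     # ['salary'] -- link found
print(lexical_schema_links(paraphrased, schema))  # [] -- linking breaks
```

Real parsers use richer matching than exact tokens, but the same brittleness appears whenever schema linking leans on surface overlap, which is what the synonym substitutions in Spider-Syn probe.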

Mon Oct 19 2020
Artificial Intelligence
ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries
Translating natural language utterances to executable queries is a helpful technique in making the vast amount of data stored in relational databases accessible to a wider range of non-tech-savvy end users.
Wed Oct 21 2020
NLP
DuoRAT: Towards Simpler Text-to-SQL Models
Recent research has shown that neural text-to-SQL models can effectively translate natural language questions into corresponding queries on unseen databases.
Thu Jul 30 2020
Artificial Intelligence
Photon: A Robust Cross-Domain Text-to-SQL System
Photon is a robust, modular, cross-domain NLIDB that can flag natural language input for which a SQL mapping cannot be immediately determined. Photon consists of a strong neural semantic parser, a human-in-the-loop question corrector, and a response generator.
Mon Apr 23 2018
NLP
Semantic Parsing with Syntax- and Table-Aware SQL Generation
Existing neural network based approaches typically generate a query word-by-word. A large portion of the generated results are not executable due to the mismatch between question words and table contents. Our approach addresses this problem by considering the structure of table and the syntax of SQL language.
Wed Feb 03 2021
NLP
An Investigation Between Schema Linking and Text-to-SQL Performance
Text-to-SQL is a crucial task toward developing methods for understanding natural language by computers. Recent neural approaches deliver excellent performance, but models that are difficult to interpret inhibit future developments. We hypothesize that the internal behavior of models at hand becomes much easier to analyze if we …
Thu Oct 11 2018
Artificial Intelligence
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Most existing studies in text-to-SQL tasks do not require generating complex queries with multiple clauses or sub-queries. In this paper we propose SyntaxSQLNet, a syntax tree network to address the complex and cross-domain text-to-SQL generation task.
Thu Aug 29 2019
NLP
Global Reasoning over Database Structures for Text-to-SQL Parsing
State-of-the-art semantic parsers rely on auto-regressive decoding, emitting one symbol at a time. When tested against complex databases that are unobserved at training time, the parser often struggles to select the correct set of database constants. In this work, we propose a …
Mon Jun 19 2017
Machine Learning
Towards Deep Learning Models Resistant to Adversarial Attacks
Recent work has demonstrated that deep neural networks are vulnerable to adversarial examples. The existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the robustness of neural networks through the lens of robust optimization.
Wed Aug 28 2019
NLP
SpatialNLI: A Spatial Domain Natural Language Interface to Databases Using Spatial Comprehension
A natural language interface (NLI) to databases is an interface that translates a natural language question to a structured query. However, an NLI that is trained in the general domain is hard to apply in the spatial domain due to the idiosyncrasy and expressiveness of the spatial questions.
Sat Apr 21 2018
NLP
Generating Natural Language Adversarial Examples
Deep neural networks (DNNs) are vulnerable to adversarial examples. Small perturbations to correctly classified examples can cause the model to misclassify. We use a black-box population-based optimization algorithm to generate semantically and syntactically similar examples.
Wed May 15 2019
NLP
Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing
The structure of the DB schema is encoded with a graph neural network, and this representation is later used at both encoding and decoding time. Evaluation shows that encoding the DB schema structure improves our parser accuracy from 33.8% to 39.4%.