W. Ronny Huang // [email protected]
Xin Chen // [email protected]
This repo provides an implementation of the SQLNet neural network for predicting SQL queries on the WikiSQL dataset and on your own custom dataset. The original SQLNet paper is available here.
For a live demo with your own typed custom questions, you must first run the installation and training as outlined below.
To perform transfer learning, first train the SQLNet weights on the large WikiSQL dataset (details below) and save those weights into the folder saved_model_pretrained_wikisql/. Then uncomment the lines in train_mc.py which load the pretrained weights from that saved model and also uncomment the lines which commence finetuning.
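Because finetuning depends on the pretrained weights already being in place, a small preflight check can catch a missing or empty folder before a long run starts. The following is a minimal sketch, not part of this repo: the helper name pretrained_weights_ready is hypothetical, and it only assumes that the weights are stored as one or more files directly inside saved_model_pretrained_wikisql/.

```python
import os

# Folder name taken from this README; how the weight files inside it are
# named is not assumed here.
PRETRAINED_DIR = "saved_model_pretrained_wikisql"


def pretrained_weights_ready(folder=PRETRAINED_DIR):
    """Return True if `folder` exists and holds at least one regular file.

    Hypothetical helper: it does not validate the contents of the files,
    only that something has been saved there.
    """
    if not os.path.isdir(folder):
        return False
    return any(
        os.path.isfile(os.path.join(folder, name))
        for name in os.listdir(folder)
    )


if __name__ == "__main__":
    if pretrained_weights_ready():
        print("Pretrained weights found; finetuning can start.")
    else:
        print("No pretrained weights in %s; train on WikiSQL first."
              % PRETRAINED_DIR)
```

Run it from the repo root before launching the finetuning step in train_mc.py.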
After transfer learning is finished and the weights are saved into saved_model/, run the following command to do custom inference.
python infer_mc.py --ca
This lets you type your own natural-language (English) questions about your own SQL table. The script then returns the predicted SQL query for each question.
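To make "predicted SQL query" more concrete: the WikiSQL dataset describes each query by a selected column index, an aggregation operator index, and a list of conditions. The sketch below shows how such a structure could be turned into a readable SQL string. It is an illustration only and is not taken from infer_mc.py; the function name render_query, the example columns, and the table name my_table are all hypothetical, and the code is written for a current Python interpreter rather than the Python 2.7 the repo targets.

```python
# Operator tables as used by the WikiSQL query format.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def render_query(query, columns, table_name="my_table"):
    """Render a WikiSQL-style query dict as a SQL string.

    query   -- dict with "sel" (column index), "agg" (index into AGG_OPS)
               and "conds" (list of [column index, operator index, value])
    columns -- list of column names for the table (hypothetical example data)
    """
    select = columns[query["sel"]]
    agg = AGG_OPS[query["agg"]]
    if agg:
        select = "{}({})".format(agg, select)
    sql = "SELECT {} FROM {}".format(select, table_name)

    conds = []
    for col_idx, op_idx, value in query["conds"]:
        if isinstance(value, str):
            # Quote string literals and escape embedded single quotes.
            value = "'{}'".format(value.replace("'", "''"))
        conds.append(
            "{} {} {}".format(columns[col_idx], COND_OPS[op_idx], value)
        )
    if conds:
        # WikiSQL queries combine all conditions with AND.
        sql += " WHERE " + " AND ".join(conds)
    return sql


if __name__ == "__main__":
    columns = ["player", "team", "points"]
    query = {"sel": 0, "agg": 3, "conds": [[1, 0, "Lakers"], [2, 1, 20]]}
    print(render_query(query, columns))
```

Column names that contain spaces or reserved words would need quoting before the string is sent to a real database; the sketch leaves that out for brevity.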
The data is in data.tar.bz2. Extract it by running
tar -xjvf data.tar.bz2
The code is written using PyTorch 0.2.0 in Python 2.7. Check here to install PyTorch, or run
conda install pytorch=0.2.0 cuda90 -c pytorch
You can install the other dependencies by running
pip install -r requirements.txt
Download the pretrained GloVe embedding from here using
bash download_glove.sh
Run the following command to process the pretrained GloVe embedding for training the word embedding:
python extract_vocab.py
The training script is train.py. To see the detailed parameters for running:
python train.py -h
Some typical usages are listed below:
Train a SQLNet model with column attention:
python train.py --ca
Specify a GPU with
python train.py --ca --gpu=0
Train a SQLNet model with column attention and trainable embedding (requires pretraining without training the embedding, i.e., executing the command above):
python train.py --ca --train_emb
Pretrain a Seq2SQL model on the re-split dataset:
python train.py --baseline --dataset 1
Train a Seq2SQL model with reinforcement learning after pretraining:
python train.py --baseline --dataset 1 --rl
The script for evaluation on the dev and test splits is test.py. The parameters for evaluation are roughly the same as those used for training. For example, the commands for evaluating the models trained with the commands above are:
Test a trained SQLNet model with column attention:
python test.py --ca
Test a trained SQLNet model with column attention and trainable embedding:
python test.py --ca --train_emb
Test a trained Seq2SQL model without RL on the re-split dataset:
python test.py --baseline --dataset 1
Test a trained Seq2SQL model with reinforcement learning:
python test.py --baseline --dataset 1 --rl
Xiaojun Xu, Chang Liu, Dawn Song. 2017. SQLNet: Generating Structured Queries from Natural Language Without Reinforcement Learning.
@article{xu2017sqlnet,
title={SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning},
author={Xu, Xiaojun and Liu, Chang and Song, Dawn},
journal={arXiv preprint arXiv:1711.04436},
year={2017}
}