| | --- |
| | license: cc-by-nc-sa-4.0 |
| | language: |
| | - en |
| | library_name: transformers |
| | tags: |
| | - text-to-sql |
| | - text2sql |
| | - nlp2sql |
| | - nlp-to-sql |
| | - SQL |
| | --- |
| | # Model Card for text2sql |
| |
|
| | <!-- Provide a quick summary of what the model is/does. --> |
| |
|
| | LLM instruction finetuned for Text-to-SQL task. |
| |
|
| | ## Model Details |
| |
|
| | ### Model Description |
| |
|
| | <!-- Provide a longer summary of what this model is. --> |
| |
|
| | - **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
| | - **Funded by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
| | - **Shared by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
| | - **Model type:** LlamaForCausalLM |
| | - **Language(s) (NLP):** English |
| | - **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) Model is made available under non-commercial use for research purposes only. For commercial usage please connect at contactus@dataeaze.io |
| | - **Finetuned from model :** [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) |
| |
|
| |
|
| | ## Uses |
| |
|
| | <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
| |
|
| | ### Direct Use |
| |
|
| | <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
| | Model can be used a tool to convert queries in expressed in natural language (English) to SQL statements |
| |
|
| |
|
| | ### Downstream Use |
| |
|
| | <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
| | The model could be used as the initial stage in a data analytics / business intelligence application pipeline. |
| |
|
| |
|
| | ### Out-of-Scope Use |
| |
|
| | <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
| |
|
| | Model has been fine tuned on a specific task of converting English language statements to SQL queries. |
| | Any use beyond this is not guaranteed to be accurate. |
| |
|
| | ## Bias, Risks, and Limitations |
| |
|
| | <!-- This section is meant to convey both technical and sociotechnical limitations. --> |
| |
|
| | - **Bias:** Trained for English language only. |
| | - **Risk:** Guardrails are reliant on the base models CodeLlama (Llama2). Finetuning could impact this behaviour. |
| | - **Limitations:** Intended to be a small model optimised for inference. Does not provide SoTA results on accuracy. |
| |
|
| |
|
| | ## How to Get Started with the Model |
| |
|
| | Use the code below to get started with the model. |
| |
|
| | ``` |
| | import torch |
| | from transformers import AutoModelForCausalLM, AutoTokenizer |
| | |
| | model = AutoModelForCausalLM.from_pretrained( |
| | "dataeaze/dataeaze-text2sql-codellama_7b_instruct-dzsql", |
| | torch_dtype=torch.bfloat16, |
| | device_map='auto' |
| | ) |
| | |
| | tokenizer = AutoTokenizer.from_pretrained("dataeaze/dataeaze-text2sql-codellama_7b_instruct-dzsql") |
| | # print("model device :", model.device) |
| | tokenizer.pad_token = tokenizer.eos_token |
| | model.eval() |
| | |
| | prompt = """ Below are sql tables schemas paired with instruction that describes a task. |
| | Using valid SQLite, write a response that appropriately completes the request for the provided tables. |
| | ### Instruction: How many transactions were made by a customer in a specific month? |
| | ### Database: RewardsProgramDB61 |
| | ### Input: |
| | CREATE SCHEMA RewardsProgram; |
| | |
| | CREATE TABLE Customer ( |
| | CustomerID INT NOT NULL AUTO_INCREMENT, |
| | FirstName VARCHAR(50) NOT NULL, |
| | LastName VARCHAR(50) NOT NULL, |
| | Email VARCHAR(100) UNIQUE NOT NULL, |
| | Phone VARCHAR(20) UNIQUE, |
| | DateOfBirth DATE, |
| | PRIMARY KEY (CustomerID) |
| | ); |
| | |
| | CREATE TABLE Membership ( |
| | MembershipID INT NOT NULL AUTO_INCREMENT, |
| | MembershipType VARCHAR(50) NOT NULL, |
| | DiscountPercentage DECIMAL(5, 2) NOT NULL, |
| | ValidFrom DATETIME, |
| | ValidTo DATETIME, |
| | CustomerID INT NOT NULL, |
| | PRIMARY KEY (MembershipID), |
| | FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) |
| | ); |
| | |
| | CREATE TABLE Transaction ( |
| | TransactionID INT NOT NULL AUTO_INCREMENT, |
| | TransactionDate TIMESTAMP, |
| | TotalAmount DECIMAL(10, 2) NOT NULL, |
| | CustomerID INT NOT NULL, |
| | PRIMARY KEY (TransactionID), |
| | FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) |
| | ); |
| | |
| | CREATE TABLE TransactionDetail ( |
| | TransactionDetailID INT NOT NULL AUTO_INCREMENT, |
| | TransactionID INT NOT NULL, |
| | ProductID INT NOT NULL, |
| | Quantity INT NOT NULL, |
| | UnitPrice DECIMAL(10, 2) NOT NULL, |
| | PRIMARY KEY (TransactionDetailID), |
| | FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID), |
| | FOREIGN KEY (ProductID) REFERENCES Product(ProductID) |
| | ); |
| | |
| | CREATE TABLE Product ( |
| | ProductID INT NOT NULL AUTO_INCREMENT, |
| | ProductName VARCHAR(100) NOT NULL, |
| | UnitPrice DECIMAL(10, 2) NOT NULL, |
| | AvailableQuantity INT NOT NULL, |
| | CreatedDate DATETIME, |
| | PRIMARY KEY (ProductID) |
| | ); |
| | |
| | ALTER TABLE Membership ADD CONSTRAINT FK_Membership_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID); |
| | |
| | ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Transaction FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID); |
| | |
| | ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID);" |
| | """ |
| | |
| | input_ids = tokenizer(prompt, padding=True, return_tensors='pt') |
| | outputs = model.generate( |
| | input_ids=input_ids['input_ids'].to(model.device), |
| | attention_mask=input_ids['attention_mask'].to(model.device), |
| | max_new_tokens=3072, |
| | ) |
| | |
| | generated_query = tokenizer.decode(outputs[0], skip_special_tokens=True) |
| | print(generated_query) |
| | |
| | |
| | ``` |
| |
|
| |
|
| | ## Evaluation |
| |
|
| | <!-- This section describes the evaluation protocols and provides the results. --> |
| |
|
| | ### Testing Data & Metrics |
| |
|
| | #### Testing Data |
| |
|
| | <!-- This should link to a Dataset Card if possible. --> |
| |
|
| | [SPIDER dataset Test Set](https://yale-lily.github.io/spider) |
| |
|
| |
|
| | #### Metrics |
| |
|
| | <!-- These are the evaluation metrics being used, ideally with a description of why. --> |
| |
|
| | SQL queries are matched against the correct answer, with two types of evaluation |
| | * Execution with Values |
| | * Exact Set Match without Values |
| |
|
| | ### Results |
| |
|
| | ``` |
| | model-index: |
| | - name: dataeaze/dataeaze-text2sql-codellama_7b_instruct-dzsql |
| | results: |
| | - task: |
| | type: text-to-sql |
| | dataset: |
| | name: SPIDER 1.0 |
| | type: text-to-sql |
| | metrics: |
| | - name: Execution with Values |
| | type: Execution with Values |
| | value: 20.6 |
| | - name: Exact Set Match without Values |
| | type: Exact Set Match without Values |
| | value: 16.7 |
| | source: |
| | name: Spider 1.0 - Leaderboard |
| | url: https://yale-lily.github.io/spider |
| | ``` |
| |
|
| |
|
| | ## Model Card Authors |
| |
|
| | * Suyash Chougule |
| | * Chittaranjan Rathod |
| | * Sourabh Daptardar |
| |
|
| | ## Model Card Contact |
| |
|
| | "dataeaze systems" <contactus@dataeaze.io> |