# Qwen/Qwen3-14B Fine-tuned for NL2SQL++ v8
This model is a fine-tuned version of Qwen/Qwen3-14B on the NL2SQL++ v8 dataset with code-with-thought reasoning.
## Model Details
- Base Model: Qwen/Qwen3-14B
- Task: Text-to-SQL generation
- Dataset: NL2SQL++ v8 with code-with-thought reasoning
- Fine-tuning Method: LoRA (Low-Rank Adaptation) with Unsloth
- Quantization: 16-bit merged weights
- Maximum Sequence Length: 4096 tokens
- Training Dataset Size: 56212 examples
- Validation Dataset Size: 1000 examples
## Training Configuration
### LoRA Parameters
- LoRA Rank (r): 64
- LoRA Alpha: 128
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
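
To give a rough sense of what these settings imply, here is a minimal sketch assuming standard LoRA, where the low-rank update is scaled by alpha / r and each adapted linear layer of shape (d_out, d_in) gains r * (d_in + d_out) trainable parameters. The function names and the 1024 x 1024 layer are hypothetical illustrations, not the actual Qwen3-14B dimensions or the training code.

```python
# LoRA settings taken from the list above.
LORA_RANK = 64
LORA_ALPHA = 128


def lora_scaling(alpha: int, rank: int) -> float:
    """Scaling applied to the low-rank update in standard LoRA (alpha / r)."""
    return alpha / rank


def lora_params_per_layer(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer: A is (r x d_in), B is (d_out x r)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    print("scaling:", lora_scaling(LORA_ALPHA, LORA_RANK))
    # Hypothetical square layer, used only to illustrate the formula.
    print("params for a 1024x1024 layer:", lora_params_per_layer(1024, 1024, LORA_RANK))
```

With rank 64 and alpha 128 the scaling factor is 2, so the adapter update is applied at twice its raw magnitude.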
### Training Hyperparameters
- Learning Rate: 0.0002
- Training Epochs: 2
- Max Steps: N/A (using epochs)
- Train Batch Size: 64
- Eval Batch Size: 30
- Gradient Accumulation Steps: 2
- Effective Batch Size: 128
- Warmup Steps: 0
- Warmup Ratio: 0.1
- Optimizer: AdamW (torch)
- Learning Rate Scheduler: Cosine
- Weight Decay: 0.01
- Max Gradient Norm: 1.0
- Seed: 3407
- Instruction Part: "<|im_start|>user"
- Response Part: "<|im_start|>assistant"
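
The batch and warmup figures above can be cross-checked with some simple arithmetic. The sketch below assumes a single device and that, because Warmup Steps is 0, the warmup length is derived from Warmup Ratio. The helper name `schedule_estimate` is hypothetical, and the exact step counts reported by a given trainer may differ slightly because of rounding.

```python
import math

# Values taken from the model card.
TRAIN_EXAMPLES = 56212
TRAIN_BATCH_SIZE = 64
GRAD_ACCUM_STEPS = 2
EPOCHS = 2
WARMUP_RATIO = 0.1


def schedule_estimate(num_examples, batch_size, grad_accum, epochs,
                      warmup_ratio, num_devices=1):
    """Estimate optimizer steps and warmup length (approximate, single device assumed)."""
    effective_batch = batch_size * grad_accum * num_devices
    # One optimizer step consumes one effective batch; the last batch may be partial.
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = round(total_steps * warmup_ratio)
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


if __name__ == "__main__":
    print(schedule_estimate(TRAIN_EXAMPLES, TRAIN_BATCH_SIZE,
                            GRAD_ACCUM_STEPS, EPOCHS, WARMUP_RATIO))
```

Under these assumptions the effective batch size is 64 x 2 = 128, which matches the listed value, and two epochs come to roughly 880 optimizer steps with about 88 of them spent in warmup.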
## Train Dataset Example
<|im_start|>user
You are an expert in SQL++ query generation. You will be given a document schema and a natural language query. You need to generate a valid SQL++ query equivalent to the natural language query.
Use the given document schema to generate the SQL++ query.
Document Schema:
{'route': {'properties': {'airline': {'samples': ['AM', 'B6'], 'type': 'string'}, 'airlineid': {'samples': ['airline_2009', 'airline_2638'], 'type': 'string'}, 'destinationairport': {'samples': ['ATL', 'IDA'], 'type': 'string'}, 'distance': {'samples': [303.2327637772, 1107.5683839502], 'type': 'number'}, 'equipment': {'samples': ['320', '738'], 'type': 'string'}, 'id': {'samples': [11138, 13958], 'type': 'number'}, 'schedule': {'items': {'properties': {'day': {'type': 'number'}, 'flight': {'type': 'string'}, 'utc': {'type': 'string'}}, 'type': 'object'}, 'samples': [[{'day': 0, 'flight': 'AM701', 'utc': '21:46:00'}], [{'day': 0, 'flight': 'B6285', 'utc': '16:12:00'}]], 'type': 'array'}, 'sourceairport': {'samples': ['BOS', 'IAH'], 'type': 'string'}, 'stops': {'samples': [0], 'type': 'number'}, 'type': {'samples': ['route'], 'type': 'string'}, '~meta': {'properties': {'id': {'samples': ['route_11138', 'route_13958'], 'type': 'string'}}, 'samples': [{'id': 'route_11138'}, {'id': 'route_13958'}], 'type': 'object'}}, 'type': 'object'}}
Natural Language Query
Which routes in the route collection rank first by the shortest distance within each destination airport when limited to seven results?
SQL++ Query:
<|im_end|>
<|im_start|>assistant
<think>
</think>
```sql++
SELECT d.id, d.destinationairport, ROW_NUMBER() OVER (PARTITION BY d.destinationairport ORDER BY d.distance NULLS FIRST) AS `row` FROM route AS d LIMIT 7;
```
<|im_end|>
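
Since the assistant turn in this example wraps the query in a `<think>` block and a `sql++` code fence, downstream code will usually want just the query text. The following is a minimal sketch of one way to do that. The function name `extract_sqlpp` is hypothetical, and it assumes responses follow the format shown above (it also tolerates a missing fence or a missing closing fence).

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the snippet easy to embed
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
_FENCED_BODY = re.compile(
    re.escape(FENCE) + r"[^\n]*\n(.*?)(?:" + re.escape(FENCE) + r"|$)",
    re.DOTALL,
)


def extract_sqlpp(response: str) -> str:
    """Return the SQL++ text from a model response in the format shown above."""
    # Drop the end-of-turn token and any (possibly empty) reasoning block.
    text = response.replace("<|im_end|>", "")
    text = _THINK_BLOCK.sub("", text)
    # If the query is fenced, keep only the fenced body.
    match = _FENCED_BODY.search(text)
    if match:
        text = match.group(1)
    return text.strip()


if __name__ == "__main__":
    raw = ("<think>\n</think>\n" + FENCE + "sql++\n"
           "SELECT d.id FROM route AS d LIMIT 7;\n" + FENCE + "\n<|im_end|>")
    print(extract_sqlpp(raw))
```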
## Val Dataset Example
<|im_start|>user
You are an expert in SQL++ query generation. You will be given a document schema and a natural language query. You need to generate a valid SQL++ query equivalent to the natural language query.
Use the given document schema to generate the SQL++ query.
Document Schema: {'route': {'properties': {'airline': {'samples': ['AS', 'BA'], 'type': 'string'}, 'airlineid': {'samples': ['airline_1355', 'airline_1756'], 'type': 'string'}, 'destinationairport': {'samples': ['ATL', 'JFK'], 'type': 'string'}, 'distance': {'samples': [448.3541058305, 466.0724892866], 'type': 'number'}, 'equipment': {'samples': ['73H 73J', '744'], 'type': 'string'}, 'id': {'samples': [11761, 14501], 'type': 'number'}, 'schedule': {'items': {'properties': {'day': {'type': 'number'}, 'flight': {'type': 'string'}, 'utc': {'type': 'string'}}, 'type': 'object'}, 'samples': [[{'day': 0, 'flight': 'AS136', 'utc': '05:15:00'}], [{'day': 0, 'flight': 'BA803', 'utc': '22:55:00'}]], 'type': 'array'}, 'sourceairport': {'samples': ['DUS', 'GCM'], 'type': 'string'}, 'stops': {'samples': [0], 'type': 'number'}, 'type': {'samples': ['route'], 'type': 'string'}, '~meta': {'properties': {'id': {'samples': ['route_11761', 'route_14501'], 'type': 'string'}}, 'samples': [{'id': 'route_11761'}, {'id': 'route_14501'}], 'type': 'object'}}, 'type': 'object'}}
Natural Language Query
Retrieve the route identifier and destination airport fields from the route collection, limited to seven documents.
SQL++ Query:
<|im_end|>
<|im_start|>assistant
SELECT d.id, d.destinationairport FROM route AS d LIMIT 7;
<|im_end|>
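
To query the model with inputs shaped like the examples above, a prompt can be assembled from a schema and a question. This is a minimal sketch, not the original data-preparation code: the function name `build_prompt` is hypothetical, and the exact whitespace between sections is an assumption based on the layout of the training example.

```python
INSTRUCTIONS = (
    "You are an expert in SQL++ query generation. You will be given a document "
    "schema and a natural language query. You need to generate a valid SQL++ "
    "query equivalent to the natural language query.\n"
    "Use the given document schema to generate the SQL++ query."
)


def build_prompt(schema: dict, question: str) -> str:
    """Assemble a ChatML-style prompt mirroring the dataset examples."""
    user_turn = "\n".join([
        INSTRUCTIONS,
        "Document Schema:",
        str(schema),  # Python dict repr, as in the examples (single quotes)
        "Natural Language Query",
        question,
        "SQL++ Query:",
    ])
    # End with the assistant header so generation starts in the assistant turn.
    return f"<|im_start|>user\n{user_turn}\n<|im_end|>\n<|im_start|>assistant\n"


if __name__ == "__main__":
    # Tiny hypothetical schema, only to show the shape of the output.
    demo_schema = {"route": {"type": "object"}}
    print(build_prompt(demo_schema, "List seven routes."))
```

In practice the tokenizer's own chat template may be a safer way to produce the `<|im_start|>` and `<|im_end|>` markers. The manual string here is only meant to make the structure of the examples explicit.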