# Qwen3-4B-instruct-2507-exp01-dpo-merged
This is a merged model for the StructEval-T competition, created by fine-tuning Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization) and Direct Preference Optimization (DPO).
## Model Description
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Training Method: SFT + DPO with QLoRA
- Model Type: Causal Language Model
- Language: Japanese, English
- License: Apache 2.0
## Training Details
### Datasets
- SFT Dataset: daichira/hard-sft-4k
- DPO Dataset: u-10bei/dpo-dataset-qwen-cot
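DPO trains directly on preference pairs from the dataset above: it widens the gap between the policy's log-likelihood ratio (versus a frozen reference model) for chosen responses and for rejected ones. A minimal numpy sketch of the per-pair loss; β and the log-probabilities below are illustrative toy values, not numbers from this training run:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is the summed log-probability of a full response under the
    policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # Implicit reward margin: beta * (policy log-ratio - reference log-ratio)
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), averaged over the batch
    return float(np.mean(np.log1p(np.exp(-margin))))

# Toy batch: the policy already prefers the chosen responses slightly,
# so the loss sits just below log(2) (the value at zero margin)
loss = dpo_loss(
    logp_chosen=np.array([-10.0, -12.0]),
    logp_rejected=np.array([-11.0, -12.5]),
    ref_logp_chosen=np.array([-10.5, -12.2]),
    ref_logp_rejected=np.array([-10.8, -12.4]),
)
```

In practice this loss is computed by a trainer such as TRL's `DPOTrainer`; the sketch only shows the objective being minimized.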
### Training Configuration
- LoRA Rank: 64
- LoRA Alpha: 128
- LoRA Dropout: 0.0
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Sequence Length: 512 tokens
- Batch Size: 4
- Learning Rate: 2e-5
- Epochs: 1
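With rank r, each adapted linear layer of shape (d_out, d_in) contributes r·(d_in + d_out) trainable parameters, which is why a rank-64 adapter over all seven projection types stays tiny relative to the 4B base model. A quick estimator; the layer shapes and layer count below are placeholders, not Qwen3-4B-Instruct-2507's actual dimensions:

```python
def lora_params(shapes, r=64):
    """Trainable LoRA parameter count for (d_out, d_in) linear layers."""
    # Each layer gets A (r x d_in) and B (d_out x r) matrices
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)

# Hypothetical shapes for the seven target modules of one transformer block,
# repeated over a hypothetical 32 layers
per_layer = [(4096, 4096)] * 4 + [(11008, 4096)] * 2 + [(4096, 11008)]
total = lora_params(per_layer * 32)
```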
### Merging
This model was created by merging the LoRA adapter weights into the base model using the `merge_lora_adapter` utility from the training pipeline.
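Conceptually, merging folds each low-rank update into its base weight matrix, W' = W + (α/r)·B·A, after which the adapter is no longer needed at inference time. A small numpy illustration with toy shapes (the actual merge is performed by the pipeline's `merge_lora_adapter` utility on the full-size weights):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 8, 8      # toy dimensions; real projections are much larger
r, alpha = 64, 128      # LoRA rank and alpha from this run
scale = alpha / r       # 128 / 64 = 2.0

W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01     # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01    # LoRA up-projection

# Fold the adapter update into the base matrix
W_merged = W + scale * (B @ A)

# Applying W_merged alone now matches base + adapter
x = rng.normal(size=(d_in,))
y_adapter = W @ x + scale * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged)
```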
## Usage
### In Colab (notebook2)
```python
MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged"
```
### Direct Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged"

# torch_dtype="auto" loads the checkpoint in its native precision;
# device_map="auto" places it on the available GPU
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text
input_text = "Your input here"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Competition Details
This model was trained for the StructEval-T competition, focusing on accurate structured output generation (JSON, YAML, XML, TOML, CSV) from text instructions.
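Since the task scores whether generated text actually parses as the requested format, a lightweight validity check on raw model output is handy during evaluation. A sketch using only the standard library (JSON, XML, CSV shown; YAML and TOML would additionally need `pyyaml` and `tomllib`/`tomli`, omitted here):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def is_valid(text, fmt):
    """Return True if `text` parses as the requested structured format."""
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "xml":
            ET.fromstring(text)
        elif fmt == "csv":
            rows = list(csv.reader(io.StringIO(text)))
            # require a consistent column count across non-empty rows
            if len({len(r) for r in rows if r}) != 1:
                return False
        else:
            raise ValueError(f"unsupported format: {fmt}")
        return True
    except (json.JSONDecodeError, ET.ParseError, csv.Error):
        return False
```

The function names and the CSV consistency rule are illustrative choices, not part of the official StructEval-T scorer.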
### Competition Constraints
- VRAM Limit: 15GB (Colab T4 GPU)
- Inference Time Limit: 2h 20min
- Base Model: Qwen/Qwen3-4B-Instruct-2507 only
## Model Performance
See the experiment logs for detailed training metrics and evaluation results.
## License
This model is based on Qwen3-4B-Instruct-2507 and follows the same license terms (Apache 2.0).
## Contact
For questions or issues, please reach out through the competition platform or open an issue in the Hugging Face repository.

- Repository: https://huggingface.co/kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged
- Created: 2026-02-06