# Qwen3-4B-instruct-2507-exp01-dpo-merged
This is a merged model for the StructEval-T competition, created by fine-tuning Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization) and Direct Preference Optimization (DPO).
## Model Description
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Training Method: SFT + DPO with QLoRA
- Model Type: Causal Language Model
- Language: Japanese, English
- License: Apache 2.0
## Training Details
### Datasets
- SFT Dataset: daichira/hard-sft-4k
- DPO Dataset: u-10bei/dpo-dataset-qwen-cot
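DPO trains directly on preference pairs from the dataset above: it widens the gap between the policy's log-likelihood ratio (versus a frozen reference model) for chosen responses and for rejected ones. A minimal numpy sketch of the per-pair loss; β and the log-probabilities below are illustrative toy values, not numbers from this training run:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a batch of preference pairs.

    Each argument is the summed log-probability of a full response under the
    policy (logp_*) or the frozen reference model (ref_logp_*).
    """
    # Implicit reward margin: beta * (policy log-ratio - reference log-ratio)
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)), averaged over the batch
    return float(np.mean(np.log1p(np.exp(-margin))))

# Toy batch: the policy already prefers the chosen responses slightly,
# so the loss sits just below log(2) (the value at zero margin)
loss = dpo_loss(
    logp_chosen=np.array([-10.0, -12.0]),
    logp_rejected=np.array([-11.0, -12.5]),
    ref_logp_chosen=np.array([-10.5, -12.2]),
    ref_logp_rejected=np.array([-10.8, -12.4]),
)
```

In practice this loss is computed by a trainer such as TRL's `DPOTrainer`; the sketch only shows the objective being minimized.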
### Training Configuration
- LoRA Rank: 64
- LoRA Alpha: 128
- LoRA Dropout: 0.0
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Sequence Length: 512 tokens
- Batch Size: 4
- Learning Rate: 2e-5
- Epochs: 1
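With rank r, each adapted linear layer of shape (d_out, d_in) contributes r·(d_in + d_out) trainable parameters, which is why a rank-64 adapter over all seven projection types stays tiny relative to the 4B base model. A quick estimator; the layer shapes and layer count below are placeholders, not Qwen3-4B-Instruct-2507's actual dimensions:

```python
def lora_params(shapes, r=64):
    """Trainable LoRA parameter count for (d_out, d_in) linear layers."""
    # Each layer gets A (r x d_in) and B (d_out x r) matrices
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)

# Hypothetical shapes for the seven target modules of one transformer block,
# repeated over a hypothetical 32 layers
per_layer = [(4096, 4096)] * 4 + [(11008, 4096)] * 2 + [(4096, 11008)]
total = lora_params(per_layer * 32)
```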
### Merging
This model was created by merging the LoRA adapter weights into the base model using the `merge_lora_adapter` utility from the training pipeline.
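Conceptually, merging folds each low-rank update into its base weight matrix, W' = W + (α/r)·B·A, after which the adapter is no longer needed at inference time. A small numpy illustration with toy shapes (the actual merge is performed by the pipeline's `merge_lora_adapter` utility on the full-size weights):

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in = 8, 8      # toy dimensions; real projections are much larger
r, alpha = 64, 128      # LoRA rank and alpha from this run
scale = alpha / r       # 128 / 64 = 2.0

W = rng.normal(size=(d_out, d_in))        # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01     # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01    # LoRA up-projection

# Fold the adapter update into the base matrix
W_merged = W + scale * (B @ A)

# Applying W_merged alone now matches base + adapter
x = rng.normal(size=(d_in,))
y_adapter = W @ x + scale * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_adapter, y_merged)
```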
## Usage
### In Colab (notebook2)
```python
MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged"
```
### Direct Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged"

# torch_dtype="auto" loads the checkpoint in its native precision;
# device_map="auto" places it on the available GPU
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Generate text
input_text = "Your input here"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Competition Details
This model was trained for the StructEval-T competition, focusing on accurate structured output generation (JSON, YAML, XML, TOML, CSV) from text instructions.
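Since the task scores whether generated text actually parses as the requested format, a lightweight validity check on raw model output is handy during evaluation. A sketch using only the standard library (JSON, XML, CSV shown; YAML and TOML would additionally need `pyyaml` and `tomllib`/`tomli`, omitted here):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def is_valid(text, fmt):
    """Return True if `text` parses as the requested structured format."""
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "xml":
            ET.fromstring(text)
        elif fmt == "csv":
            rows = list(csv.reader(io.StringIO(text)))
            # require a consistent column count across non-empty rows
            if len({len(r) for r in rows if r}) != 1:
                return False
        else:
            raise ValueError(f"unsupported format: {fmt}")
        return True
    except (json.JSONDecodeError, ET.ParseError, csv.Error):
        return False
```

The function names and the CSV consistency rule are illustrative choices, not part of the official StructEval-T scorer.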
### Competition Constraints
- VRAM Limit: 15GB (Colab T4 GPU)
- Inference Time Limit: 2h 20min
- Base Model: Qwen/Qwen3-4B-Instruct-2507 only
## Model Performance
See the experiment logs for detailed training metrics and evaluation results.
## License
This model is based on Qwen3-4B-Instruct-2507 and follows the same license terms (Apache 2.0).
## Contact
For questions or issues, please reach out through the competition platform or open an issue in the Hugging Face repository.

- Repository: https://huggingface.co/kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged
- Created: 2026-02-06