
Qwen3-4B-instruct-2507-exp01-dpo-merged

This is a merged model built for the StructEval-T competition, created by fine-tuning Qwen/Qwen3-4B-Instruct-2507 with supervised fine-tuning (SFT) and Direct Preference Optimization (DPO) using QLoRA (4-bit quantization), then merging the adapter back into the base weights.

Model Description

  • Base Model: Qwen/Qwen3-4B-Instruct-2507
  • Training Method: SFT + DPO with QLoRA
  • Model Type: Causal Language Model
  • Language: Japanese, English
  • License: Apache 2.0

Training Details

Datasets

  • SFT Dataset: daichira/hard-sft-4k
  • DPO Dataset: u-10bei/dpo-dataset-qwen-cot

Training Configuration

  • LoRA Rank: 64
  • LoRA Alpha: 128
  • LoRA Dropout: 0.0
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Sequence Length: 512 tokens
  • Batch Size: 4
  • Learning Rate: 2e-5
  • Epochs: 1
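
The LoRA settings above map onto a standard peft LoraConfig roughly as follows (a minimal sketch assuming the peft library; the exact training script and trainer arguments may differ from the actual pipeline):

```python
from peft import LoraConfig

# LoRA settings matching the table above (illustrative sketch, not the exact script)
lora_config = LoraConfig(
    r=64,                 # LoRA rank
    lora_alpha=128,       # scaling factor (alpha / r = 2.0)
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```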

Merging

This model was created by merging the LoRA adapter weights into the base model weights using the merge_lora_adapter utility from the training pipeline.
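
The merge step corresponds to peft's standard adapter-merge API (a sketch under the assumption that merge_lora_adapter wraps this; the adapter path below is a placeholder):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in full precision, then attach the trained adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507", torch_dtype="auto"
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # placeholder path

# Fold the LoRA deltas into the base weights and save standalone weights.
merged = model.merge_and_unload()
merged.save_pretrained("Qwen3-4B-instruct-2507-exp01-dpo-merged")
```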

Usage

In Colab (notebook2)

MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged"

Direct Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the prompt with the instruct chat template
messages = [{"role": "user", "content": "Your input here"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))

Competition Details

This model was trained for the StructEval-T competition, focusing on accurate structured output generation (JSON, YAML, XML, TOML, CSV) from text instructions.
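
Structured-output tasks of this kind are typically scored by parsing the model's output with a format-specific parser. A minimal stdlib-only check (looks_like_valid_output is a hypothetical helper for illustration, covering JSON and CSV only, not the competition's actual scorer) might look like:

```python
import csv
import io
import json

def looks_like_valid_output(text: str, fmt: str) -> bool:
    """Return True if `text` parses as the requested structured format.

    Hypothetical illustration: JSON and CSV only, via the stdlib.
    """
    if fmt == "json":
        try:
            json.loads(text)
            return True
        except json.JSONDecodeError:
            return False
    if fmt == "csv":
        rows = list(csv.reader(io.StringIO(text)))
        # Require a header plus at least one data row of equal width
        return len(rows) >= 2 and all(len(r) == len(rows[0]) for r in rows)
    raise ValueError(f"unsupported format: {fmt}")

print(looks_like_valid_output('{"name": "qwen", "params": 4}', "json"))  # True
print(looks_like_valid_output("a,b\n1,2\n", "csv"))                      # True
```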

Competition Constraints

  • VRAM Limit: 15GB (Colab T4 GPU)
  • Inference Time Limit: 2h 20min
  • Base Model: Qwen/Qwen3-4B-Instruct-2507 only

Model Performance

See the experiment logs for detailed training metrics and evaluation results.

License

This model is based on Qwen3-4B-Instruct-2507 and follows the same license terms (Apache 2.0).

Contact

For questions or issues, please contact through the competition platform or open an issue in the HuggingFace repository.


Repository: https://huggingface.co/kevineen/Qwen3-4B-instruct-2507-exp01-dpo-merged Created: 2026-02-06

Format: Safetensors · 4B parameters · F16 tensors