# qwen3_4b_instruct_2507_sft_v1
This repository contains the supervised fine-tuning (SFT) checkpoint for a Qwen3 4B Instruct model trained with DeepSpeed ZeRO-3. The weights have been consolidated and exported to the Hugging Face safetensors format for easier deployment.
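For reference, consolidation of a ZeRO-3 run can be done with DeepSpeed's `zero_to_fp32` utilities. The sketch below is illustrative rather than the exact export procedure used here; the checkpoint path is a placeholder.

```python
# Illustrative consolidation sketch: rebuild full fp32 weights from a
# DeepSpeed ZeRO-3 checkpoint directory, then re-save as safetensors.
# The checkpoint path is a placeholder.
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model = load_state_dict_from_zero_checkpoint(model, "path/to/zero3/checkpoint")
model.save_pretrained("qwen3_4b_instruct_2507_sft_v1", safe_serialization=True)
```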
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Chouoftears/qwen3_4b_instruct_2507_sft_v1"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
```
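For chat-style inference, a minimal generation sketch follows; the prompt and `max_new_tokens` value are illustrative, and sampling follows the defaults in `generation_config.json`.

```python
# Minimal generation sketch using the model's chat template.
messages = [{"role": "user", "content": "Briefly explain what ZeRO-3 does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```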
## Training
- Base model: `Qwen/Qwen3-4B-Instruct-2507`
- Framework: `transformers==4.56.2`
- Optimization: DeepSpeed ZeRO Stage-3, bf16
- SFT run name: `qwen3-4B-Instruct-2507-toucan-sft-3ep`
- Max sequence length: 262,144 tokens (per config)
Refer to `training_args.bin` in the original run directory for the full trainer configuration.
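If you have the run directory, that file can be inspected directly. A sketch, assuming it was written by the Hugging Face `Trainer` (the attributes shown are standard `TrainingArguments` fields):

```python
# Sketch: training_args.bin is a pickled transformers.TrainingArguments
# object saved by Trainer; weights_only=False is required on recent torch.
import torch

args = torch.load("training_args.bin", weights_only=False)
print(args.deepspeed)  # DeepSpeed config (path or dict) used for the run
print(args.bf16)       # bf16 flag
print(args.run_name)   # e.g. qwen3-4B-Instruct-2507-toucan-sft-3ep
```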
## Files
- `model-0000X-of-00004.safetensors`: model weight shards
- `model.safetensors.index.json`: weight index map
- `config.json` / `generation_config.json`: architecture and generation defaults
- Tokenizer artifacts: `tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `special_tokens_map.json`, `added_tokens.json`
- `chat_template.jinja`: conversation formatting used during SFT
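As an example, the shard index is plain JSON and can be read to see which file holds each tensor:

```python
# Sketch: map parameter names to their safetensors shard files.
import json

with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])  # total checkpoint size in bytes
name, shard = next(iter(index["weight_map"].items()))
print(f"{name} -> {shard}")
```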
## Limitations
This checkpoint inherits the limitations of the base Qwen3 model and of the SFT data. Evaluate it against your downstream safety and compliance requirements before deployment.