	---
	license: apache-2.0
	language:
	- en
	- zh
	pipeline_tag: text-generation
	base_model: Qwen/Qwen3-1.7B
	tags:
	- chat
	- function-calling
	- tool-use
	- star-method
	- sota
	library_name: transformers
	---

	# STAR-1b7

	## Introduction

	STAR-1b7 is a 1.7B-parameter language model specialized in function calling. It achieves strong performance on the [Berkeley Function Calling Leaderboard (BFCL)](https://huggingface.co/spaces/gorilla-llm/berkeley-function-calling-leaderboard) among models in its size class.

	This model is the result of fine-tuning the `Qwen/Qwen3-1.7B` base model using the novel STAR (Similarity-guided Teacher-Assisted Refinement) framework. STAR is a holistic training curriculum designed to effectively transfer the advanced capabilities of large language models (LLMs) into "super-tiny" models, making them powerful, accessible, and efficient for real-world agentic applications.

	The key innovations of the STAR framework include:
	- Similarity-guided RL (Sim-RL): A reinforcement learning mechanism that uses a fine-grained, similarity-based reward signal. This provides a more robust and continuous signal for policy optimization compared to simple binary rewards, which is crucial for complex, multi-solution tasks like function calling.
	- Constrained Knowledge Distillation (CKD): An advanced training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions. This ensures training stability while preserving the model's exploration capacity, creating a strong foundation for the subsequent RL phase.
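To illustrate why a similarity-based reward gives a denser signal than a binary one, here is a minimal sketch. It is not the STAR implementation: the scoring rule (exact match on the function name, then mean string similarity over arguments) and the helper names `call_similarity` and `binary_reward` are assumptions chosen for illustration only.

```python
import json
from difflib import SequenceMatcher


def call_similarity(predicted: dict, reference: dict) -> float:
    """Score a predicted function call against a reference, in [0, 1].

    Hypothetical rule (not taken from the STAR framework): a wrong function
    name scores 0; otherwise the score is the mean string similarity over
    every argument key that appears in either call.
    """
    if predicted.get("name") != reference.get("name"):
        return 0.0
    pred_args = predicted.get("arguments", {})
    ref_args = reference.get("arguments", {})
    keys = set(pred_args) | set(ref_args)
    if not keys:
        return 1.0  # same name, no arguments on either side
    total = 0.0
    for key in keys:
        if key in pred_args and key in ref_args:
            a = json.dumps(pred_args[key], sort_keys=True)
            b = json.dumps(ref_args[key], sort_keys=True)
            total += SequenceMatcher(None, a, b).ratio()
        # A missing or extra argument contributes 0 for that key.
    return total / len(keys)


def binary_reward(predicted: dict, reference: dict) -> float:
    """All-or-nothing baseline: 1.0 only for an exact match."""
    return 1.0 if predicted == reference else 0.0


reference = {
    "name": "get_weather",
    "arguments": {"city": "San Francisco", "unit": "celsius"},
}
near_miss = {
    "name": "get_weather",
    "arguments": {"city": "San Fransisco", "unit": "celsius"},
}
wrong_tool = {"name": "get_time", "arguments": {"city": "San Francisco"}}
```

Under this sketch, `near_miss` gets zero binary reward but a similarity reward close to 1, while `wrong_tool` gets zero under both, so the policy can distinguish an almost-correct call from a completely wrong one.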

	Our STAR-1b7 model significantly outperforms other open models under 1B parameters and even surpasses several larger models, demonstrating the effectiveness of the STAR methodology.

	## Model Details

	- Model Type: Causal Language Model, fine-tuned for function calling.
	- Base Model: `Qwen/Qwen3-1.7B`
	- Training Framework: STAR (CKD + Sim-RL)
	- Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and QK-Norm (Qwen3 drops the attention QKV bias used in Qwen2).
	- Number of Parameters: ~1.7B
	- Context Length: Supports up to 32,768 tokens.

	## Requirements

	Qwen3 support is included in recent releases of Hugging Face `transformers`, so we recommend installing the latest version.
	With `transformers<4.51.0`, loading the model fails with the following error:
	```
	KeyError: 'qwen3'
	```
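A quick way to catch this before loading the model is to compare the installed version against the 4.51.0 threshold stated above. The helper below is a small stdlib-only sketch; the name `supports_qwen3` is hypothetical, and it treats pre-release strings such as `4.51.0.dev0` as meeting the threshold, which may be too lenient for some setups.

```python
def supports_qwen3(version: str, minimum=(4, 51, 0)) -> bool:
    """Return True if a `transformers` version string is >= `minimum`.

    Only the leading digits of the first three dot-separated fields are
    compared, so suffixes like "rc1" or ".dev0" are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # "4.52" is treated as 4.52.0
    return tuple(parts) >= minimum


# Typical use (requires transformers to be installed):
#   import transformers
#   if not supports_qwen3(transformers.__version__):
#       raise RuntimeError("Please upgrade: pip install -U transformers")
```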

	## Quickstart

	Here is a code snippet showing how to load STAR-1b7 and use it for a chat-based task.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "star-lab/STAR-1b7"

	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained(model_name)

	# Example prompt that could trigger a function call
	prompt = "What is the current weather in San Francisco?"
	messages = [
	    {"role": "system", "content": "You are a helpful assistant with access to external tools."},
	    {"role": "user", "content": prompt}
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=32768
	)
	generated_ids = [
	    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(response)
	```
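The snippet above mentions external tools in the system prompt but does not declare any. One possible way to do so is to pass a JSON-schema tool list through the `tools` argument of `tokenizer.apply_chat_template`, then extract calls from the decoded response. The sketch below assumes that STAR-1b7 keeps the Qwen3 chat template and emits calls as JSON wrapped in `<tool_call>` tags; the `get_current_weather` tool and the `parse_tool_calls` helper are hypothetical.

```python
import json
import re

# Hypothetical tool definition in the JSON-schema style used for function calling.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}

# Assumed output format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(response: str) -> list:
    """Extract every well-formed JSON tool call from a decoded response."""
    calls = []
    for payload in TOOL_CALL_RE.findall(response):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of failing
    return calls


# With the Quickstart objects in scope, the tool list would be passed as:
#   text = tokenizer.apply_chat_template(
#       messages, tools=[WEATHER_TOOL], tokenize=False, add_generation_prompt=True
#   )
#   calls = parse_tool_calls(response)
```

Each parsed call is a plain dictionary, so the application can dispatch on `call["name"]` and pass `call["arguments"]` to its own implementation of the tool.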

	For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` to create an OpenAI-compatible API endpoint:
	- SGLang:
	```shell
	python -m sglang.launch_server --model-path star-lab/STAR-1b7 --reasoning-parser qwen3
	```
	- vLLM:
	```shell
	vllm serve star-lab/STAR-1b7 --enable-reasoning --reasoning-parser deepseek_r1
	```
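Once either server is running, it can be queried over the OpenAI-style chat completions route. The following is a minimal stdlib-only sketch, not an official client: the helper names are hypothetical, and the base URL assumes the vLLM default port 8000 (SGLang typically listens on 30000 unless `--port` is set). Whether the server returns structured tool calls depends on server-side options that are not covered here.

```python
import json
import urllib.request


def build_chat_request(prompt, model="star-lab/STAR-1b7", tools=None, max_tokens=512):
    """Build the JSON body for a POST to /v1/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    if tools:
        body["tools"] = tools  # optional JSON-schema tool definitions
    return body


def send_chat_request(body, base_url="http://localhost:8000/v1"):
    """Send the request to a running server and return the parsed JSON reply."""
    request = urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Example (requires a server started with one of the commands above):
#   body = build_chat_request("What is the current weather in San Francisco?")
#   reply = send_chat_request(body)
#   print(reply["choices"][0]["message"])
```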

	For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support STAR-1b7.

	## Evaluation & Performance

	STAR-1b7 achieves strong results for its size on two function-calling benchmarks:

	- BFCLv3: 56.05% overall accuracy.
	- ACEBench: 60.90% summary score, indicating that the model generalizes beyond BFCL.