Update README.md

README.md CHANGED

@@ -34,7 +34,7 @@ library_name: transformers
[Large Action Models (LAMs)](https://blog.salesforceairesearch.com/large-action-models/) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the **brains of AI agents**, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains.

**This model release is for research purposes only.**

- The new **xLAM-2** series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in **multi-turn conversation** and **tool usage**. Trained with our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions, our models achieve state-of-the-art performance on the **BFCL** and **τ-bench** benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials.

We've also refined the **chat template** and **vLLM integration**, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.
@@ -46,42 +46,38 @@ We've also refined the **chat template** and **vLLM integration**, making it eas
## Table of Contents
- - [Model Series](#model-series)
- [Usage](#usage)
- [Basic Usage with Huggingface Chat Template](#basic-usage-with-huggingface-chat-template)
- [Benchmark Results](#benchmark-results)
- [Citation](#citation)

---

## Model Series

The [xLAM](https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4) series is significantly better at many tasks, including general tasks and function calling.
For the same number of parameters, the models have been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model.
- | Model | # Total Params | Context Length | Release Date | Category | Download Model | Download GGUF files |
- |-------|----------------|----------------|--------------|----------|----------------|---------------------|
- | Llama-xLAM-2-70b-fc-r | 70B | 128k | | | | |
- | Llama-xLAM-2-8b-fc-r | 8B | 128k | | | | |
- | xLAM-2-32b-fc-r | 32B | 32k (max 128k)* | | | | |
- | xLAM-2-3b-fc-r | 3B | 32k (max 128k)* | | | | |
- | xLAM-2-1b-fc-r | 1B | 32k (max 128k)* | | | | |
- | xLAM-7b-r | 7.24B | 32k | Sep. 5, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-7b-r) | -- |
- | xLAM-8x7b-r | 46.7B | 32k | Sep. 5, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-8x7b-r) | -- |
- | xLAM-8x22b-r | 141B | 64k | Sep. 5, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-8x22b-r) | -- |
- | xLAM-1b-fc-r | 1.35B | 16k | July 17, 2024 | Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-1b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-1b-fc-r-gguf) |
- | xLAM-7b-fc-r | 6.91B | 4k | July 17, 2024 | Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-7b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-7b-fc-r-gguf) |
- | xLAM-v0.1-r | 46.7B | 32k | Mar. 18, 2024 | General, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-v0.1-r) | -- |

***Note:** The default context length for Qwen-2.5-based models is 32k, but you can use techniques like YaRN (Yet another RoPE extensioN) to achieve a maximum context length of 128k. Please refer to [here](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct#processing-long-texts) for more details.
- ✅ All models are fully compatible with vLLM, FastChat, and Transformers-based inference frameworks.

- ---

## Usage
@@ -137,17 +133,90 @@ generated_tokens = outputs[:, input_ids_len:] # Slice the output to get only the
```python
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```

## Benchmark Results
@@ -155,7 +224,7 @@ And then interact with the model using your preferred method for querying a vLLM
<p align="center">
<img width="80%" alt="BFCL Results" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/bfcl-result.png?raw=true">
<br>
- <small><i>Performance comparison of different models on the BFCL leaderboard. The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode, in contrast to using a customized "prompt" to extract the function calls.</i></small>
</p>

### τ-bench Benchmark
@@ -194,6 +263,9 @@ If you use our model or dataset in your work, please cite our paper:
}
```

```bibtex
@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
@@ -212,8 +284,6 @@ If you use our model or dataset in your work, please cite our paper:
}

```

- Additionally, please check our other related works regarding xLAM and consider citing them as well:

```bibtex
@article{liu2024apigen,
@@ -235,4 +305,3 @@ Additionally, please check our other related works regarding xLAM and consider c
}
```
[Large Action Models (LAMs)](https://blog.salesforceairesearch.com/large-action-models/) are advanced language models designed to enhance decision-making by translating user intentions into executable actions. As the **brains of AI agents**, LAMs autonomously plan and execute tasks to achieve specific goals, making them invaluable for automating workflows across diverse domains.

**This model release is for research purposes only.**

+ The new **xLAM-2** series, built on our most advanced data synthesis, processing, and training pipelines, marks a significant leap in **multi-turn conversation** and **tool usage**. Trained with our novel APIGen-MT framework, which generates high-quality training data through simulated agent-human interactions, our models achieve state-of-the-art performance on the [**BFCL**](https://gorilla.cs.berkeley.edu/leaderboard.html) and **τ-bench** benchmarks, outperforming frontier models like GPT-4o and Claude 3.5. Notably, even our smaller models demonstrate superior capabilities in multi-turn scenarios while maintaining exceptional consistency across trials.

We've also refined the **chat template** and **vLLM integration**, making it easier to build advanced AI agents. Compared to previous xLAM models, xLAM-2 offers superior performance and seamless deployment across applications.
## Table of Contents
- [Usage](#usage)
- [Basic Usage with Huggingface Chat Template](#basic-usage-with-huggingface-chat-template)
+ - [Using vLLM for Inference](#using-vllm-for-inference)
+ - [Setup and Serving](#setup-and-serving)
+ - [Testing with OpenAI API](#testing-with-openai-api)
- [Benchmark Results](#benchmark-results)
- [Citation](#citation)
---

## Model Series

The [xLAM](https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4) series is significantly better at many tasks, including general tasks and function calling.
For the same number of parameters, the models have been fine-tuned across a wide range of agent tasks and scenarios, all while preserving the capabilities of the original model.
+ | Model | # Total Params | Context Length | Category | Download Model | Download GGUF files |
+ |-------|----------------|----------------|----------|----------------|---------------------|
+ | Llama-xLAM-2-70b-fc-r | 70B | 128k | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-70b-fc-r) | NA |
+ | Llama-xLAM-2-8b-fc-r | 8B | 128k | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/Llama-xLAM-2-8b-fc-r-gguf) |
+ | xLAM-2-32b-fc-r | 32B | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-32b-fc-r) | NA |
+ | xLAM-2-3b-fc-r | 3B | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-3b-fc-r-gguf) |
+ | xLAM-2-1b-fc-r | 1B | 32k (max 128k)* | Multi-turn Conversation, Function-calling | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r) | [🤗 Link](https://huggingface.co/Salesforce/xLAM-2-1b-fc-r-gguf) |
***Note:** The default context length for Qwen-2.5-based models is 32k, but you can use techniques like YaRN (Yet another RoPE extensioN) to achieve a maximum context length of 128k. Please refer to [here](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct#processing-long-texts) for more details.

+ You can also explore our previous xLAM series [here](https://huggingface.co/collections/Salesforce/xlam-models-65f00e2a0a63bbcd1c2dade4).

+ The `-fc` suffix indicates that the models are fine-tuned for **function calling** tasks, while the `-r` suffix signifies a **research** release.

+ ✅ All models are fully compatible with vLLM and Transformers-based inference frameworks.
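To make the YaRN note above concrete, here is a minimal sketch of loading one of the Qwen-2.5-based checkpoints with a longer context in Transformers. The `rope_scaling` values mirror the static-YaRN example in the linked Qwen2.5 guide; the checkpoint name is just one row from the table, and depending on your transformers version the key may be `rope_type` rather than `type`:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Salesforce/xLAM-2-3b-fc-r"  # any Qwen-2.5-based row from the table

# Static YaRN settings following the linked Qwen2.5 guidance:
# a 4x factor stretches the default 32k window toward 128k.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```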
## Usage

```python
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```
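The lines above are the tail of the basic-usage example, whose setup is elided from this diff. For orientation, a self-contained sketch of the pattern that tail implies, assuming a recent transformers release; the `messages` and `inputs` names stand in for whatever the elided section actually defines. Slicing at the prompt length is what makes only the newly generated tokens get decoded:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed setup; the elided part of the section defines the real equivalents.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xLAM-2-1b-fc-r")
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/xLAM-2-1b-fc-r", torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# Slice at the prompt length so only newly generated tokens are decoded.
input_ids_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=256)
generated_tokens = outputs[:, input_ids_len:]
print(tokenizer.decode(generated_tokens[0], skip_special_tokens=True))
```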
+ ### Using vLLM for Inference

+ The xLAM models can also be efficiently served using vLLM for high-throughput inference. Please use `vllm>=0.6.5`, since earlier versions will cause degraded performance for Qwen-based models.

+ #### Setup and Serving

+ 1. Install vLLM with the required version:
+ ```bash
+ pip install "vllm>=0.6.5"
+ ```

+ 2. Download the tool parser plugin to your local path:
```bash
+ wget https://huggingface.co/Salesforce/xLAM-2-1b-fc-r/raw/main/xlam_tool_call_parser.py
```

+ 3. Start the OpenAI API-compatible endpoint:
+ ```bash
+ vllm serve Salesforce/xLAM-2-1b-fc-r \
+     --enable-auto-tool-choice \
+     --tool-parser-plugin ./xlam_tool_call_parser.py \
+     --tool-call-parser xlam \
+     --tensor-parallel-size 1
+ ```

+ Note: Ensure that the tool parser plugin file is downloaded and that the path specified in `--tool-parser-plugin` correctly points to your local copy of the file. The xLAM series models all utilize the **same** tool call parser, so you only need to download it **once** for all models.
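Before moving on, it can help to confirm the endpoint is actually serving. A minimal sketch, assuming the default host and port of the `vllm serve` command above:

```python
import openai

# Point the client at the local vLLM server started above.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

# The served model id should appear here if startup succeeded.
for model in client.models.list():
    print(model.id)
```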
+ #### Testing with OpenAI API

+ Here's a minimal example to test tool usage with the served endpoint:

+ ```python
+ import openai
+ import json
+ 
+ # Configure the client to use your local vLLM endpoint
+ client = openai.OpenAI(
+     base_url="http://localhost:8000/v1",  # Default vLLM server URL
+     api_key="empty"  # Can be any string
+ )
+ 
+ # Define a tool/function
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "get_weather",
+             "description": "Get the current weather for a location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "location": {
+                         "type": "string",
+                         "description": "The city and state, e.g. San Francisco, CA"
+                     },
+                     "unit": {
+                         "type": "string",
+                         "enum": ["celsius", "fahrenheit"],
+                         "description": "The unit of temperature to return"
+                     }
+                 },
+                 "required": ["location"]
+             }
+         }
+     }
+ ]
+ 
+ # Create a chat completion
+ response = client.chat.completions.create(
+     model="Salesforce/xLAM-2-1b-fc-r",  # Model name doesn't matter, vLLM uses the served model
+     messages=[
+         {"role": "system", "content": "You are a helpful assistant that can use tools."},
+         {"role": "user", "content": "What's the weather like in San Francisco?"}
+     ],
+     tools=tools,
+     tool_choice="auto"
+ )
+ 
+ # Print the response
+ print("Assistant's response:")
+ print(json.dumps(response.model_dump(), indent=2))
+ ```
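If the call succeeds, the assistant message should carry a `tool_calls` entry emitted by the xLAM parser. A hedged sketch of the usual follow-up, continuing the example above (`client`, `tools`, and `response` come from it; `get_weather_impl` is a hypothetical stand-in for a real implementation): execute the requested function, return its output in a `tool` message, and let the model produce the final answer.

```python
import json

# Hypothetical stand-in; replace with a real weather lookup.
def get_weather_impl(location, unit="fahrenheit"):
    return {"location": location, "temperature": 68, "unit": unit}

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather_impl(**args)

    # Send the tool result back so the model can answer in natural language.
    follow_up = client.chat.completions.create(
        model="Salesforce/xLAM-2-1b-fc-r",
        messages=[
            {"role": "user", "content": "What's the weather like in San Francisco?"},
            message,  # assistant turn containing the tool call
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)},
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)
```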
+ For more advanced configurations and deployment options, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
## Benchmark Results
<p align="center">
<img width="80%" alt="BFCL Results" src="https://github.com/apigen-mt/apigen-mt.github.io/blob/main/img/bfcl-result.png?raw=true">
<br>
+ <small><i>Performance comparison of different models on the [BFCL leaderboard](https://gorilla.cs.berkeley.edu/leaderboard.html). The rank is based on the overall accuracy, which is a weighted average of different evaluation categories. "FC" stands for function-calling mode, in contrast to using a customized "prompt" to extract the function calls.</i></small>
</p>

### τ-bench Benchmark
}
```

+ Additionally, please check our other related works on the xLAM series and consider citing them as well:

```bibtex
@article{zhang2025actionstudio,
  title={ActionStudio: A Lightweight Framework for Data and Training of Action Models},
}
```

```bibtex
@article{liu2024apigen,

}
```