Hanzo Dev committed
Commit 91d2627 · Parent(s): 52a5ab3

Add Zoo Nano-1: Ultra-lightweight 4B MLX model with Zoo branding

README.md CHANGED
@@ -1,18 +1,27 @@
 ---
 library_name: mlx
 license: apache-2.0
-license_link: https://huggingface.co/
+license_link: https://huggingface.co/zooai/nano-1/blob/main/LICENSE
 pipeline_tag: text-generation
+tags:
+- zoo
+- nano
+- lightweight
+- edge-computing
+- mlx
+- 4bit
 ---

-#
-<a href="https://
-<img alt="
+# Zoo Nano-1 (4B MLX Model)
+<a href="https://zoo.ai/" target="_blank" style="margin: 2px;">
+<img alt="Zoo AI" src="https://img.shields.io/badge/⚡%20Zoo%20Nano--1%20-22C55E" style="display: inline-block; vertical-align: middle;"/>
 </a>

-##
+## Zoo Nano-1 Highlights

-
+**Zoo Nano-1** is an ultra-lightweight AI model optimized for edge computing and resource-constrained environments. Based on the Qwen3-4B architecture with MLX 4-bit quantization, it delivers strong performance while keeping a memory footprint of only ~700MB.
+
+### Key Features
 - **Unique support for seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
 - **Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
@@ -22,7 +31,7 @@ Qwen3 is the latest generation of large language models in Qwen series, offering

 ## Model Overview

-**
+**Zoo Nano-1** has the following technical specifications:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Number of Parameters: 4.0B
@@ -32,7 +41,7 @@ Qwen3 is the latest generation of large language models in Qwen series, offering
 - Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts).


-For more details
+For more details about Zoo AI models and the ecosystem, visit [zoo.ai](https://zoo.ai) and our [GitHub](https://github.com/zoo-ai).

 ## Quickstart

@@ -54,7 +63,7 @@ The following contains a code snippet illustrating how to use the model generate
 ```python
 from mlx_lm import load, generate

-model, tokenizer = load("
+model, tokenizer = load("zooai/nano-1")
 prompt = "Hello, please introduce yourself and tell me what you can do."

 if tokenizer.chat_template is not None:
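Since the hunk shows only the changed slice of the quickstart, here is a self-contained version of how the snippet plausibly reads after this commit; the message structure and the `max_tokens` value are illustrative assumptions, not part of the diff.

```python
from mlx_lm import load, generate

# Load the 4-bit MLX weights from the Hugging Face Hub.
model, tokenizer = load("zooai/nano-1")
prompt = "Hello, please introduce yourself and tell me what you can do."

# If the tokenizer ships a chat template, wrap the prompt in it.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

# max_tokens is an illustrative choice, not taken from the commit.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```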
@@ -128,8 +137,8 @@ Here is an example of a multi-turn conversation:
 from mlx_lm import load, generate


-class QwenChatbot:
-    def __init__(self, model_name="
+class ZooChatbot:
+    def __init__(self, model_name="zooai/nano-1"):
         self.model, self.tokenizer = load(model_name)
         self.history = []

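The diff skips the body of the chatbot class between this hunk and the next. A minimal sketch of the elided turn-handling method, assuming it mirrors the upstream Qwen3 README; the method name, prompt assembly, and `max_tokens` are assumptions:

```python
    def generate_response(self, user_input):
        # /think and /no_think suffixes in user_input toggle thinking mode
        # via the chat template; the model sees the full history each turn.
        self.history.append({"role": "user", "content": user_input})
        prompt = self.tokenizer.apply_chat_template(
            self.history, tokenize=False, add_generation_prompt=True
        )
        response = generate(self.model, self.tokenizer, prompt=prompt, max_tokens=512)
        self.history.append({"role": "assistant", "content": response})
        return response
```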
@@ -158,7 +167,7 @@ class QwenChatbot:

 # Example Usage
 if __name__ == "__main__":
-    chatbot = QwenChatbot()
+    chatbot = ZooChatbot()

     # First input (without /think or /no_think tags, thinking mode is enabled by default)
     user_input_1 = "How many 'r's are in strawberries?"
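The remaining turns fall outside the hunk; in the upstream version of this README the example continues by toggling modes with the tags, roughly as follows (the exact strings here are illustrative, not from the diff):

```python
    response_1 = chatbot.generate_response(user_input_1)
    print(f"Bot: {response_1}")

    # Second input with /no_think: switches to non-thinking mode.
    user_input_2 = "Then, how many 'r's are in blueberries? /no_think"
    response_2 = chatbot.generate_response(user_input_2)
    print(f"Bot: {response_2}")

    # Third input with /think: switches back to thinking mode.
    user_input_3 = "Really? /think"
    response_3 = chatbot.generate_response(user_input_3)
    print(f"Bot: {response_3}")
```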
@@ -187,7 +196,7 @@ if __name__ == "__main__":

 ## Agentic Use

-
+Zoo Nano-1 supports tool calling for lightweight agent applications and is compatible with [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) for enhanced functionality.

 To define the available tools, you can use an MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools yourself.

@@ -196,7 +205,7 @@ from qwen_agent.agents import Assistant

 # Define LLM
 llm_cfg = {
-    "model": "
+    "model": "zooai/nano-1",

 # Use the endpoint provided by Alibaba Model Studio:
 # "model_type": "qwen_dashscope",
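The rest of the Qwen-Agent example sits outside the hunk. Following Qwen-Agent's documented usage, the tool definition and agent loop typically continue as sketched below; the MCP server choice and the query are illustrative assumptions:

```python
from qwen_agent.agents import Assistant

# LLM configuration from the hunk above.
llm_cfg = {"model": "zooai/nano-1"}

# Tools: an MCP server configuration plus a built-in Qwen-Agent tool.
tools = [
    {"mcpServers": {
        "time": {"command": "uvx", "args": ["mcp-server-time"]},
    }},
    "code_interpreter",
]

bot = Assistant(llm=llm_cfg, function_list=tools)

# Stream the agent's turns; `responses` holds the final message list.
messages = [{"role": "user", "content": "What time is it in UTC?"}]
responses = []
for responses in bot.run(messages=messages):
    pass
print(responses)
```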
@@ -250,7 +259,7 @@ print(responses)

 ## Processing Long Texts

-
+Zoo Nano-1 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.

 YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:

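For the "modify the model files" approach, `transformers`-style checkpoints typically take a `rope_scaling` block in `config.json`. Below is a sketch using the YaRN parameters common to Qwen3-derived models (factor 4.0 over the 32,768-token native window, giving 131,072 tokens); treat the exact values as an assumption for this repackaging. Since static YaRN applies the scaling factor to all inputs, it is best enabled only when long contexts are actually needed.

```python
import json

# Add a YaRN rope_scaling block to the checkpoint's config.json.
# Values follow the Qwen3 convention; assumed, not taken from this commit.
with open("config.json") as f:
    config = json.load(f)

config["rope_scaling"] = {
    "rope_type": "yarn",
    "factor": 4.0,  # 4.0 x 32768 native tokens = 131072-token window
    "original_max_position_embeddings": 32768,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```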
@@ -304,16 +313,22 @@ To achieve optimal performance, we recommend the following settings:

 ### Citation

-If you find
+If you find Zoo Nano-1 helpful, please cite:

 ```
-@
-
-
-
-
-
-    primaryClass={cs.CL},
-    url={https://arxiv.org/abs/2505.09388},
+@misc{zoo2024nano,
+    title={Zoo Nano-1: Ultra-lightweight Language Model},
+    author={Zoo AI Team},
+    year={2024},
+    publisher={Zoo AI},
+    url={https://huggingface.co/zooai/nano-1}
 }
-```
+```
+
+### About Zoo AI
+
+Zoo AI is building next-generation AI infrastructure with a focus on efficiency, accessibility, and performance. Our models are designed to run anywhere, from edge devices to enterprise clusters.
+
+- **Website**: [zoo.ngo](https://zoo.ngo)
+- **HuggingFace**: [huggingface.co/zooai](https://huggingface.co/zooai)
+- **Spaces**: [huggingface.co/spaces/zooai](https://huggingface.co/spaces/zooai)