Hanzo Dev committed
Commit 91d2627 · 1 Parent(s): 52a5ab3

Add Zoo Nano-1: Ultra-lightweight 4B MLX model with Zoo branding

Files changed (1): README.md +40 -25
README.md CHANGED
@@ -1,18 +1,27 @@
 ---
 library_name: mlx
 license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE
 pipeline_tag: text-generation
 ---

-# Qwen3-4B-MLX-4bit
-<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
-    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
 </a>

-## Qwen3 Highlights

-Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

 - **Uniquely support of seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within single model**, ensuring optimal performance across various scenarios.
 - **Significantly enhancement in its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
@@ -22,7 +31,7 @@
 ## Model Overview

-**Qwen3-4B** has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Number of Parameters: 4.0B
@@ -32,7 +41,7 @@
 - Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts).

-For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

 ## Quickstart

@@ -54,7 +63,7 @@
 ```python
 from mlx_lm import load, generate

-model, tokenizer = load("Qwen/Qwen3-4B-MLX-4bit")
 prompt = "Hello, please introduce yourself and tell me what you can do."

 if tokenizer.chat_template is not None:
@@ -128,8 +137,8 @@
 from mlx_lm import load, generate


-class QwenChatbot:
-    def __init__(self, model_name="Qwen/Qwen3-4B-MLX-4bit"):
         self.model, self.tokenizer = load(model_name)
         self.history = []
@@ -158,7 +167,7 @@

 # Example Usage
 if __name__ == "__main__":
-    chatbot = QwenChatbot()

     # First input (without /think or /no_think tags, thinking mode is enabled by default)
     user_input_1 = "How many 'r's are in strawberries?"
@@ -187,7 +196,7 @@

 ## Agentic Use

-Qwen3 excels in tool calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.

 To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself.
@@ -196,7 +205,7 @@

 # Define LLM
 llm_cfg = {
-    "model": "Qwen3-4B-MLX-4bit",

     # Use the endpoint provided by Alibaba Model Studio:
     # "model_type": "qwen_dashscope",
@@ -250,7 +259,7 @@

 ## Processing Long Texts

-Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.

 YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
@@ -304,16 +313,22 @@

 ### Citation

-If you find our work helpful, feel free to give us a cite.

 ```
-@misc{qwen3technicalreport,
-      title={Qwen3 Technical Report},
-      author={Qwen Team},
-      year={2025},
-      eprint={2505.09388},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2505.09388},
 }
-```
 
 ---
 library_name: mlx
 license: apache-2.0
+license_link: https://huggingface.co/zooai/nano-1/blob/main/LICENSE
 pipeline_tag: text-generation
+tags:
+- zoo
+- nano
+- lightweight
+- edge-computing
+- mlx
+- 4bit
 ---

+# Zoo Nano-1 (4B MLX Model)
+<a href="https://zoo.ai/" target="_blank" style="margin: 2px;">
+    <img alt="Zoo AI" src="https://img.shields.io/badge/⚡%20Zoo%20Nano--1%20-22C55E" style="display: inline-block; vertical-align: middle;"/>
 </a>

+## Zoo Nano-1 Highlights

+**Zoo Nano-1** is an ultra-lightweight AI model optimized for edge computing and resource-constrained environments. Built on the Qwen3-4B architecture with MLX 4-bit quantization, it delivers strong performance while keeping a minimal memory footprint of roughly 700 MB.
+
+### Key Features

 - **Seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
 - **Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
 

 ## Model Overview

+**Zoo Nano-1** has the following technical specifications:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Number of Parameters: 4.0B

 - Context Length: 32,768 tokens natively and [131,072 tokens with YaRN](#processing-long-texts).

+For more details about Zoo AI models and the wider ecosystem, visit [zoo.ai](https://zoo.ai) and our [GitHub](https://github.com/zoo-ai).

 ## Quickstart
 
 
 ```python
 from mlx_lm import load, generate

+model, tokenizer = load("zooai/nano-1")
 prompt = "Hello, please introduce yourself and tell me what you can do."

 if tokenizer.chat_template is not None:
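For reference, here is a hedged sketch of how the truncated snippet above typically continues, following the standard mlx-lm chat-template pattern; the import is guarded because mlx-lm only runs on Apple Silicon, so nothing in this sketch is specific to this repository beyond the model id.

```python
# Sketch of the full Quickstart flow (standard mlx-lm pattern); guarded so
# it is a no-op where mlx-lm is unavailable.
try:
    from mlx_lm import load, generate
    available = True
except ImportError:
    available = False  # mlx-lm not installed or unsupported platform

if available:
    model, tokenizer = load("zooai/nano-1")
    prompt = "Hello, please introduce yourself and tell me what you can do."
    if tokenizer.chat_template is not None:
        # Wrap the raw prompt in the chat template before generating.
        messages = [{"role": "user", "content": prompt}]
        prompt = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True
        )
    response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
```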
 
 from mlx_lm import load, generate


+class ZooChatbot:
+    def __init__(self, model_name="zooai/nano-1"):
         self.model, self.tokenizer = load(model_name)
         self.history = []
 
 

 # Example Usage
 if __name__ == "__main__":
+    chatbot = ZooChatbot()

     # First input (without /think or /no_think tags, thinking mode is enabled by default)
     user_input_1 = "How many 'r's are in strawberries?"
 

 ## Agentic Use

+Zoo Nano-1 supports tool calling for lightweight agent applications and is compatible with [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) for enhanced functionality.

 To define the available tools, you can use the MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools yourself.
 
 

 # Define LLM
 llm_cfg = {
+    "model": "zooai/nano-1",

     # Use the endpoint provided by Alibaba Model Studio:
     # "model_type": "qwen_dashscope",
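The excerpt above can be filled out into a complete Qwen-Agent configuration. The endpoint URL, API key, sampling settings, and tool entries below are illustrative assumptions, not values from this repository:

```python
# Hypothetical configuration; the endpoint and tools are placeholders for
# whatever serves zooai/nano-1 in your setup.
llm_cfg = {
    "model": "zooai/nano-1",
    # Any OpenAI-compatible endpoint serving the model:
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
    "generate_cfg": {"top_p": 0.8},
}

# Tools can come from an MCP configuration or Qwen-Agent's built-ins:
tools = [
    {"mcpServers": {"time": {"command": "uvx", "args": ["mcp-server-time"]}}},
    "code_interpreter",
]

# With qwen-agent installed, wire the two together:
# from qwen_agent.agents import Assistant
# bot = Assistant(llm=llm_cfg, function_list=tools)
```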
 

 ## Processing Long Texts

+Zoo Nano-1 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.

 YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, and `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
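For the config-file approach, a typical `rope_scaling` block added to the model's `config.json` looks like the following (the factor of 4.0 corresponds to 4 × 32,768 = 131,072 tokens; treat the exact values as a starting point to tune for your target context length):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```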
 
 

 ### Citation

+If you find Zoo Nano-1 helpful, please cite:

 ```
+@misc{zoo2024nano,
+  title={Zoo Nano-1: Ultra-lightweight Language Model},
+  author={Zoo AI Team},
+  year={2024},
+  publisher={Zoo AI},
+  url={https://huggingface.co/zooai/nano-1}
 }
+```
+
+### About Zoo AI
+
+Zoo AI is building next-generation AI infrastructure with a focus on efficiency, accessibility, and performance. Our models are designed to run anywhere, from edge devices to enterprise clusters.
+
+- **Website**: [zoo.ngo](https://zoo.ngo)
+- **HuggingFace**: [huggingface.co/zooai](https://huggingface.co/zooai)
+- **Spaces**: [huggingface.co/spaces/zooai](https://huggingface.co/spaces/zooai)