Hanzo Dev committed
Commit 91d2627 · 1 Parent(s): 52a5ab3

Add Zoo Nano-1: Ultra-lightweight 4B MLX model with Zoo branding

Files changed (1): README.md +40 -25
README.md CHANGED
@@ -1,18 +1,27 @@
 ---
 library_name: mlx
 license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen3-14B/blob/main/LICENSE
 pipeline_tag: text-generation
 ---

-# Qwen3-4B-MLX-4bit
-<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
-    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
 </a>

-## Qwen3 Highlights

-Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

 - **Uniquely support of seamless switching between thinking mode** (for complex logical reasoning, math, and coding) and **non-thinking mode** (for efficient, general-purpose dialogue) **within single model**, ensuring optimal performance across various scenarios.
 - **Significantly enhancement in its reasoning capabilities**, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
@@ -22,7 +31,7 @@
 ## Model Overview

-**Qwen3-4B** has the following features:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Number of Parameters: 4.0B
@@ -32,7 +41,7 @@
 - Context Length: 32,768 natively and [131,072 tokens with YaRN](#processing-long-texts).

-For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

 ## Quickstart

@@ -54,7 +63,7 @@
 ```python
 from mlx_lm import load, generate

-model, tokenizer = load("Qwen/Qwen3-4B-MLX-4bit")
 prompt = "Hello, please introduce yourself and tell me what you can do."

 if tokenizer.chat_template is not None:
@@ -128,8 +137,8 @@
 from mlx_lm import load, generate


-class QwenChatbot:
-    def __init__(self, model_name="Qwen/Qwen3-4B-MLX-4bit"):
         self.model, self.tokenizer = load(model_name)
         self.history = []
@@ -158,7 +167,7 @@

 # Example Usage
 if __name__ == "__main__":
-    chatbot = QwenChatbot()

     # First input (without /think or /no_think tags, thinking mode is enabled by default)
     user_input_1 = "How many 'r's are in strawberries?"
@@ -187,7 +196,7 @@

 ## Agentic Use

-Qwen3 excels in tool calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.

 To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself.
@@ -196,7 +205,7 @@

 # Define LLM
 llm_cfg = {
-    "model": "Qwen3-4B-MLX-4bit",

     # Use the endpoint provided by Alibaba Model Studio:
     # "model_type": "qwen_dashscope",
@@ -250,7 +259,7 @@

 ## Processing Long Texts

-Qwen3 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.

 YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
@@ -304,16 +313,22 @@

 ### Citation

-If you find our work helpful, feel free to give us a cite.

 ```
-@misc{qwen3technicalreport,
-      title={Qwen3 Technical Report},
-      author={Qwen Team},
-      year={2025},
-      eprint={2505.09388},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2505.09388},
 }
-```
 
 ---
 library_name: mlx
 license: apache-2.0
+license_link: https://huggingface.co/zooai/nano-1/blob/main/LICENSE
 pipeline_tag: text-generation
+tags:
+- zoo
+- nano
+- lightweight
+- edge-computing
+- mlx
+- 4bit
 ---

+# Zoo Nano-1 (4B MLX Model)
+<a href="https://zoo.ai/" target="_blank" style="margin: 2px;">
+    <img alt="Zoo AI" src="https://img.shields.io/badge/⚡%20Zoo%20Nano--1%20-22C55E" style="display: inline-block; vertical-align: middle;"/>
 </a>

+## Zoo Nano-1 Highlights

+**Zoo Nano-1** is an ultra-lightweight AI model optimized for edge computing and resource-constrained environments. Built on the Qwen3-4B architecture with MLX 4-bit quantization, it delivers strong performance while keeping a minimal memory footprint of roughly 700 MB.
+
+### Key Features

 - **Seamless switching between thinking mode** (for complex logical reasoning, math, and coding) **and non-thinking mode** (for efficient, general-purpose dialogue) **within a single model**, ensuring optimal performance across various scenarios.
 - **Significantly enhanced reasoning capabilities**, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
 

 ## Model Overview

+**Zoo Nano-1** has the following technical specifications:
 - Type: Causal Language Models
 - Training Stage: Pretraining & Post-training
 - Number of Parameters: 4.0B

 - Context Length: 32,768 tokens natively and [131,072 tokens with YaRN](#processing-long-texts).

+For more details about Zoo AI models and the wider ecosystem, visit [zoo.ai](https://zoo.ai) and our [GitHub](https://github.com/zoo-ai).

 ## Quickstart
 
 
 ```python
 from mlx_lm import load, generate

+model, tokenizer = load("zooai/nano-1")
 prompt = "Hello, please introduce yourself and tell me what you can do."

 if tokenizer.chat_template is not None:
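For reference, here is a hedged sketch of how the truncated snippet above typically continues, following the standard mlx-lm chat-template pattern; the import is guarded because mlx-lm only runs on Apple Silicon, so nothing in this sketch is specific to this repository beyond the model id.

```python
# Sketch of the full Quickstart flow (standard mlx-lm pattern); guarded so
# it is a no-op where mlx-lm is unavailable.
try:
    from mlx_lm import load, generate
    available = True
except ImportError:
    available = False  # mlx-lm not installed or unsupported platform

if available:
    model, tokenizer = load("zooai/nano-1")
    prompt = "Hello, please introduce yourself and tell me what you can do."
    if tokenizer.chat_template is not None:
        # Wrap the raw prompt in the chat template before generating.
        messages = [{"role": "user", "content": prompt}]
        prompt = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True
        )
    response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
```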
 
 from mlx_lm import load, generate


+class ZooChatbot:
+    def __init__(self, model_name="zooai/nano-1"):
         self.model, self.tokenizer = load(model_name)
         self.history = []
 
 

 # Example Usage
 if __name__ == "__main__":
+    chatbot = ZooChatbot()

     # First input (without /think or /no_think tags, thinking mode is enabled by default)
     user_input_1 = "How many 'r's are in strawberries?"
 

 ## Agentic Use

+Zoo Nano-1 supports tool calling for lightweight agent applications and is compatible with [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) for enhanced functionality.

 To define the available tools, you can use the MCP configuration file, use the integrated tools of Qwen-Agent, or integrate other tools yourself.
 
 

 # Define LLM
 llm_cfg = {
+    "model": "zooai/nano-1",

     # Use the endpoint provided by Alibaba Model Studio:
     # "model_type": "qwen_dashscope",
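The excerpt above can be filled out into a complete Qwen-Agent configuration. The endpoint URL, API key, sampling settings, and tool entries below are illustrative assumptions, not values from this repository:

```python
# Hypothetical configuration; the endpoint and tools are placeholders for
# whatever serves zooai/nano-1 in your setup.
llm_cfg = {
    "model": "zooai/nano-1",
    # Any OpenAI-compatible endpoint serving the model:
    "model_server": "http://localhost:8000/v1",
    "api_key": "EMPTY",
    "generate_cfg": {"top_p": 0.8},
}

# Tools can come from an MCP configuration or Qwen-Agent's built-ins:
tools = [
    {"mcpServers": {"time": {"command": "uvx", "args": ["mcp-server-time"]}}},
    "code_interpreter",
]

# With qwen-agent installed, wire the two together:
# from qwen_agent.agents import Assistant
# bot = Assistant(llm=llm_cfg, function_list=tools)
```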
 

 ## Processing Long Texts

+Zoo Nano-1 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques to handle long texts effectively. We have validated the model's performance on context lengths of up to 131,072 tokens using the [YaRN](https://arxiv.org/abs/2309.00071) method.

 YaRN is currently supported by several inference frameworks, e.g., `transformers` and `llama.cpp` for local use, and `vllm` and `sglang` for deployment. In general, there are two approaches to enabling YaRN for supported frameworks:
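For the config-file approach, a typical `rope_scaling` block added to the model's `config.json` looks like the following (the factor of 4.0 corresponds to 4 × 32,768 = 131,072 tokens; treat the exact values as a starting point to tune for your target context length):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```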
 
 

 ### Citation

+If you find Zoo Nano-1 helpful, please cite:

 ```
+@misc{zoo2024nano,
+  title={Zoo Nano-1: Ultra-lightweight Language Model},
+  author={Zoo AI Team},
+  year={2024},
+  publisher={Zoo AI},
+  url={https://huggingface.co/zooai/nano-1}
 }
+```
+
+### About Zoo AI
+
+Zoo AI is building next-generation AI infrastructure with a focus on efficiency, accessibility, and performance. Our models are designed to run anywhere, from edge devices to enterprise clusters.
+
+- **Website**: [zoo.ngo](https://zoo.ngo)
+- **HuggingFace**: [huggingface.co/zooai](https://huggingface.co/zooai)
+- **Spaces**: [huggingface.co/spaces/zooai](https://huggingface.co/spaces/zooai)