🍎 functiongemma-270m-it-4bit-mlx

google/functiongemma-270m-it converted to MLX format

QuantLLM Format Quantization

⭐ Star QuantLLM on GitHub


📖 About This Model

This model is google/functiongemma-270m-it converted to the MLX format, optimized to run natively on Apple Silicon (M1/M2/M3/M4) Macs.

| Property | Value |
|---|---|
| Base Model | google/functiongemma-270m-it |
| Format | MLX |
| Quantization | Q4_K_M |
| License | apache-2.0 |
| Created With | QuantLLM |
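
If you want to confirm that MLX can see the Metal GPU before loading the model, a quick check like the following works (a minimal sketch; the mlx package is installed as a dependency of mlx-lm):

import mlx.core as mx

# Metal is the GPU backend MLX uses on Apple Silicon
print("Metal available:", mx.metal.is_available())
print("Default device:", mx.default_device())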

🚀 Quick Start

Generate Text with mlx-lm

from mlx_lm import load, generate

# Load the model
model, tokenizer = load("QuantLLM/functiongemma-270m-it-4bit-mlx")

# Simple generation
prompt = "Explain quantum computing in simple terms"
messages = [{"role": "user", "content": prompt}]
prompt_formatted = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True
)

# Generate response
text = generate(model, tokenizer, prompt=prompt_formatted, verbose=True)
print(text)
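
The call above uses mlx-lm's default decoding settings. To control the token budget and sampling, recent mlx-lm releases accept a sampler built with make_sampler (a minimal sketch, assuming a recent mlx-lm version; older releases passed temperature directly to generate):

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("QuantLLM/functiongemma-270m-it-4bit-mlx")

messages = [{"role": "user", "content": "Summarize the benefits of on-device inference"}]
prompt_formatted = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Temperature / top-p sampling instead of the default greedy decoding
sampler = make_sampler(temp=0.7, top_p=0.9)
text = generate(model, tokenizer, prompt=prompt_formatted, max_tokens=256, sampler=sampler, verbose=True)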

Streaming Generation

from mlx_lm import load, stream_generate

model, tokenizer = load("QuantLLM/functiongemma-270m-it-4bit-mlx")

prompt = "Write a haiku about coding"
messages = [{"role": "user", "content": prompt}]
prompt_formatted = tokenizer.apply_chat_template(
    messages, 
    add_generation_prompt=True
)

# Stream tokens as they're generated. Recent mlx-lm releases yield
# GenerationResponse objects, so print the .text field of each one.
for response in stream_generate(model, tokenizer, prompt=prompt_formatted, max_tokens=200):
    print(response.text, end="", flush=True)
print()

Command Line Interface

# Install mlx-lm
pip install mlx-lm

# Generate text
python -m mlx_lm.generate --model QuantLLM/functiongemma-270m-it-4bit-mlx --prompt "Hello!"

# Interactive chat
python -m mlx_lm.chat --model QuantLLM/functiongemma-270m-it-4bit-mlx
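
mlx-lm also ships an OpenAI-compatible HTTP server (python -m mlx_lm.server). The sketch below assumes the server is running locally on its default port (8080) and queries the chat completions endpoint with the standard library; adjust host, port, and response handling to match your mlx-lm version:

# Start the server first, e.g.:
#   python -m mlx_lm.server --model QuantLLM/functiongemma-270m-it-4bit-mlx
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128,
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",  # assumed default host/port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])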

System Requirements

| Requirement | Minimum |
|---|---|
| Chip | Apple Silicon (M1/M2/M3/M4) |
| macOS | 13.0 (Ventura) or later |
| Python | 3.10+ |
| RAM | 8GB+ (16GB recommended) |

# Install dependencies
pip install mlx-lm
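
As a rough sanity check on the RAM figure, the 4-bit weights of a 270M-parameter model are only on the order of 150 MB; the back-of-the-envelope arithmetic below assumes roughly 4.5 bits per weight to account for quantization scales and biases (actual on-disk size may differ):

# Rough size estimate for the quantized weights
params = 270e6
bits_per_weight = 4.5  # assumption: 4-bit weights plus group-wise scales/biases
size_mb = params * bits_per_weight / 8 / 1024**2
print(f"~{size_mb:.0f} MB of weights")  # roughly 145 MB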

📊 Model Details

| Property | Value |
|---|---|
| Original Model | google/functiongemma-270m-it |
| Format | MLX |
| Quantization | Q4_K_M |
| License | apache-2.0 |
| Export Date | 2025-12-21 |
| Exported By | QuantLLM v2.0 |

🚀 Created with QuantLLM


Convert any model to GGUF, ONNX, or MLX in one line!

from quantllm import turbo

# Load any HuggingFace model
model = turbo("google/functiongemma-270m-it")

# Export to any format
model.export("mlx", quantization="Q4_K_M")

# Push to HuggingFace
model.push("your-repo", format="mlx")

📚 Documentation · 🐛 Report Issue · 💡 Request Feature
