---
license: gemma
base_model: google/gemma-3-270m-it
tags:
- quantized
- gguf
- llama.cpp
- gemma
- text-generation
- q4_k_m
- edge-deployment
- mobile-app
library_name: gguf
pipeline_tag: text-generation
language:
- en
model_type: gemma
---

# Gemma 3 270M Instruction-Tuned - Q4_K_M Quantized (GGUF)

## Model Description

This is a quantized version of Google's Gemma 3 270M instruction-tuned model, optimized for efficient inference on consumer hardware and mobile applications. The model has been converted to GGUF format and quantized to Q4_K_M with llama.cpp, making it well suited to resource-constrained environments.

## Model Details

- **Base Model**: [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
- **Model Type**: Large Language Model (LLM)
- **Quantization**: Q4_K_M
- **Format**: GGUF
- **File Size**: 253MB
- **Precision**: 4-bit quantized weights with mixed precision
- **Framework**: Compatible with llama.cpp, Ollama, and other GGUF-compatible inference engines

## Quantization Details

- **Method**: Q4_K_M quantization via llama.cpp
- **Benefits**: Significantly reduced memory footprint while maintaining model quality
- **Use Case**: Optimized for edge deployment, mobile applications, and resource-constrained environments
- **Performance**: Quality remains close to the original Gemma 3 instruction-tuned model

## Real-World Application

This model is actively used in a production mobile application available on app stores. The app demonstrates the practical viability of running quantized LLMs on mobile devices while preserving user privacy through on-device inference. The implementation showcases:

- **On-device AI**: No data sent to external servers
- **Fast inference**: Optimized for mobile hardware
- **Efficient memory usage**: Runs smoothly on consumer devices
- **App Store compliance**: Meets all platform requirements, including the Gemma licensing terms

## Usage

### With llama.cpp

```bash
# Download the model
wget https://huggingface.co/Durlabh/gemma-270m-q4-k-m-gguf/resolve/main/gemma-270m-q4-k-m.gguf

# Run inference (recent llama.cpp builds name the CLI binary llama-cli; older builds call it main)
./llama-cli -m gemma-270m-q4-k-m.gguf -p "Your prompt here"
```

### With Ollama

```bash
# Create Modelfile
echo "FROM ./gemma-270m-q4-k-m.gguf" > Modelfile

# Create and run
ollama create gemma-270m-q4 -f Modelfile
ollama run gemma-270m-q4
```

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load model
llm = Llama(model_path="gemma-270m-q4-k-m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=100)
print(output['choices'][0]['text'])
```
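The snippet above does raw text completion. Since this is the instruction-tuned variant, prompts should follow Gemma's chat turn format for best results. A minimal sketch, assuming the built-in `gemma` chat handler shipped with recent llama-cpp-python releases (the manual prompt string at the end is the equivalent fallback for plain completion calls):

```python
from llama_cpp import Llama

# chat_format="gemma" applies Gemma's <start_of_turn>/<end_of_turn>
# markers to the messages automatically
llm = Llama(model_path="gemma-270m-q4-k-m.gguf", chat_format="gemma", n_ctx=4096)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF in one sentence."}],
    max_tokens=100,
)
print(response["choices"][0]["message"]["content"])

# Equivalent manually formatted prompt, usable with llm(prompt, ...):
prompt = (
    "<start_of_turn>user\n"
    "Explain GGUF in one sentence.<end_of_turn>\n"
    "<start_of_turn>model\n"
)
```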
### Mobile Integration

For mobile app development, this model can be integrated using:

- **iOS**: llama.cpp with Swift bindings
- **Android**: JNI wrappers around llama.cpp, or TensorFlow Lite conversion of the base model
- **React Native**: Native modules with llama.cpp
- **Flutter**: Platform channels with native implementations

## System Requirements

- **RAM**: Minimum 1GB, recommended 2GB+
- **Storage**: 300MB for the model file
- **CPU**: Modern x86_64 or ARM64 processor
- **Mobile**: iOS 12+ / Android API 21+
- **OS**: Windows, macOS, Linux

## Performance Metrics

| Metric | Original F16 | Q4_K_M | Improvement |
|--------|--------------|--------|-------------|
| Size | ~540MB | 253MB | 53% reduction |
| RAM Usage | ~1GB | ~400MB | 60% reduction |
| Inference Speed | Baseline | ~2x faster | 2x speedup |
| Mobile Performance | Too large | Excellent | ✅ Mobile ready |

*Performance tested on various devices, including mobile hardware*

## License and Usage

**Important**: This model is a derivative of Google's Gemma and is subject to the original licensing terms. **Gemma is provided under and subject to the [Gemma Terms of Use](https://ai.google.dev/gemma/terms).**

### Key Points:

- ✅ **Commercial use permitted** under the Gemma license
- ✅ **Mobile app deployment allowed** with proper attribution
- ⚠️ **Must comply** with the [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy)
- 📄 **App store compliance**: Licensing terms disclosed in app store listings
- 🔄 **Redistribution**: Must include proper attribution and license terms

### Usage Restrictions

Per the Gemma Terms of Use, this model cannot be used for:

- Illegal activities
- Child safety violations
- Generation of hateful, harassing, or violent content
- Generation of false or misleading information
- Privacy violations

See the full [Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy) for complete details.

## Mobile App Compliance

This model is used in compliance with:

- **Gemma Terms of Use**: Full licensing terms disclosed
- **App Store Guidelines**: Platform requirements met
- **Privacy Standards**: On-device processing, no data collection
- **Performance Standards**: Optimized for mobile hardware

## Limitations

- Quantization may cause slight quality degradation compared to the original Gemma 3 instruction-tuned model
- Performance characteristics may vary across hardware platforms
- Subject to the same content limitations as the base Gemma 3 instruction-tuned model
- Context length and capabilities are inherited from the base Gemma 3 270M instruction-tuned model
- Mobile performance depends on device specifications

## Technical Specifications

- **Original Parameters**: 270M
- **Quantization Scheme**: Q4_K_M (4-bit weights, mixed precision for critical layers)
- **Context Length**: 32,768 tokens (inherited from Gemma 3 270M)
- **Vocabulary Size**: 262,144 tokens
- **Architecture**: Transformer decoder
- **Attention Heads**: 8
- **Hidden Layers**: 18

## Download Options

### Direct Download

```bash
# Using wget
wget https://huggingface.co/Durlabh/gemma-270m-q4-k-m-gguf/resolve/main/gemma-270m-q4-k-m.gguf

# Using curl
curl -L -o gemma-270m-q4-k-m.gguf https://huggingface.co/Durlabh/gemma-270m-q4-k-m-gguf/resolve/main/gemma-270m-q4-k-m.gguf
```

### Programmatic Download

```python
# Using huggingface-hub
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="Durlabh/gemma-270m-q4-k-m-gguf",
    filename="gemma-270m-q4-k-m.gguf"
)
```
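Since `hf_hub_download` returns the local cache path, downloading and loading compose naturally. A minimal end-to-end sketch (the prompt and generation parameters are illustrative):

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# The file is cached locally, so repeated runs skip the network fetch
model_path = hf_hub_download(
    repo_id="Durlabh/gemma-270m-q4-k-m-gguf",
    filename="gemma-270m-q4-k-m.gguf",
)

llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
output = llm("Briefly explain why 4-bit quantization helps on mobile devices.", max_tokens=80)
print(output["choices"][0]["text"])
```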
## Citation

If you use this model, please cite both the original Gemma work and acknowledge the quantization:

```bibtex
@misc{durlabh-gemma-270m-q4-k-m,
  title={Gemma 3 270M Instruction-Tuned Q4_K_M Quantized},
  author={Durlabh},
  year={2025},
  note={Quantized version of Google's Gemma 3 270M instruction-tuned model using llama.cpp Q4_K_M},
  url={https://huggingface.co/Durlabh/gemma-270m-q4-k-m-gguf}
}
```

Original Gemma 3 announcement:

```bibtex
@misc{gemma3_2025,
  title={Gemma 3: Google's new open model based on Gemini 2.0},
  author={Gemma Team},
  year={2025},
  publisher={Google},
  url={https://blog.google/technology/developers/gemma-3/}
}
```

## Community & Support

- **Issues**: Report problems or ask questions in the repository discussions
- **Mobile Development**: See model usage in production mobile applications
- **Quantization**: Built with llama.cpp for optimal performance

## Acknowledgments

- **Google DeepMind team** for the original Gemma model
- **llama.cpp community** for the quantization tools and GGUF format
- **Hugging Face** for hosting infrastructure
- **Georgi Gerganov** for creating and maintaining llama.cpp
- **Mobile AI community** for advancing on-device inference

## Disclaimer

This is an unofficial quantized version of Gemma 3 created for practical mobile deployment. For official Gemma models, please visit [Google's official Gemma page](https://ai.google.dev/gemma). The mobile application using this model fully complies with platform guidelines and Gemma licensing requirements.

---

**Ready for production use!** This model powers real-world mobile applications while maintaining full compliance with licensing terms.