---
pipeline_tag: text-generation
base_model: nvidia/Llama-3.1-8B-UltraLong-1M-Instruct
base_model_relation: quantized
tags:
- chat
- 4bit
- apple
- long-context
license: cc-by-nc-4.0
language:
- en
- fr
- es
- de
- it
- hi
- ru
library_name: mlx
---

# Llama 3.1 8B UltraLong 1M Instruct 4-bit MLX

MLX 4-bit version of **Llama 3.1 8B UltraLong 1M Instruct**.

This model was converted to MLX format from [`nvidia/Llama-3.1-8B-UltraLong-1M-Instruct`](https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-1M-Instruct) using mlx-lm version **0.22.5**.

## Model Details

Maximum context window: 1M tokens.

For more details, please refer to [arXiv](https://arxiv.org/abs/2504.06214).

## Use with mlx

```bash
pip install -U mlx-lm
```

```bash
python -m mlx_lm.generate --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-4bit --max-tokens 65536 --temp 0.5 --prompt "Your big prompt"
```
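
The model can also be used from Python through the mlx-lm API. Below is a minimal sketch using mlx-lm's standard `load`/`generate` interface; the prompt and `max_tokens` value are placeholders, so adjust them to your workload:

```python
from mlx_lm import load, generate

# Load the 4-bit model and its tokenizer from the Hugging Face Hub.
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-4bit")

prompt = "Your big prompt"  # placeholder

# Apply the chat template if the tokenizer provides one, so the
# instruct model sees the conversation format it was trained on.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=65536, verbose=True)
```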