This repository is primarily for the Transformers framework. If you're using other open-source frameworks, please use the alternative repository: MiniMax-Text-01
MiniMax-Text-01
1. Introduction
MiniMax-Text-01 is a powerful language model with 456 billion total parameters, of which 45.9 billion are activated per token. To better unlock the long context capabilities of the model, MiniMax-Text-01 adopts a hybrid architecture that combines Lightning Attention, Softmax Attention and Mixture-of-Experts (MoE). Leveraging advanced parallel strategies and innovative compute-communication overlap methodsโsuch as Linear Attention Sequence Parallelism Plus (LASP+), varlen ring attention, Expert Tensor Parallel (ETP), etc., MiniMax-Text-01's training context length is extended to 1 million tokens, and it can handle a context of up to 4 million tokens during the inference. On various academic benchmarks, MiniMax-Text-01 also demonstrates the performance of a top-tier model.
   
2. Model Architecture
The architecture of MiniMax-Text-01 is briefly described as follows:
- Total Parameters: 456B
- Activated Parameters per Token: 45.9B
- Number Layers: 80
- Hybrid Attention: a softmax attention is positioned after every 7 lightning attention.- Number of attention heads: 64
- Attention head dimension: 128
 
- Mixture of Experts:- Number of experts: 32
- Expert hidden dimension: 9216
- Top-2 routing strategy
 
- Positional Encoding: Rotary Position Embedding (RoPE) applied to half of the attention head dimension with a base frequency of 10,000,000
- Hidden Size: 6144
- Vocab Size: 200,064
3. Evaluation
Core Academic Benchmarks
| Tasks | GPT-4o (11-20) | Claude-3.5-Sonnet (10-22) | Gemini-1.5-Pro (002) | Gemini-2.0-Flash (exp) | Qwen2.5-72B-Inst. | DeepSeek-V3 | Llama-3.1-405B-Inst. | MiniMax-Text-01 | 
|---|---|---|---|---|---|---|---|---|
| General | ||||||||
| MMLU* | 85.7 | 88.3 | 86.8 | 86.5 | 86.1 | 88.5 | 88.6 | 88.5 | 
| MMLU-Pro* | 74.4 | 78.0 | 75.8 | 76.4 | 71.1 | 75.9 | 73.3 | 75.7 | 
| SimpleQA | 39.0 | 28.1 | 23.4 | 26.6 | 10.3 | 24.9 | 23.2 | 23.7 | 
| C-SimpleQA | 64.6 | 56.8 | 59.4 | 63.3 | 52.2 | 64.8 | 54.7 | 67.4 | 
| IFEval (avg) | 84.1 | 90.1 | 89.4 | 88.4 | 87.2 | 87.3 | 86.4 | 89.1 | 
| Arena-Hard | 92.4 | 87.6 | 85.3 | 72.7 | 81.2 | 91.4 | 63.5 | 89.1 | 
| Reasoning | ||||||||
| GPQA* (diamond) | 46.0 | 65.0 | 59.1 | 62.1 | 49.0 | 59.1 | 50.7 | 54.4 | 
| DROP* (F1) | 89.2 | 88.8 | 89.2 | 89.3 | 85.0 | 91.0 | 92.5 | 87.8 | 
| Mathematics | ||||||||
| GSM8k* | 95.6 | 96.9 | 95.2 | 95.4 | 95.8 | 96.7 | 96.7 | 94.8 | 
| MATH* | 76.6 | 74.1 | 84.6 | 83.9 | 81.8 | 84.6 | 73.8 | 77.4 | 
| Coding | ||||||||
| MBPP + | 76.2 | 75.1 | 75.4 | 75.9 | 77.0 | 78.8 | 73.0 | 71.7 | 
| HumanEval | 90.2 | 93.7 | 86.6 | 89.6 | 86.6 | 92.1 | 89.0 | 86.9 | 
* Evaluated following a 0-shot CoT setting.
Long Benchmarks
4M Needle In A Haystack Test
   
Ruler
| Model | 4k | 8k | 16k | 32k | 64k | 128k | 256k | 512k | 1M | 
|---|---|---|---|---|---|---|---|---|---|
| GPT-4o (11-20) | 0.970 | 0.921 | 0.890 | 0.888 | 0.884 | - | - | - | - | 
| Claude-3.5-Sonnet (10-22) | 0.965 | 0.960 | 0.957 | 0.950 | 0.952 | 0.938 | - | - | - | 
| Gemini-1.5-Pro (002) | 0.962 | 0.960 | 0.960 | 0.958 | 0.938 | 0.917 | 0.916 | 0.861 | 0.850 | 
| Gemini-2.0-Flash (exp) | 0.960 | 0.960 | 0.951 | 0.957 | 0.937 | 0.860 | 0.797 | 0.709 | - | 
| MiniMax-Text-01 | 0.963 | 0.961 | 0.953 | 0.954 | 0.943 | 0.947 | 0.945 | 0.928 | 0.910 | 
LongBench v2
| Model | overall | easy | hard | short | medium | long | 
|---|---|---|---|---|---|---|
| Human | 53.7 | 100.0 | 25.1 | 47.2 | 59.1 | 53.7 | 
| w/ CoT | ||||||
| GPT-4o (11-20) | 51.4 | 54.2 | 49.7 | 59.6 | 48.6 | 43.5 | 
| Claude-3.5-Sonnet (10-22) | 46.7 | 55.2 | 41.5 | 53.9 | 41.9 | 44.4 | 
| Deepseek-V3 | - | - | - | - | - | - | 
| Qwen2.5-72B-Inst. | 43.5 | 47.9 | 40.8 | 48.9 | 40.9 | 39.8 | 
| MiniMax-Text-01 | 56.5 | 66.1 | 50.5 | 61.7 | 56.7 | 47.2 | 
| w/o CoT | ||||||
| GPT-4o (11-20) | 50.1 | 57.4 | 45.6 | 53.3 | 52.4 | 40.2 | 
| Claude-3.5-Sonnet (10-22) | 41.0 | 46.9 | 37.3 | 46.1 | 38.6 | 37.0 | 
| Deepseek-V3 | 48.7 | - | - | - | - | - | 
| Qwen2.5-72B-Inst. | 42.1 | 42.7 | 41.8 | 45.6 | 38.1 | 44.4 | 
| MiniMax-Text-01 | 52.9 | 60.9 | 47.9 | 58.9 | 52.6 | 43.5 | 
MTOB
| Context Type | no context | half book | full book | ฮ half book | ฮ full book | 
|---|---|---|---|---|---|
| eng โ kalam (ChrF) | |||||
| GPT-4o (11-20) | 9.90 | 54.30 | - | 44.40 | - | 
| Claude-3.5-Sonnet (10-22) | 20.22 | 53.62 | 55.65 | 33.39 | 35.42 | 
| Gemini-1.5-Pro (002) | 16.79 | 53.68 | 57.90 | 36.89 | 41.11 | 
| Gemini-2.0-Flash (exp) | 12.20 | 49.50 | 53.30 | 37.30 | 41.10 | 
| Qwen-Long | 16.55 | 48.48 | 45.94 | 31.92 | 29.39 | 
| MiniMax-Text-01 | 6.0 | 51.74 | 51.60 | 45.7 | 45.6 | 
| kalam โ eng (BLEURT) | |||||
| GPT-4o (11-20) | 33.20 | 58.30 | - | 25.10 | - | 
| Claude-3.5-Sonnet (10-22) | 31.42 | 59.70 | 62.30 | 28.28 | 30.88 | 
| Gemini-1.5-Pro (002) | 32.02 | 61.52 | 63.09 | 29.50 | 31.07 | 
| Gemini-2.0-Flash (exp) | 33.80 | 57.50 | 57.00 | 23.70 | 23.20 | 
| Qwen-Long | 30.13 | 53.14 | 32.15 | 23.01 | 2.02 | 
| MiniMax-Text-01 | 33.65 | 57.10 | 58.00 | 23.45 | 24.35 | 
4. Quickstart
Here we provide a simple example of loading the tokenizer and model to generate content.
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, QuantoConfig, GenerationConfig
# load hf config
hf_config = AutoConfig.from_pretrained("MiniMaxAI/MiniMax-Text-01-hf")
# quantization config, int8 is recommended
quantization_config =  QuantoConfig(
            weights="int8",
            modules_to_not_convert=[
                "lm_head",
                "embed_tokens",
            ] + [f"model.layers.{i}.coefficient" for i in range(hf_config.num_hidden_layers)]
            + [f"model.layers.{i}.block_sparse_moe.gate" for i in range(hf_config.num_hidden_layers)]
        )
# assume 8 GPUs
world_size = 8
layers_per_device = hf_config.num_hidden_layers // world_size
# set device map
device_map = {
    'model.embed_tokens': 'cuda:0',
    'model.norm': f'cuda:{world_size - 1}',
    'lm_head': f'cuda:{world_size - 1}'
}
for i in range(world_size):
    for j in range(layers_per_device):
        device_map[f'model.layers.{i * layers_per_device + j}'] = f'cuda:{i}'
# load tokenizer
tokenizer = AutoTokenizer.from_pretrained("MiniMaxAI/MiniMax-Text-01-hf")
prompt = "Hello!"
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant created by MiniMax based on MiniMax-Text-01 model."}]},
    {"role": "user", "content": [{"type": "text", "text": prompt}]},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
# tokenize and move to device
model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
# load bfloat16 model, move to device, and apply quantization
quantized_model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-Text-01-hf",
    torch_dtype="bfloat16",
    device_map=device_map,
    quantization_config=quantization_config,
    offload_buffers=True,
)
# generate response
generation_config = GenerationConfig(
    max_new_tokens=20,
    eos_token_id=200020,
    use_cache=True,
)
generated_ids = quantized_model.generate(**model_inputs, generation_config=generation_config)
print(f"generated_ids: {generated_ids}")
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
5. Deployment Guide
For production deployment, we recommend using vLLM to serve MiniMax-Text-01. vLLM provides excellent performance for serving large language models with the following features:
๐ฅ Outstanding service throughput performance
โก Efficient and intelligent memory management
๐ฆ Powerful batch request processing capability
โ๏ธ Deeply optimized underlying performance  
For detailed deployment instructions, please refer to our vLLM Deployment Guide.
6. Function Calling
MiniMax-Text-01 supports Function Calling capability, enabling the model to intelligently identify when external functions need to be called and output parameters in structured JSON format. With Function Calling, you can:
- Let the model recognize implicit function call needs in user requests
- Receive structured parameter outputs for seamless application integration
- Support various complex parameter types, including nested objects and arrays Function Calling supports standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library. For detailed usage instructions, please refer to our Function Call Guide or Chinese Guide.
7. Citation
@misc{minimax2025minimax01scalingfoundationmodels,
      title={MiniMax-01: Scaling Foundation Models with Lightning Attention}, 
      author={MiniMax and Aonian Li and Bangwei Gong and Bo Yang and Boji Shan and Chang Liu and Cheng Zhu and Chunhao Zhang and Congchao Guo and Da Chen and Dong Li and Enwei Jiao and Gengxin Li and Guojun Zhang and Haohai Sun and Houze Dong and Jiadai Zhu and Jiaqi Zhuang and Jiayuan Song and Jin Zhu and Jingtao Han and Jingyang Li and Junbin Xie and Junhao Xu and Junjie Yan and Kaishun Zhang and Kecheng Xiao and Kexi Kang and Le Han and Leyang Wang and Lianfei Yu and Liheng Feng and Lin Zheng and Linbo Chai and Long Xing and Meizhi Ju and Mingyuan Chi and Mozhi Zhang and Peikai Huang and Pengcheng Niu and Pengfei Li and Pengyu Zhao and Qi Yang and Qidi Xu and Qiexiang Wang and Qin Wang and Qiuhui Li and Ruitao Leng and Shengmin Shi and Shuqi Yu and Sichen Li and Songquan Zhu and Tao Huang and Tianrun Liang and Weigao Sun and Weixuan Sun and Weiyu Cheng and Wenkai Li and Xiangjun Song and Xiao Su and Xiaodong Han and Xinjie Zhang and Xinzhu Hou and Xu Min and Xun Zou and Xuyang Shen and Yan Gong and Yingjie Zhu and Yipeng Zhou and Yiran Zhong and Yongyi Hu and Yuanxiang Fan and Yue Yu and Yufeng Yang and Yuhao Li and Yunan Huang and Yunji Li and Yunpeng Huang and Yunzhi Xu and Yuxin Mao and Zehan Li and Zekang Li and Zewei Tao and Zewen Ying and Zhaoyang Cong and Zhen Qin and Zhenhua Fan and Zhihang Yu and Zhuo Jiang and Zijia Wu},
      year={2025},
      eprint={2501.08313},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.08313}, 
}
8. Chatbot & API
For general use and evaluation, we provide a Chatbot with online search capabilities and the online API for developers. For general use and evaluation, we provide the MiniMax MCP Server with video generation, image generation, speech synthesis, and voice cloning for developers.
9. Contact Us
Contact us at [email protected].
- Downloads last month
- 10,341
