🤗 Hugging Face | 🤖 ModelScope | 🚀 Experience Now
Introduction
Ling-1T is the first flagship non-thinking model in the Ling 2.0 series, featuring 1 trillion total parameters with ≈ 50 billion active parameters per token. Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of efficient reasoning and scalable cognition.
Pre-trained on 20 trillion+ high-quality, reasoning-dense tokens, Ling-1T-base supports up to 128K context length and adopts an evolutionary chain-of-thought (Evo-CoT) process across mid-training and post-training. This curriculum greatly enhances the model's efficiency and reasoning depth, allowing Ling-1T to achieve state-of-the-art performance on multiple complex reasoning benchmarks while balancing accuracy and efficiency.
Flagship-Level Efficient Reasoning
We comprehensively evaluated Ling-1T against leading flagship models, including both open-source giants (e.g., DeepSeek-V3.1-Terminus, Kimi-K2-Instruct-0905) and closed-source APIs (GPT-5-main, Gemini-2.5-Pro). Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates superior complex reasoning ability and an overall advantage.
In the AIME 25 benchmark, Ling-1T extends the Pareto frontier of reasoning accuracy vs. reasoning length, showcasing its strength in "efficient thinking and precise reasoning."
Aesthetic Understanding and Front-End Generation
Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis. We introduce a hybrid Syntax–Function–Aesthetics reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of visual aesthetics. On ArtifactsBench, Ling-1T ranks first among open-source models, and the benchmark visualizations in this card were, in fact, generated by Ling-1T itself.
Emergent Intelligence at Trillion-Scale
Scaling to the trillion-parameter level has revealed strong emergent reasoning and transfer capabilities. For example, in the BFCL V3 tool-use benchmark, Ling-1T achieves ≈ 70% tool-call accuracy with only light instruction tuning, despite having seen no large-scale trajectory data during training. Ling-1T can:
- Interpret complex natural-language instructions
- Transform abstract logic into functional visual components
- Generate cross-platform compatible front-end code
- Create stylistically controlled marketing copy and multi-lingual text
These capabilities form the foundation for general, collaborative human–AI intelligence, which we aim to advance together with the open-source community through Ling-1T's release.
Pre-Training at Trillion Scale
The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the Ling Scaling Law (arXiv:2507.17702). This ensures architectural and hyperparameter scalability even under 1e25–1e26 FLOPs of compute.
Key architectural innovations include:
- 1T total / 50B active parameters with a 1/32 MoE activation ratio
- MTP (multi-token prediction) layers for enhanced compositional reasoning
- Aux-loss-free, sigmoid-scoring expert routing with zero-mean updates (sketched below)
- QK Normalization for fully stable convergence
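To make the routing item above concrete, here is a minimal sketch of aux-loss-free, sigmoid-scoring expert routing with a zero-mean bias update. The function names, centroid-based scoring, expert count, top-k value, and update rate are illustrative assumptions, not the exact Ling 2.0 implementation.

```python
import torch

def route_tokens(hidden, expert_centroids, bias, top_k=8):
    """Score each token against every expert with a sigmoid (not softmax),
    pick the top-k experts per token, and renormalize the kept scores.
    `bias` is a per-expert offset used only for selection; it is nudged
    toward under-loaded experts so no auxiliary balancing loss is needed."""
    scores = torch.sigmoid(hidden @ expert_centroids.t())        # [tokens, experts]
    selected = torch.topk(scores + bias, top_k, dim=-1).indices  # bias affects choice only
    gate = torch.gather(scores, -1, selected)
    gate = gate / gate.sum(dim=-1, keepdim=True)                 # combine weights
    return selected, gate

def update_bias(bias, expert_load, rate=1e-3):
    """Zero-mean update: raise the bias of under-loaded experts and lower
    over-loaded ones, keeping the mean bias unchanged."""
    violation = expert_load.mean() - expert_load
    step = rate * torch.sign(violation)
    return bias + step - step.mean()

if __name__ == "__main__":
    hidden = torch.randn(4, 64)        # 4 tokens, hidden size 64 (illustrative)
    centroids = torch.randn(256, 64)   # 256 experts (illustrative)
    bias = torch.zeros(256)
    experts, gates = route_tokens(hidden, centroids, bias)
    print(experts.shape, gates.shape)
```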
Ling-1T is the largest FP8-trained foundation model known to date. FP8 mixed-precision training yields a 15%+ end-to-end speedup and improved memory efficiency, while maintaining ≤ 0.1% loss deviation from BF16 across 1T tokens. A fine-grained, heterogeneous 1F1B interleaved pipeline further boosts utilization by 40%+. System-level optimizations (fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry) ensure stable trillion-scale training.
Pre-training used over 20T high-quality tokens, with > 40% reasoning-dense data in later stages. Mid-training introduced curated chain-of-thought corpora for "reasoning pre-activation", improving downstream reasoning stability. A custom WSM (Warmup–Stable–Merge) LR scheduler (arXiv:2507.17634) with mid-train checkpoint merging simulates LR decay and boosts generalization.
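As an illustration of the Warmup–Stable–Merge idea, the sketch below holds the learning rate flat after warmup and replaces the usual decay phase with uniform averaging of recent checkpoints. The schedule shape, warmup length, peak LR, and the plain averaging rule are assumptions for illustration; see arXiv:2507.17634 for the actual method.

```python
def wsm_lr(step, warmup_steps=2000, peak_lr=3e-4):
    """Warmup-Stable: ramp up linearly, then hold the peak LR with no decay phase."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

def merge_checkpoints(checkpoints):
    """Merge: uniformly average the weights of the last few checkpoints.
    The averaging plays the role that LR decay plays in a warmup-stable-decay schedule."""
    merged = {}
    for name in checkpoints[0]:
        merged[name] = sum(ckpt[name] for ckpt in checkpoints) / len(checkpoints)
    return merged
```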
Post-Training and Evo-CoT Optimization
Built upon mid-training reasoning activation, post-training adopts Evo-CoT (Evolutionary Chain-of-Thought) for progressive reasoning enhancement under controllable cost. This approach continually expands the Pareto frontier of reasoning accuracy vs. efficiency, which is ideal for reflexive non-thinking models.
For reinforcement learning, we introduce LPO (Linguistics-Unit Policy Optimization), a novel sentence-level policy optimization method. Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats sentences as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior. Empirically, LPO offers superior training stability and generalization across reasoning tasks.
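To make the granularity difference concrete, below is a minimal sketch of sentence-level credit assignment: the importance ratio and clipping operate per sentence rather than per token (GRPO) or per sequence (GSPO). The naive sentence splitter, the PPO-style clipped objective, and all function names are illustrative assumptions, not the published LPO algorithm.

```python
import math
import re

def split_sentences(text):
    """Naive splitter; LPO's actual linguistic-unit segmentation is more careful."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def lpo_clipped_loss(sentence_logps_new, sentence_logps_old, advantage, clip_eps=0.2):
    """PPO-style clipped objective where the importance ratio is formed per sentence
    (summed token log-probs within each sentence), so clipping acts on sentence units
    rather than on individual tokens or on the whole sequence."""
    total = 0.0
    for lp_new, lp_old in zip(sentence_logps_new, sentence_logps_old):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += -min(ratio * advantage, clipped * advantage)
    return total / len(sentence_logps_new)
```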
Evaluation
Ling-1T has been extensively evaluated across knowledge, code, math, reasoning, agent, and alignment benchmarks. It currently stands as the best open-source flagship non-thinking model, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.
Model Downloads
You can download Ling-1T from the following table. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.
| Model | Context Length | Download |
|---|---|---|
| Ling-1T | 32K -> 128K (YaRN) | 🤗 HuggingFace · 🤖 ModelScope |
Note: If you are interested in previous versions, please visit the past model collections on Hugging Face or ModelScope.
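For programmatic downloads, the minimal sketch below uses huggingface_hub; the repository id inclusionAI/Ling-1T and the local directory are assumptions and may need to be adapted.

```python
from huggingface_hub import snapshot_download

# Download the full model repository to a local directory.
# The repo id "inclusionAI/Ling-1T" is assumed from the model name.
local_path = snapshot_download(
    repo_id="inclusionAI/Ling-1T",
    local_dir="./Ling-1T",
)
print(f"Model downloaded to {local_path}")
```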
Quickstart
🚀 Try Online
You can experience Ling-1T online at: ZenMux
🔌 API Usage
You can also use Ling-1T through API calls:
```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API Key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ],
)

print(completion.choices[0].message.content)
```
Deployment
SGLang
Environment Preparation
We will submit our model to the official SGLang release later. For now, prepare the environment as follows:
```bash
pip3 install -U sglang sgl-kernel
```
Run Inference
SGLang now supports both BF16 and FP8 models; which is used depends on the dtype of the model in ${MODEL_PATH}.
Here is an example of running Ling-1T across multiple GPU nodes, where the master node IP is ${MASTER_IP} and the server port is ${PORT}:
- Start server:
```bash
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0

# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1

# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2

# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3

# This is only an example. Please adjust arguments according to your actual environment.
```
- Client:
```bash
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
More usage examples can be found here.
vLLM
Environment Preparation
```bash
pip install vllm==0.11.0
```
Run Inference
Here is an example of deploying the model across multiple GPU nodes, where the master node IP is ${MASTER_IP}, the server port is ${PORT}, and the model path is ${MODEL_PATH}:
```bash
# step 1. start ray on all nodes
# step 2. start vllm server only on node 0:
vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 8 --pipeline-parallel-size 4 --gpu-memory-utilization 0.85

# This is only an example, please adjust arguments according to your actual environment.
```
To handle long context in vLLM using YaRN, we need to follow these two steps:
- Add a `rope_scaling` field to the model's `config.json` file, for example:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
- Use the additional `--max-model-len` parameter to specify the desired maximum context length when starting the vLLM service (a worked example follows below).
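As a concrete illustration of the two steps above, the hedged sketch below patches `config.json` and derives the extended context length (factor × original_max_position_embeddings = 131072). The config path is a placeholder for your local ${MODEL_PATH}.

```python
import json

config_path = "/path/to/Ling-1T/config.json"  # placeholder: adjust to your ${MODEL_PATH}

with open(config_path) as f:
    config = json.load(f)

# Step 1: add the YaRN rope_scaling field shown above.
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# Step 2: pass the extended length via --max-model-len when starting vLLM.
max_model_len = int(4.0 * 32768)  # 131072
print(f"vllm serve ... --max-model-len {max_model_len}")
```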
For detailed guidance, please refer to the vLLM instructions.
Limitations & Future Plans
While Ling-1T has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:
- GQA-based attention: stable for long-context reasoning but relatively costly. Future versions will adopt hybrid attention to improve efficiency.
- Limited agentic ability: the current model has room to grow in multi-turn interaction, long-term memory, and tool use.
- Instruction and identity issues: occasional deviations or role confusion may occur; future updates will enhance alignment and consistency.
Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.
License
This code repository is licensed under the MIT License.
FAQ
- Recommended temperature: 0.7
- Recommended top_p: 0.95
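These settings can be passed directly through the OpenAI-compatible API shown in the Quickstart; below is a minimal sketch (the prompt is only an example).

```python
from openai import OpenAI

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<your ZENMUX_API_KEY>")

completion = client.chat.completions.create(
    model="inclusionai/ling-1t",
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    temperature=0.7,  # recommended setting
    top_p=0.95,       # recommended setting
)
print(completion.choices[0].message.content)
```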