Update README.md

README.md (changed)

[...]

Instruction Pre-Training:
40B tokens of instruction data, with one-third focused on reasoning tasks.

Supervised Fine-Tuning (SFT):
~500K high-quality and diverse instructions with balanced complexity; reasoning tasks make up about 20% of the dataset.

Preference Tuning:
~100K carefully selected instructions, filtered by length and type for general tasks, with domain-balanced selection for reasoning tasks.

[...]

```python
# Tail of the preceding generation example:
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

## SGLang Usage

To run an inference server for **T-pro IT 2.0**, start by launching the SGLang server:

```bash
python -m sglang.launch_server \
  --model-path t-tech/T-pro-it-2.0 \
  --reasoning-parser qwen3
```
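
Before sending requests, you can check that the server is reachable. A minimal sketch, assuming the server exposes the standard OpenAI-compatible `/v1/models` route (an assumption; this check is not in the original README):

```python
import openai

# Point the client at the local SGLang server on its default port.
client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="ANY")

# If the server is up, this prints the served model path(s).
print([model.id for model in client.models.list()])
```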

Once the server is up and listening on `localhost:30000`, you can send chat-based requests via the OpenAI Python client.

```python
import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:30000/v1",
    api_key="ANY",  # the server ignores the API key
)

# The prompt is in Russian: "Please compute the definite integral ∫_0^1 x² eˣ dx,
# explain the solution step by step, and state the final result."
prompt = (
    "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
    "пошагово объясни решение и укажи окончательный результат."
)

completion = client.chat.completions.create(
    model="ANY",  # the server ignores the model name
    messages=[
        # System prompt in Russian: "You are T-pro, a virtual assistant at
        # T-Technologies. Your task is to be a helpful dialogue assistant."
        {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
        {"role": "user", "content": prompt},
    ],
    # REQUIRED: sampling params from the "Recommended Generation Parameters" table
    temperature=0.6,
    presence_penalty=1.0,
)

# The generated reply is in `completion.choices[0].message.content`.
print(completion.choices[0].message.content)
```
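
For long step-by-step answers you can stream tokens as they arrive instead of waiting for the full reply. A minimal sketch, assuming the server supports the standard OpenAI streaming protocol (`stream=True`); it reuses the `client` and `prompt` defined above and is not part of the original README:

```python
# Request a streamed completion with the same required sampling parameters.
stream = client.chat.completions.create(
    model="ANY",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.6,
    presence_penalty=1.0,
    stream=True,
)

# Print each token delta as soon as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```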

**Note:** every completion call **must** include both `temperature` and `presence_penalty`; the values above come from the "Recommended Generation Parameters" table.
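
Because these two parameters are easy to forget, one option is a small wrapper that always applies them. A sketch; the helper name `tpro_chat` is illustrative, not part of the original README:

```python
def tpro_chat(client, messages, **overrides):
    """Chat call that always applies the recommended sampling parameters."""
    params = {"temperature": 0.6, "presence_penalty": 1.0}
    params.update(overrides)  # callers can still override deliberately
    return client.chat.completions.create(model="ANY", messages=messages, **params)

# Usage:
# reply = tpro_chat(client, [{"role": "user", "content": prompt}])
# print(reply.choices[0].message.content)
```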