Update README.md

README.md (changed)

[...]

Instruction Pre-Training:
40B tokens of instruction data, with one-third focused on reasoning tasks.

Supervised Fine-Tuning (SFT):
~500K high-quality and diverse instructions with balanced complexity; reasoning tasks make up about 20% of the dataset.

Preference Tuning:
~100K carefully selected instructions, filtered by length and type for general tasks, with domain-balanced selection for reasoning tasks.

[...]

```python
# Tail of the preceding generation example:
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
```

## SGLang Usage

To run an inference server for **T-pro IT 2.0**, start by launching the SGLang server:

```bash
python -m sglang.launch_server \
  --model-path t-tech/T-pro-it-2.0 \
  --reasoning-parser qwen3
```
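
Before sending requests, you can check that the server is reachable. A minimal sketch, assuming the server exposes the standard OpenAI-compatible `/v1/models` route (an assumption; this check is not in the original README):

```python
import openai

# Point the client at the local SGLang server on its default port.
client = openai.OpenAI(base_url="http://127.0.0.1:30000/v1", api_key="ANY")

# If the server is up, this prints the served model path(s).
print([model.id for model in client.models.list()])
```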

Once the server is up and listening on `localhost:30000`, you can send chat-based requests via the OpenAI Python client.

```python
import openai

client = openai.OpenAI(
    base_url="http://127.0.0.1:30000/v1",
    api_key="ANY",  # the server ignores the API key
)

# The prompt is in Russian: "Please compute the definite integral ∫_0^1 x² eˣ dx,
# explain the solution step by step, and state the final result."
prompt = (
    "Пожалуйста, вычисли определённый интеграл ∫_0^1 x² eˣ dx, "
    "пошагово объясни решение и укажи окончательный результат."
)

completion = client.chat.completions.create(
    model="ANY",  # the server ignores the model name
    messages=[
        # System prompt in Russian: "You are T-pro, a virtual assistant at
        # T-Technologies. Your task is to be a helpful dialogue assistant."
        {"role": "system", "content": "Ты T-pro, виртуальный ассистент в Т-Технологии. Твоя задача - быть полезным диалоговым ассистентом."},
        {"role": "user", "content": prompt},
    ],
    # REQUIRED: sampling params from the "Recommended Generation Parameters" table
    temperature=0.6,
    presence_penalty=1.0,
)

# The generated reply is in `completion.choices[0].message.content`.
print(completion.choices[0].message.content)
```
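
For long step-by-step answers you can stream tokens as they arrive instead of waiting for the full reply. A minimal sketch, assuming the server supports the standard OpenAI streaming protocol (`stream=True`); it reuses the `client` and `prompt` defined above and is not part of the original README:

```python
# Request a streamed completion with the same required sampling parameters.
stream = client.chat.completions.create(
    model="ANY",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.6,
    presence_penalty=1.0,
    stream=True,
)

# Print each token delta as soon as it arrives.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```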

**Note:** every completion call **must** include both `temperature` and `presence_penalty`; the values above come from the "Recommended Generation Parameters" table.
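
Because these two parameters are easy to forget, one option is a small wrapper that always applies them. A sketch; the helper name `tpro_chat` is illustrative, not part of the original README:

```python
def tpro_chat(client, messages, **overrides):
    """Chat call that always applies the recommended sampling parameters."""
    params = {"temperature": 0.6, "presence_penalty": 1.0}
    params.update(overrides)  # callers can still override deliberately
    return client.chat.completions.create(model="ANY", messages=messages, **params)

# Usage:
# reply = tpro_chat(client, [{"role": "user", "content": prompt}])
# print(reply.choices[0].message.content)
```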