casinca committed
Commit ddafa85 · verified · 1 parent: 47dfcb6

docs: Updated the `Transformers` example to use intended temp=0.15


This PR adds the hyperparameter arguments needed to enable stochastic sampling (`temperature=0.15`) in the `transformers` snippet, rather than greedy decoding. This reflects the usage pattern recommended by Mistral.

fix: https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/discussions/9
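As background (not part of the diff): temperature scaling divides the logits by the temperature before the softmax, so a low value like 0.15 keeps decoding stochastic while concentrating probability mass on the top tokens. A minimal self-contained sketch of that effect, with illustrative logit values (the function name is hypothetical, not a `transformers` API):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Example logits for three candidate tokens
logits = [2.0, 1.0, 0.5]

p_default = softmax_with_temperature(logits, temperature=1.0)
p_low = softmax_with_temperature(logits, temperature=0.15)

# At temperature 0.15 the distribution is much sharper: the top token's
# probability rises toward 1, approximating greedy decoding while still
# leaving nonzero mass on the alternatives.
```

This is why `do_sample=True` must accompany `temperature`: without it, `generate` decodes greedily and the temperature setting has no effect.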

Files changed (1):
1. README.md (+2, −0)
README.md CHANGED

@@ -487,6 +487,8 @@ input_ids = tokenized["input_ids"].to(device="cuda")
 output = model.generate(
     input_ids,
     max_new_tokens=200,
+    do_sample=True,
+    temperature=0.15,
 )[0]

 decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]) :])