Space: NurseCitizenDeveloper/ewaast-demo

Nursing Citizen Development committed 2 days ago

Commit 9af128b
Parent: 99d19a9

Fix: Use greedy decoding to avoid CUDA NaN probability error

Files changed (1)

medgemma_client.py CHANGED

@@ -139,8 +139,7 @@ def _local_inference(messages: list, max_tokens: int = 2048) -> str:
         outputs = model.generate(
             **inputs,
             max_new_tokens=max_tokens,
-            do_sample=True,
-            temperature=0.1
+            do_sample=False  # Greedy decoding - more stable with quantized models
         )
     # Decode only the new tokens (skip input)
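The commit message does not say where the NaN probabilities originated. One plausible mechanism, sketched here in pure Python under stated assumptions, is this: sampling (do_sample=True) must turn logits into a probability distribution, and if the logits are held in float16, dividing by temperature=0.1 multiplies each logit by 10, which can push a large logit past the float16 maximum (65504) to inf. Softmax then produces NaN, and a validity check analogous to the one torch.multinomial performs rejects the distribution. Greedy decoding (do_sample=False) only takes an argmax over the logits, so it never builds or validates that distribution. The helpers `to_fp16`, `sample_token` and `greedy_token` are hypothetical stand-ins, not the transformers implementation, and the logit values are invented for illustration. Greedy decoding sidesteps the crash; it does not repair NaN logits produced inside the model's own forward pass.

```python
import math
import random
import struct


def to_fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (struct format 'e').

    Finite values beyond the float16 range (max 65504) become +/-inf,
    mimicking an overflowing float16 tensor. Hypothetical helper.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def softmax(logits: list[float]) -> list[float]:
    """Max-subtracted softmax. If the max is inf, inf - inf gives NaN."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Temperature sampling, analogous to do_sample=True (hypothetical helper).

    Scales logits by 1/temperature in simulated float16, then rejects invalid
    probabilities the way a multinomial sampler would.
    """
    scaled = [to_fp16(z / temperature) for z in logits]
    probs = softmax(scaled)
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise RuntimeError("invalid probabilities (inf, nan or < 0)")
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_token(logits: list[float]) -> int:
    """Greedy decoding, analogous to do_sample=False: argmax, no softmax needed."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    # Hypothetical logits for a 3-token vocabulary; 9000.0 is deliberately
    # extreme so that 9000.0 / 0.1 = 90000.0 overflows float16.
    logits = [12.0, 9000.0, 3.5]
    try:
        sample_token(logits, temperature=0.1, rng=random.Random(0))
    except RuntimeError as err:
        print("do_sample=True, temperature=0.1 failed:", err)
    print("do_sample=False picks token", greedy_token(logits))
```

Because argmax is unchanged by dividing every logit by the same positive temperature, dropping `temperature=0.1` loses nothing once sampling is off; the trade-off is deterministic, less varied output.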