gpt-oss:20b repeatedly misclassifies weekdays and doubts authoritative tool results

#197

by MarkWard0110 - opened 9 days ago


I’m encountering a recurring issue with the gpt-oss:20b model when used in an agent setting with tools: the model frequently asserts incorrect weekdays for specific dates, and more importantly, actively doubts or overrides tool responses that provide correct weekday information.

This behavior persists even when the system prompt explicitly instructs the model not to infer weekdays and to treat tool output as authoritative. A model that confidently mislabels weekdays — and resists correction by tools — can silently corrupt calendar data. This is especially problematic in agent workflows where tools are intended to serve as the source of truth.

For example, given a calendar event with an until timestamp of 20260211T153000, the model reasoned that 2026-02-11 is a Sunday.
This is incorrect.
2026-02-11 is a Wednesday.
gpt-oss didn't try to compute the weekday; in its thoughts, it simply asserted that the date was a Sunday. The model is instructed not to do this and to use tools instead.
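For reference, here is a minimal sketch of the kind of deterministic weekday lookup a tool could provide. The function name `weekday_of` and the fixed English name table are my own illustration, not the actual tool from my agent; it assumes the timestamp uses the `YYYYMMDDTHHMMSS` form shown above and carries no time zone suffix.

```python
from datetime import datetime

# Fixed English names so the result does not depend on the system locale.
# datetime.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_NAMES = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)


def weekday_of(timestamp: str) -> str:
    """Return the weekday name for a timestamp like 20260211T153000.

    Hypothetical helper: assumes a local timestamp with no zone suffix.
    """
    parsed = datetime.strptime(timestamp, "%Y%m%dT%H%M%S")
    return WEEKDAY_NAMES[parsed.weekday()]


if __name__ == "__main__":
    print(weekday_of("20260211T153000"))  # Wednesday
```

This confirms the Wednesday result without relying on anything the model recalls about the calendar.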

This does not happen all of the time. In many cases, the model behaves correctly and accepts tool output as authoritative. However, I’ve encountered this behavior often enough — where the model confidently asserts an incorrect weekday and then questions the tool’s result — that I’m increasingly doubtful this is a prompt issue.

At this point, it feels more likely that the model has a strong internal bias or prior around calendar facts that occasionally overrides tool trust, even when explicitly instructed not to do so.
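One way to keep such a prior from silently reaching calendar data would be to check the model's text for weekday claims before acting on it. The sketch below is only an assumption about how that could look: `find_weekday_mismatches` is a hypothetical helper, and it only recognizes claims phrased like "2026-02-11 is Sunday" or "2026-02-11 is a Sunday".

```python
import re
from datetime import date

WEEKDAY_NAMES = (
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
)

# Matches "<YYYY-MM-DD> is <Weekday>" with an optional "a" before the weekday.
CLAIM_PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2})\s+is\s+(?:a\s+)?(" + "|".join(WEEKDAY_NAMES) + r")\b",
    re.IGNORECASE,
)


def find_weekday_mismatches(text: str) -> list[tuple[str, str, str]]:
    """Return (date, claimed_weekday, actual_weekday) for each wrong claim."""
    mismatches = []
    for iso_date, claimed in CLAIM_PATTERN.findall(text):
        try:
            actual = WEEKDAY_NAMES[date.fromisoformat(iso_date).weekday()]
        except ValueError:
            continue  # not a real calendar date, nothing to compare against
        claimed = claimed.capitalize()
        if claimed != actual:
            mismatches.append((iso_date, claimed, actual))
    return mismatches


if __name__ == "__main__":
    print(find_weekday_mismatches("The event ends 2026-02-11, and 2026-02-11 is Sunday."))
```

A non-empty result could be used to reject the turn or re-prompt with the tool result, rather than trusting the model's own assertion.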
