Q0G Fail/Pass?
Happy new year, Naphula!
What does a Q0G pass or fail signify? Very curious to see how this table develops
Happy new year!
Q0G tests a model's responses to logic puzzles. Models that fail lean more heavily on pre-defined biases and linguistic patterns, while those that pass are more objectively neutral and Socratic in their reasoning. Probably a meaningless metric for RPers. What's interesting to me, though, is that no base models in existence pass Q0G, only the finetunes. Says a lot about the pretraining data.
Since ablations are easier now, I'd rather test simple Q0 prompts than run Compliance benchmarks. It takes less time and, while not a thorough test suite, gives a rough impression of a model's quality. Some finetunes score much lower than others because the benchmark is mostly oriented toward Instruction Following of Uncensored Prompts. There are probably more accurate benchmarks, but this is done manually and is mainly just a point of reference for me to test new Mistral models and merges.