Prometheus open source?

#2
by Trilogix1 - opened

Hi, I tried to check your Prometheus link to GitHub, but it 404s. I would like to try your method.
Does the repo exist? If so, can I have the link?

Hello, as far as I can see, Prometheus looks like Heretic. But I have a question: you wrote Qwen3.5-9B 2/200 (1%) 0.0105 50 (50 trials), but I ran 2000 trials and got nothing like your numbers :\ I mean this »
[Trial 167] Refusals: 59/100, KL divergence: 0.2764
[Trial 63] Refusals: 60/100, KL divergence: 0.2433
[Trial 133] Refusals: 61/100, KL divergence: 0.1853
[Trial 144] Refusals: 65/100, KL divergence: 0.1836
[Trial 135] Refusals: 67/100, KL divergence: 0.1631
[Trial 145] Refusals: 69/100, KL divergence: 0.1431
[Trial 142] Refusals: 73/100, KL divergence: 0.1249
[Trial 80] Refusals: 74/100, KL divergence: 0.0940
[Trial 138] Refusals: 75/100, KL divergence: 0.0868
[Trial 68] Refusals: 78/100, KL divergence: 0.0838
[Trial 114] Refusals: 79/100, KL divergence: 0.0565
[Trial 115] Refusals: 81/100, KL divergence: 0.0403
[Trial 177] Refusals: 89/100, KL divergence: 0.0313
[Trial 187] Refusals: 92/100, KL divergence: 0.0255
[Trial 189] Refusals: 96/100, KL divergence: 0.0165
[Trial 13] Refusals: 100/100, KL divergence: 0.0005

Hey! There are two things that could explain why your results look so different.

First, make sure you start Prometheus with the LLM judge enabled. Without it, refusal detection falls back to keyword matching, which is unreliable: it can flag a response as a refusal just because it contains words like "sorry" or "cannot", even when the model actually answered the question. With an LLM judge, each response is evaluated semantically, so the refusal rate and KL divergence numbers become meaningful.
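To illustrate the failure mode, here is a minimal sketch of keyword-based refusal detection. The keyword list and function are hypothetical stand-ins for illustration, not Prometheus's actual detector:

```python
# Hypothetical keyword list; real detectors use a longer one, but the
# failure mode is the same.
REFUSAL_KEYWORDS = ["i cannot", "i can't", "sorry", "i'm unable", "as an ai"]

def looks_like_refusal(response: str) -> bool:
    """Flag a response as a refusal if any keyword appears anywhere in it."""
    text = response.lower()
    return any(kw in text for kw in REFUSAL_KEYWORDS)

# A genuine refusal is caught...
print(looks_like_refusal("I cannot help with that request."))  # True
# ...but so is a helpful answer that merely apologizes first:
print(looks_like_refusal("Sorry for the delay! Here is the recipe: ..."))  # True (false positive)
```

An LLM judge avoids the false positive because it reads the whole response and asks "was the question actually answered?" instead of scanning for surface strings.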

Second, dataset quality matters a lot. The eval dataset needs to contain genuinely adversarial or sensitive prompts that would actually trigger refusals in an unabliterated model. If your dataset is too mild or generic, the model barely refuses anything to begin with, so you won't see a clean Pareto front and the optimization signal stays very noisy. Make sure you're using a well-curated refusal benchmark dataset, not a general-purpose one.

Re-run with both of these in place and your numbers should be much closer to what's in the writeup.
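For reference, the KL divergence column in your trial log presumably measures how far the modified model's output distribution has drifted from the original's, so lower means less collateral damage. A minimal sketch of the metric itself, with made-up next-token distributions purely for illustration:

```python
import math

def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(P || Q) for discrete distributions: sum of p_i * ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions: identical distributions give 0;
# any drift pushes the score above 0.
base     = [0.7, 0.2, 0.1]
modified = [0.6, 0.25, 0.15]
print(round(kl_divergence(base, base), 4))  # 0.0
print(kl_divergence(base, modified) > 0)    # True
```

That is why trials like `Refusals: 100/100, KL divergence: 0.0005` are uninteresting: the model is nearly unchanged, refusals included, while good trials trade a small KL increase for a large drop in refusals.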

I just uploaded the dataset that I used: https://huggingface.co/datasets/wangzhang/prometheus-datasets.

Great, thanks. I will try it; I hope this one will not affect accuracy as much as the others.
Edit: It works great :)

Thank you again. You are already cited; use a DOI for a better citation.

Trilogix1 changed discussion status to closed
