BigCodeArena: Judging code generations end to end with code executions
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Compare two AI models by sending them code and seeing their responses
Explore and analyze code completion benchmarks