khazarai's Collections
GRPO
Updated 12 days ago
Group Relative Policy Optimization
khazarai/HeisenbergQ-0.5B-RL • Text Generation • Updated Sep 25 • 30 • 1
khazarai/Math-RL • Text Generation • Updated Sep 25 • 31 • 1