# thejaminator/grpo-feature-vector-step-1
This is a LoRA adapter trained using verl with GRPO (Group Relative Policy Optimization) on math reasoning tasks.
## Training Details
- Base model: google/gemma-2-9b-it
- Framework: verl GRPO
- Training steps: 1
- Dataset: Math reasoning problems
- Batch size: 8
- Learning rate: 5e-05
- LoRA rank: 64
- LoRA alpha: 128
- Number of generations: 16
Generated from the verl LoRA checkpoint: /workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter
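
## Usage

A minimal sketch of loading this adapter on top of the base model with `transformers` and `peft`, assuming the adapter weights in this repo are in standard PEFT format and compatible with `PeftModel.from_pretrained`; the example prompt is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2-9b-it"
adapter_id = "thejaminator/grpo-feature-vector-step-1"  # this repo (assumed PEFT-format adapter)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype="auto",
)

# Attach the LoRA adapter (rank 64, alpha 128) on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative math-reasoning prompt.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you prefer a standalone model, the adapter can be folded into the base weights with `model.merge_and_unload()` after loading.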