thejaminator/grpo-feature-vector-step-1

This is a LoRA adapter trained with verl using GRPO (Group Relative Policy Optimization) on math reasoning tasks.
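GRPO samples a group of completions per prompt and normalizes their rewards within the group, so no separate value model is needed. The sketch below illustrates that group-relative advantage computation only; the tensor shapes and reward values are illustrative and not taken from this training run.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize rewards within each group of completions for the same prompt.

    rewards: tensor of shape (num_prompts, group_size), e.g. 16 generations
    per prompt as in this run's configuration.
    Returns advantages of the same shape: (r - mean) / (std + eps) per group.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Illustrative example: 2 prompts, 4 sampled completions each,
# with binary correctness rewards from a math answer checker.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```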

Training Details

  • Base model: google/gemma-2-9b-it
  • Framework: verl GRPO
  • Training steps: 1
  • Dataset: Math reasoning problems
  • Batch size: 8
  • Learning rate: 5e-05
  • LoRA rank: 64
  • LoRA alpha: 128.0
  • Number of generations: 16

Generated from verl LoRA checkpoint: /workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter
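To try the adapter, it can be loaded on top of the base model with transformers and peft. This is a minimal sketch, assuming the adapter is published under the repository name above; the prompt text is only an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2-9b-it"
adapter_id = "thejaminator/grpo-feature-vector-step-1"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Example math prompt formatted with the base model's chat template.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Solve step by step: what is 17 * 24?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```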
