# thejaminator/grpo-feature-vector-step-1
This is a LoRA adapter trained using verl with GRPO (Group Relative Policy Optimization) on math reasoning tasks.
## Training Details
- Base model: google/gemma-2-9b-it
- Framework: verl GRPO
- Training steps: 1
- Dataset: Math reasoning problems
- Batch size: 8
- Learning rate: 5e-05
- LoRA rank: 64
- LoRA alpha: 128
- Number of generations: 16
Generated from the verl LoRA checkpoint: /workspace/verl_outputs_feature_vector/global_step_1/actor/lora_adapter
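
## Usage

A minimal sketch of loading this adapter on top of the base model with `transformers` and `peft`, assuming the adapter weights in this repo are in standard PEFT format and compatible with `PeftModel.from_pretrained`; the example prompt is illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2-9b-it"
adapter_id = "thejaminator/grpo-feature-vector-step-1"  # this repo (assumed PEFT-format adapter)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    torch_dtype="auto",
)

# Attach the LoRA adapter (rank 64, alpha 128) on top of the frozen base weights.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# Illustrative math-reasoning prompt.
prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If you prefer a standalone model, the adapter can be folded into the base weights with `model.merge_and_unload()` after loading.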