Online training methods (e.g., GRPO) require generating completions in real time during training, which is a compute- and memory-heavy bottleneck.
TRL has built-in vLLM support, and in this new recipe we show how to use it for efficient online training. Run it on Colab ⚡, then scale to multi-GPU/multi-node!
🧑‍🍳 Recipe: https://huggingface.co/learn/cookbook/grpo_vllm_online_training