RLinf: Reinforcement Learning Infrastructure for Agentic AI
RLinf is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
Model Description
This LoRA model is trained with SFT on the ManiSkill setting, using Haozhan72/Openvla-oft-SFT-libero10-trajall as the base model. Combined with the base model, it provides a reasonable starting checkpoint for the OpenVLA-OFT model on the ManiSkill PutOnPlateInScene25Main task defined in RL4VLA and RLinf.
How to Use
Please integrate this model with the RLinf codebase together with the base model RLinf/Openvla-oft-SFT-libero10-trajall. To do so, modify the following parameters in the configuration file examples/embodiment/config/maniskill_ppo_openvlaoft.yaml (a sketch of the resulting excerpt follows the list):
- Set `actor.checkpoint_load_path`, `actor.tokenizer.tokenizer_model`, and `rollout.model_dir` to the path of the `RLinf/Openvla-oft-SFT-libero10-trajall` checkpoint.
- Set `actor.model.lora_path` to the path of this LoRA checkpoint and set `actor.model.is_lora` to `True`.
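A minimal sketch of the corresponding excerpt of `maniskill_ppo_openvlaoft.yaml`, assuming the dotted parameter names above map to nested YAML keys and using placeholder paths (both are assumptions, not copied from the actual config file):

```yaml
# Sketch of the relevant fields; key nesting is inferred from the dotted
# parameter names above. Replace the /path/to/... placeholders with your paths.
actor:
  checkpoint_load_path: /path/to/Openvla-oft-SFT-libero10-trajall
  tokenizer:
    tokenizer_model: /path/to/Openvla-oft-SFT-libero10-trajall
  model:
    is_lora: True
    lora_path: /path/to/this-lora-checkpoint
rollout:
  model_dir: /path/to/Openvla-oft-SFT-libero10-trajall
```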
Note: Currently we only support LoRA training with this checkpoint. If you do not want to use LoRA, merge the base model and the LoRA checkpoint with the `merge_and_unload` function from the `peft` package.
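A minimal sketch of that merge, assuming the base checkpoint loads through `transformers.AutoModelForVision2Seq` with `trust_remote_code=True` and that the `/path/to/...` values are placeholders you replace with your own (both assumptions):

```python
# Sketch: fold the LoRA adapter into the base OpenVLA-OFT weights using peft.
import torch
from transformers import AutoModelForVision2Seq
from peft import PeftModel

base_path = "/path/to/Openvla-oft-SFT-libero10-trajall"  # base checkpoint (placeholder)
lora_path = "/path/to/this-lora-checkpoint"              # this LoRA checkpoint (placeholder)
merged_path = "/path/to/merged-checkpoint"               # output directory (placeholder)

# Load the base model; OpenVLA-OFT checkpoints typically need trust_remote_code.
base_model = AutoModelForVision2Seq.from_pretrained(
    base_path, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Attach the LoRA adapter, then merge its weights into the base model.
model = PeftModel.from_pretrained(base_model, lora_path)
model = model.merge_and_unload()

# Save the merged full-weight checkpoint for non-LoRA usage.
model.save_pretrained(merged_path)
```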
License
This code repository and the model weights are licensed under the MIT License.
Evaluation results
- Success rate on put-on-plate-in-scene-25-main (self-reported): not yet reported