cleanrl

non-profit

https://github.com/vwxyzjn/cleanrl

vwxyzjn

Recent Activity

sdpkjc authored a paper about 2 months ago

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

sdpkjc authored a paper about 2 months ago

SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

sdpkjc authored a paper about 2 months ago

CAMEL: Continuous Action Masking Enabled by Large Language Models for Reinforcement Learning

cleanrl's models: 1,217

cleanrl/EleutherAI_pythia-6.9b-dedupedppotldr

Text Generation • 7B • Updated May 30, 2024 • 2

cleanrl/EleutherAI_pythia-2.8b-dedupedppotldr

Text Generation • 3B • Updated May 30, 2024

cleanrl/EleutherAI_pythia-1b-dedupedppotldr

Text Generation • 1B • Updated May 30, 2024 • 2

cleanrl/EleutherAI_pythia-2.8b-dedupedrewardtldr

Text Classification • Updated May 15, 2024 • 2

cleanrl/EleutherAI_pythia-1b-dedupedrewardtldr

Text Classification • Updated May 15, 2024 • 997

cleanrl/EleutherAI_pythia-1b-dedupedsfttldr

Text Generation • Updated May 15, 2024 • 1.21k

cleanrl/EleutherAI_pythia-2.8b-dedupedsfttldr

Text Generation • Updated May 15, 2024 • 1

cleanrl/EleutherAI_pythia-6.9b-dedupedsfttldr

Text Generation • Updated May 15, 2024 • 347

cleanrl/EleutherAI_pythia-6.9b-dedupedrewardtldr

Text Classification • Updated May 7, 2024 • 181

cleanrl/ppo_zephyr310

Text Generation • 7B • Updated May 1, 2024

cleanrl/BeamRiderNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/PongNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/BreakoutNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024 • 2

cleanrl/QbertNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/SpaceInvadersNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/MsPacmanNoFrameskip-v4-dqn_atari-seed1

Reinforcement Learning • Updated Apr 16, 2024

cleanrl/Ant-v2-td3_continuous_action_jax-seed1

Reinforcement Learning • Updated Oct 16, 2023

cleanrl/Ant-v2-td3_continuous_action-seed1

Reinforcement Learning • Updated Oct 16, 2023

cleanrl/Swimmer-v4-td3_continuous_action_jax-seed1

Reinforcement Learning • Updated Oct 16, 2023

cleanrl/Ant-v4-td3_continuous_action_jax-seed1

Reinforcement Learning • Updated Oct 16, 2023

cleanrl/Swimmer-v4-td3_continuous_action-seed1

Reinforcement Learning • Updated Oct 16, 2023

cleanrl/Ant-v4-td3_continuous_action-seed1

Reinforcement Learning • Updated Oct 16, 2023

cleanrl/InvertedPendulum-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/Humanoid-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/Pusher-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/Ant-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/HalfCheetah-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/Walker2d-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/Hopper-v2-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023

cleanrl/InvertedPendulum-v4-ppo_continuous_action-seed1

Reinforcement Learning • Updated Oct 15, 2023