# PPO Agent playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2 using the Stable-Baselines3 library.

## Model Description

- Algorithm: Proximal Policy Optimization (PPO)
- Environment: LunarLander-v2 (Box2D)
- Mean Reward: 267.58 +/- 21.91 (over 10 evaluation episodes)

## Usage (with Stable-Baselines3)

```python
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy
from huggingface_sb3 import load_from_hub
import gymnasium as gym

# Load the model from Hugging Face Hub
repo_id = "Haxxsh/PPO-LunarLander-v2"
filename = "ppo-LunarLander-v2.zip"

# Custom objects for compatibility
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}

# Download and load the model
checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)

# Create environment
eval_env = Monitor(gym.make("LunarLander-v2"))

# Evaluate the model
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")

# Watch the agent play
env = gym.make("LunarLander-v2", render_mode="human")
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
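
If you prefer to save a rollout to disk rather than rendering it live, you can wrap the environment in gymnasium's `RecordVideo` wrapper. The snippet below is a minimal sketch that reuses the `model` loaded above; the `videos` folder and `ppo-lunarlander` prefix are arbitrary illustrative names, and writing the MP4 files requires `moviepy` to be installed.

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

# RecordVideo needs an env created with render_mode="rgb_array".
video_env = RecordVideo(
    gym.make("LunarLander-v2", render_mode="rgb_array"),
    video_folder="videos",            # illustrative output folder
    episode_trigger=lambda ep: True,  # record every episode
    name_prefix="ppo-lunarlander",
)

# Roll out one episode with the trained policy and write it to disk.
obs, info = video_env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = video_env.step(action)
    done = terminated or truncated
video_env.close()
```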

## Training Details
The model was trained using the Hugging Face Deep RL Course curriculum; a minimal training sketch is included after the hyperparameter list below.

### Hyperparameters
- Learning rate: [Add your learning rate]
- Number of timesteps: [Add total timesteps]
- Batch size: [Add batch size]
- Other relevant hyperparameters...
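
Since the exact values have not been filled in above, here is a minimal training sketch in the style of the Deep RL Course. The environment count, hyperparameter values, and timestep budget are illustrative assumptions, not necessarily the settings used to produce this checkpoint.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# A vectorized environment speeds up PPO's on-policy rollout collection.
vec_env = make_vec_env("LunarLander-v2", n_envs=16)

# Hyperparameters below are illustrative placeholders, not the values
# used to train this particular checkpoint.
model = PPO(
    policy="MlpPolicy",
    env=vec_env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)

model.learn(total_timesteps=1_000_000)
model.save("ppo-LunarLander-v2")
```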

## Results

The agent achieves a mean reward of 267.58 +/- 21.91 over 10 evaluation episodes on LunarLander-v2, comfortably above the average score of 200 at which the environment is generally considered solved, indicating consistently successful landings.

## Installation

```bash
pip install stable-baselines3[extra]
pip install huggingface-sb3
pip install gymnasium[box2d]
```
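
Note: `gymnasium[box2d]` builds the Box2D physics bindings from source on some platforms, which may require `swig` and a C/C++ compiler to be available beforehand.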

## Framework versions
- Stable-Baselines3: 2.0.0a5
- Gymnasium: 0.28.1