# PPO Agent playing LunarLander-v2

This is a trained model of a PPO agent playing LunarLander-v2 using the Stable-Baselines3 library.

## Model Description

- Algorithm: Proximal Policy Optimization (PPO)
- Environment: LunarLander-v2 (Box2D)
- Mean Reward: 267.58 +/- 21.91 (over 10 evaluation episodes)

## Usage (with Stable-Baselines3)

```python
from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.evaluation import evaluate_policy
from huggingface_sb3 import load_from_hub
import gymnasium as gym

# Load the model from Hugging Face Hub
repo_id = "Haxxsh/PPO-LunarLander-v2"
filename = "ppo-LunarLander-v2.zip"

# Custom objects for compatibility
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}

# Download and load the model
checkpoint = load_from_hub(repo_id, filename)
model = PPO.load(checkpoint, custom_objects=custom_objects, print_system_info=True)

# Create environment
eval_env = Monitor(gym.make("LunarLander-v2"))

# Evaluate the model
mean_reward, std_reward = evaluate_policy(model, eval_env, n_eval_episodes=10, deterministic=True)
print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")

# Watch the agent play
env = gym.make("LunarLander-v2", render_mode="human")
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
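
If you prefer to save a rollout to disk rather than rendering it live, you can wrap the environment in gymnasium's `RecordVideo` wrapper. The snippet below is a minimal sketch that reuses the `model` loaded above; the `videos` folder and `ppo-lunarlander` prefix are arbitrary illustrative names, and writing the MP4 files requires `moviepy` to be installed.

```python
import gymnasium as gym
from gymnasium.wrappers import RecordVideo

# RecordVideo needs an env created with render_mode="rgb_array".
video_env = RecordVideo(
    gym.make("LunarLander-v2", render_mode="rgb_array"),
    video_folder="videos",            # illustrative output folder
    episode_trigger=lambda ep: True,  # record every episode
    name_prefix="ppo-lunarlander",
)

# Roll out one episode with the trained policy and write it to disk.
obs, info = video_env.reset()
done = False
while not done:
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = video_env.step(action)
    done = terminated or truncated
video_env.close()
```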

## Training Details
The model was trained using the Hugging Face Deep RL Course curriculum; a minimal training sketch is included after the hyperparameter list below.

### Hyperparameters
- Learning rate: [Add your learning rate]
- Number of timesteps: [Add total timesteps]
- Batch size: [Add batch size]
- Other relevant hyperparameters...
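
Since the exact values have not been filled in above, here is a minimal training sketch in the style of the Deep RL Course. The environment count, hyperparameter values, and timestep budget are illustrative assumptions, not necessarily the settings used to produce this checkpoint.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# A vectorized environment speeds up PPO's on-policy rollout collection.
vec_env = make_vec_env("LunarLander-v2", n_envs=16)

# Hyperparameters below are illustrative placeholders, not the values
# used to train this particular checkpoint.
model = PPO(
    policy="MlpPolicy",
    env=vec_env,
    n_steps=1024,
    batch_size=64,
    n_epochs=4,
    gamma=0.999,
    gae_lambda=0.98,
    ent_coef=0.01,
    verbose=1,
)

model.learn(total_timesteps=1_000_000)
model.save("ppo-LunarLander-v2")
```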

## Results

The agent achieves a mean reward of 267.58 +/- 21.91 over 10 evaluation episodes on LunarLander-v2, comfortably above the average score of 200 at which the environment is generally considered solved, indicating consistently successful landings.

## Installation

```bash
pip install stable-baselines3[extra]
pip install huggingface-sb3
pip install gymnasium[box2d]
```
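
Note: `gymnasium[box2d]` builds the Box2D physics bindings from source on some platforms, which may require `swig` and a C/C++ compiler to be available beforehand.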

## Framework versions
- Stable-Baselines3: 2.0.0a5
- Gymnasium: 0.28.1