---
library_name: stable-baselines3
tags:
- FetchPickAndPlace-v4
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: FetchPickAndPlace-v4
      type: FetchPickAndPlace-v4
    metrics:
    - type: mean_reward
      value: -9.70 +/- 4.17
      name: mean_reward
      verified: false
---

# SAC + HER Agent for FetchPickAndPlace-v4

## Model Overview

This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick and place objects using sparse or dense rewards and is suitable for robotic manipulation research.

- **Algorithm:** Soft Actor-Critic (SAC)
- **Replay Buffer:** Hindsight Experience Replay (HER)
- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
- **Framework:** Stable Baselines3

## Training Details

- **Total Timesteps:** 500,000
- **Evaluation Frequency:** Every 2,000 steps (15 episodes per eval)
- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
- **Seed:** 42
- **Dense Shaping:** `False` (can be enabled with a wrapper)
- **Device:** CUDA if available, otherwise `auto`

### Hyperparameters

| Parameter               | Value                |
|-------------------------|----------------------|
| Algorithm               | SAC                  |
| Policy                  | MultiInputPolicy     |
| Replay Buffer           | HER                  |
| n_sampled_goal          | 4                    |
| goal_selection_strategy | future               |
| Batch Size              | 512                  |
| Buffer Size             | 1,000,000            |
| Learning Rate           | 1e-3                 |
| Gamma                   | 0.95                 |
| Tau                     | 0.05                 |
| Entropy Coefficient     | auto                 |
| Train Frequency         | 1 step               |
| Gradient Steps          | 1                    |
| Tensorboard Log         | logs_pnp_sac_her/tb  |
| Seed                    | 42                   |
| Device                  | CUDA/Auto            |
| Dense Shaping           | False (default)      |
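### Example Training Setup (Sketch)

For reference, the snippet below is a minimal sketch of how an agent with the configuration above can be constructed and trained with Stable Baselines3. The callback wiring, save paths, and name prefixes are illustrative assumptions and not necessarily the exact script used to produce this model.

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

# Training and evaluation environments (no rendering needed during training)
env = gym.make("FetchPickAndPlace-v4")
eval_env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
    verbose=1,
)

# Periodic checkpoints (model + replay buffer) and evaluation, matching the
# frequencies listed above; the save path and name prefix are assumptions.
checkpoint_cb = CheckpointCallback(
    save_freq=50_000,
    save_path="logs_pnp_sac_her",
    name_prefix="ckpt_sac_her",
    save_replay_buffer=True,
)
eval_cb = EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15)

model.learn(total_timesteps=500_000, callback=[checkpoint_cb, eval_cb])
model.save("sac_her_pnp")
```

HER's `future` goal-selection strategy relabels stored transitions with goals that were actually achieved later in the same episode, which is what makes the sparse pick-and-place reward learnable; the optional dense-shaping wrapper mentioned above is disabled by default.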
## Files

- `sac_her_pnp.zip`: Final trained SAC model
- `ckpt_sac_her_250000_steps.zip`: Latest checkpoint
- `replay_buffer.pkl`: Replay buffer for continued training
- `replay.mp4`: Replay video of agent performance (manual generation recommended)
- `README.md`: This model card

## Usage

To load and use the model for inference:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

obs, info = env.reset()
done = False
truncated = False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    env.render()
```

## Evaluation

To evaluate the agent over multiple episodes:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

num_episodes = 10
for ep in range(num_episodes):
    obs, info = env.reset()
    done = False
    truncated = False
    episode_reward = 0
    while not (done or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, info = env.step(action)
        env.render()
        episode_reward += reward
    print(f"Episode {ep+1} reward: {episode_reward}")
env.close()
```

## Replay Video

If `replay.mp4` is not present, you can generate it manually:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
import moviepy.editor as mpy

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

frames = []
obs, info = env.reset()
done = False
truncated = False
step = 0
max_steps = 1000
while not (done or truncated) and step < max_steps:
    frame = env.render()
    frames.append(frame)
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    step += 1
env.close()

clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile("replay.mp4", codec="libx264")
```

## Continued Training

To continue training from a checkpoint:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```

## Citation

If you use this model, please cite:

```
@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
  title={SAC + HER Agent for FetchPickAndPlace-v4},
  author={IntelliGrow},
  year={2025},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4}
}
```

## License

MIT License

---

**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4).