---
library_name: stable-baselines3
tags:
- reinforcement-learning
- trading
- finance
- stock-market
- ppo
- quantitative-finance
- algorithmic-trading
license: mit
---

# Stock Trading RL Agent - MyTestExp

A reinforcement learning agent trained for stock trading using the **PPO** algorithm.

## Model Overview

This model uses reinforcement learning to make trading decisions (Hold, Buy, Sell) based on technical indicators and market data.

### Key Features
- **Algorithm**: PPO
- **Policy**: Multi-Layer Perceptron (MLP)
- **Action Space**: Continuous (action type + position size)
- **Observation Space**: Technical indicators + portfolio state
- **Training Steps**: 500,000
- **Stocks Trained On**: 5

## Training Configuration

### Data Configuration
```json
{
  "tickers": [
    "AAPL",
    "MSFT",
    "GOOGL",
    "AMZN",
    "TSLA"
  ],
  "period": "5y",
  "interval": "1d",
  "use_sp500": false
}
```
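
The download script itself is not part of this repository; a minimal sketch of how this configuration could map onto `yfinance` (the loader actually used for training may differ):

```python
import yfinance as yf

# Sketch only: fetch 5 years of daily bars per the data config above;
# the actual training loader may differ.
tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA"]
data = {t: yf.download(t, period="5y", interval="1d") for t in tickers}
```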

### Environment Configuration
```json
{
  "initial_balance": 10000,
  "transaction_cost": 0.001,
  "max_position_size": 1.0,
  "lookback_window": 60,
  "reward_type": "return"
}
```

### Training Configuration
```json
{
  "algorithm": "PPO",
  "total_timesteps": 500000,
  "learning_rate": 0.0003,
  "batch_size": 64,
  "n_epochs": 10,
  "gamma": 0.99,
  "eval_freq": 1000,
  "n_eval_episodes": 5,
  "save_freq": 10000,
  "seed": 42
}
```
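
For reference, a sketch of how these hyperparameters map onto stable-baselines3. Here `env` and `eval_env` stand in for instances of the trading environment, which ships with the training code rather than this repository:

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

# `env` and `eval_env` are placeholders for the trading environment.
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=0.0003,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    seed=42,
)

callbacks = [
    EvalCallback(eval_env, eval_freq=1000, n_eval_episodes=5),        # evaluate every 1,000 steps
    CheckpointCallback(save_freq=10000, save_path="./checkpoints/"),  # checkpoint every 10,000 steps
]
model.learn(total_timesteps=500000, callback=callbacks)
```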

## Evaluation Results

| Stock | Total Return | Sharpe Ratio | Max Drawdown | Win Rate |
|-------|--------------|--------------|--------------|----------|
| AMZN  | 162.87%      | 0.74         | 187.11%      | 6.72%    |
| MSFT  | 7243.44%     | 0.56         | 164.60%      | 52.11%   |
| GOOGL | 0.00%        | 0.00         | 0.00%        | 0.00%    |
| TSLA  | 109.91%      | -0.22        | 145.29%      | 44.76%   |
| AAPL  | -74.02%      | 0.65         | 157.07%      | 7.01%    |

## Usage

### Installation
```bash
pip install stable-baselines3 yfinance pandas numpy
```

### Loading the Model
```python
from stable_baselines3 import PPO

# Load the trained model
model = PPO.load("best_model.zip")
```

### Loading the Data Scaler
```python
import pickle

# Load the preprocessing scaler used during training
with open("scaler.pkl", "rb") as f:
    scaler = pickle.load(f)
```

### Making Predictions
```python
import numpy as np

# Prepare your observation (it must match the training format)
obs = your_observation_data  # shape: (n_features,)

# Get an action from the model
action, _states = model.predict(obs, deterministic=True)

# action[0] = action type (0: Hold, 1: Buy, 2: Sell)
# action[1] = position size (0-1)
```
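
Because the policy outputs a continuous vector, the raw action typically needs decoding before it can be executed. A minimal sketch, assuming the first component rounds to the nearest action type and the second clips to [0, 1]; the environment's own `step()` may decode differently:

```python
import numpy as np

# Assumed decoding; the trading environment may implement this differently.
action_type = int(np.clip(np.round(action[0]), 0, 2))  # 0: Hold, 1: Buy, 2: Sell
position_size = float(np.clip(action[1], 0.0, 1.0))    # fraction of capital to commit
```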

## Model Performance

The model has been evaluated on multiple stocks with the following key metrics:

- Risk-adjusted returns (Sharpe ratio)
- Maximum drawdown analysis
- Win rate performance
- Transaction cost considerations

## Technical Details

### State Space
The agent observes:

- Technical indicators (SMA, EMA, RSI, MACD, Bollinger Bands)
- Price and volume data
- Portfolio state (balance, position, net worth)
- Historical sequences (lookback window)
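
These indicators can be reproduced from daily bars with pandas. A sketch with illustrative window lengths (the exact values used in training are not documented here):

```python
import pandas as pd

def add_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative indicator set; window lengths are assumptions."""
    close = df["Close"]
    df["sma_20"] = close.rolling(20).mean()
    df["ema_20"] = close.ewm(span=20, adjust=False).mean()

    # RSI (14-period, simple rolling-average variant)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # MACD (12/26 EMAs) with a 9-period signal line
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    df["macd"] = ema_fast - ema_slow
    df["macd_signal"] = df["macd"].ewm(span=9, adjust=False).mean()

    # Bollinger Bands (20-period, 2 standard deviations)
    std_20 = close.rolling(20).std()
    df["bb_upper"] = df["sma_20"] + 2 * std_20
    df["bb_lower"] = df["sma_20"] - 2 * std_20
    return df
```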

### Action Space
- **Action Type**: Discrete choice (Hold=0, Buy=1, Sell=2)
- **Position Size**: Continuous value (0-1) representing the fraction of available capital

### Reward Function
- **Type**: return
- **Considerations**: Transaction costs, risk-adjusted returns
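
The exact formula lives in the environment code, which is not included here; a plausible shape for a `return`-type reward net of transaction costs:

```python
def step_reward(net_worth: float, prev_net_worth: float, traded_value: float,
                transaction_cost: float = 0.001) -> float:
    """Plausible 'return'-type reward net of transaction costs;
    the environment's actual formula may differ."""
    step_return = (net_worth - prev_net_worth) / prev_net_worth
    cost = transaction_cost * traded_value / prev_net_worth
    return step_return - cost
```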

## Training Details
- **Environment**: Enhanced Stock Trading Environment
- **Evaluation Frequency**: Every 1,000 steps
- **Model Checkpoints**: Every 10,000 steps
- **Random Seed**: 42 (for reproducibility)

## Files in this Repository
- `best_model.zip`: Best-performing model found during training
- `final_model.zip`: Final model after training completion
- `scaler.pkl`: Data preprocessing scaler
- `config.json`: Complete training configuration
- `evaluation_results.json`: Detailed evaluation metrics
- `training_summary.json`: Training statistics and progress

## Disclaimer
This model is for educational and research purposes only. Past performance does not guarantee future results. Always do your own research and consider consulting with a financial advisor before making investment decisions.

## Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.

## License
This project is licensed under the MIT License.

Generated on: 2025-07-04 17:14:46 UTC
Training completed: 2025-07-04