Update README.md
README.md CHANGED
@@ -1,8 +1,9 @@
 ---
-license:
+license: mit
 tags:
 - policy representation
 - diffusion
+- reinforcement learning
 ---
 ## Policy Representation via Diffusion Probability Model for Reinforcement Learning
 
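For reference, the README frontmatter after this hunk is applied reads:

```yaml
---
license: mit
tags:
- policy representation
- diffusion
- reinforcement learning
---
```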
@@ -43,9 +44,9 @@ Hyperparameters for DIPO have been shown as follows for easily reproducing our results.
 | No. of hidden nodes | 256 | 256 | 256 | 256 |
 | Activation | mish | relu | relu | tanh |
 | Batch size | 256 | 256 | 256 | 256 |
-| Discount for reward
+| Discount for reward $\gamma$ | 0.99 | 0.99 | 0.99 | 0.99 |
 | Target smoothing coefficient $\tau$ | 0.005 | 0.005 | 0.005 | 0.005 |
-| Learning rate for actor |
+| Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Learning rate for actor | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $3 × 10^{-4}$ | $7 × 10^{-4}$ |
 | Actor Critic grad norm | 2 | N/A | N/A | 0.5 |
 | Memory size | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ | $1 × 10^6$ |
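The edited rows follow standard off-policy conventions: the discount $\gamma = 0.99$ weights future rewards, and $\tau = 0.005$ is the coefficient of the soft target update $\theta' \leftarrow \tau\theta + (1 - \tau)\theta'$. As a minimal sketch, assuming the first column is DIPO's own setting, that column can be collected into one config; the key names below are hypothetical, only the values come from the table:

```yaml
# Hypothetical config keys; each value is taken from the first
# column of the hyperparameter table above.
hidden_nodes: 256     # No. of hidden nodes
activation: mish      # Activation
batch_size: 256       # Batch size
gamma: 0.99           # Discount for reward
tau: 0.005            # Target smoothing coefficient
actor_lr: 3.0e-4      # Learning rate for actor
grad_norm: 2          # Actor Critic grad norm
memory_size: 1000000  # Memory size
```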