| Hyperparameter | Value |
|---|---|
| Training steps | 150k |
| Max sequence length | 256 |
| Learning rate | 1e-4 |
| Learning-rate schedule | constant |
| Optimizer | AdamW |
| beta_1, beta_2 | 0.9, 0.95 |
| Final eval loss | 2.245 |
| Final eval perplexity | 9.44 |
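The two evaluation rows in the table above are consistent with perplexity = exp(loss). This relation holds if the reported loss is the mean per-token cross-entropy in nats. That is the usual convention for PyTorch and Hugging Face training loops, but the card does not state it, so treat it as an assumption. The minimal check below recomputes the perplexity from the reported loss. The helper name `perplexity_from_loss` is hypothetical.

```python
import math

# Value copied from the hyperparameter table; assumed to be mean
# per-token cross-entropy measured in nats (natural log).
FINAL_EVAL_LOSS = 2.245


def perplexity_from_loss(loss_nats: float) -> float:
    """Return perplexity as exp(mean token-level cross-entropy in nats)."""
    return math.exp(loss_nats)


ppl = perplexity_from_loss(FINAL_EVAL_LOSS)
print(f"{ppl:.2f}")  # 9.44
```

If the loss were instead logged in bits (base 2), the matching formula would be `2 ** loss`, which would give a much smaller perplexity than the 9.44 reported here.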