Update README.md
README.md CHANGED
---
library_name: keras
license: apache-2.0
tags:
- seq2seq
- translation
language:
- en
- fr
---
## Keras Implementation of Character-level recurrent sequence-to-sequence model

This repo contains the model and the notebook for [this Keras example on a character-level recurrent sequence-to-sequence model](https://keras.io/examples/nlp/lstm_seq2seq/).

Full credits to: [fchollet](https://twitter.com/fchollet)

Model reproduced by: [Sumedh](https://huggingface.co/sumedh)
## Intended uses & limitations

This model implements a basic character-level recurrent sequence-to-sequence network for translating short English sentences into short French sentences, character by character. Note that it is fairly unusual to do character-level machine translation, as word-level models are more common in this domain. It works best on text of length <= 15 characters.
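As a rough usage sketch (not an official snippet from this card), the saved training model can be pulled from the Hub with the Keras mixin in `huggingface_hub`; the repo id below is a placeholder for this repository. Note that translating new text additionally requires the inference-time rewiring described under Training procedure.

```python
from huggingface_hub import from_pretrained_keras

# Placeholder repo id: replace with the actual repository this card belongs to.
model = from_pretrained_keras("keras-io/char-lstm-seq2seq")
model.summary()
```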
## Training and evaluation data

English to French translation data from https://www.manythings.org/anki/
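The linked example reads the tab-separated `fra.txt` file from that archive roughly as below (a sketch, not code from this repo; `num_samples` is the constant listed under Training hyperparameters, and the tab/newline characters double as the decoder's start- and end-of-sequence tokens):

```python
# Each line of fra.txt looks like: "Go.<TAB>Va !<TAB>attribution...".
num_samples = 10000  # Number of samples to train on.

with open("fra.txt", "r", encoding="utf-8") as f:
    lines = f.read().split("\n")

input_texts, target_texts = [], []
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text, _ = line.split("\t")
    # "\t" marks start-of-sequence and "\n" marks end-of-sequence for the decoder.
    target_texts.append("\t" + target_text + "\n")
    input_texts.append(input_text)
```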
## Training procedure

- We start with input sequences from a domain (e.g. English sentences) and corresponding target sequences from another domain (e.g. French sentences).
- An encoder LSTM turns the input sequences into 2 state vectors (we keep the last LSTM state and discard the outputs).
- A decoder LSTM is trained to turn the target sequences into the same sequences but offset by one timestep in the future, a training process called "teacher forcing" in this context. It uses the state vectors from the encoder as its initial state. Effectively, the decoder learns to generate targets[t+1...] given targets[...t], conditioned on the input sequence (see the model-definition sketch after this list).
- In inference mode, when we want to decode unknown input sequences, we:
  - Encode the input sequence into state vectors.
  - Start with a target sequence of size 1 (just the start-of-sequence character).
  - Feed the state vectors and the 1-character target sequence to the decoder to produce predictions for the next character.
  - Sample the next character using these predictions (we simply use argmax).
  - Append the sampled character to the target sequence.
  - Repeat until we generate the end-of-sequence character or we hit the character limit.
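A condensed sketch of this encoder-decoder wiring, following the linked Keras example (here `num_encoder_tokens` and `num_decoder_tokens` are placeholders for the source and target character-vocabulary sizes computed from the data, and `latent_dim` matches the value under Training hyperparameters):

```python
import keras
from keras import layers

latent_dim = 256          # LSTM state size used for this model
num_encoder_tokens = 71   # placeholder: number of distinct source characters in the data
num_decoder_tokens = 93   # placeholder: number of distinct target characters in the data

# Encoder: consume one-hot encoded source characters and keep only the final states.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
encoder_outputs, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: teacher forcing -- it sees the target sequence shifted by one timestep
# and starts from the encoder's final states.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = layers.Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
```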
### Training hyperparameters

The following hyperparameters were used during training:

| name | learning_rate | decay | rho | momentum | epsilon | centered | training_precision |
|----|-------------|-----|---|--------|-------|--------|------------------|
|RMSprop|0.0010000000474974513|0.0|0.8999999761581421|0.0|1e-07|False|float32|
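The long float values in the table are just the float32 representations of the usual defaults; as a sketch, this optimizer row corresponds to:

```python
import keras

# 0.0010000000474974513 and 0.8999999761581421 are 1e-3 and 0.9 stored as float32.
optimizer = keras.optimizers.RMSprop(
    learning_rate=1e-3,
    rho=0.9,
    momentum=0.0,
    epsilon=1e-07,
    centered=False,
)
```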
```python
batch_size = 64  # Batch size for training.
epochs = 100  # Number of epochs to train for.
latent_dim = 256  # Latent dimensionality of the encoding space.
num_samples = 10000  # Number of samples to train on.
```
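These constants feed into a standard `model.fit` call in the linked example (a sketch; `encoder_input_data`, `decoder_input_data`, and `decoder_target_data` are the one-hot tensors built during preprocessing in the notebook):

```python
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.2,  # hold out 20% of the pairs for validation
)
```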
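At inference time, the trained layers are rewired into separate encoder and decoder models and decoded greedily, following the steps listed under Training procedure. A condensed sketch continuing from the model-definition code above (`target_token_index`, `reverse_target_char_index`, and `max_decoder_seq_length` are assumed to come from the notebook's preprocessing step):

```python
import numpy as np

# Encoder model: input sentence -> initial decoder states.
encoder_model = keras.Model(encoder_inputs, encoder_states)

# Decoder model: (previous character, states) -> (next-character distribution, new states).
decoder_state_input_h = keras.Input(shape=(latent_dim,))
decoder_state_input_c = keras.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = keras.Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs, state_h, state_c])

def decode_sequence(input_seq):
    # Encode the input sequence into state vectors.
    states_value = encoder_model.predict(input_seq, verbose=0)

    # Start with a target sequence of size 1: just the start-of-sequence character "\t".
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index["\t"]] = 1.0

    decoded_sentence = ""
    while True:
        output_tokens, h, c = decoder_model.predict([target_seq] + states_value, verbose=0)

        # Greedy sampling: take the argmax over the predicted character distribution.
        sampled_token_index = int(np.argmax(output_tokens[0, -1, :]))
        sampled_char = reverse_target_char_index[sampled_token_index]

        # Stop at the end-of-sequence character "\n" or at the length limit.
        if sampled_char == "\n" or len(decoded_sentence) > max_decoder_seq_length:
            break
        decoded_sentence += sampled_char

        # Append the sampled character to the target sequence and update the states.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.0
        states_value = [h, c]

    return decoded_sentence
```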
## Model Plot

<details>