Update README.md
_NOTE: this model card is a WIP_
GPT2-L (774M parameters) fine-tuned on the Wizard of Wikipedia dataset for 40k steps with 34 of 36 layers frozen using `aitextgen`. The model was then further fine-tuned on the [Daily Dialogues](http://yanran.li/dailydialog) dataset for an additional 40k steps, this time with **35** of 36 layers frozen.
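For reference, the two-stage setup above corresponds roughly to the `aitextgen` call sketched below. This is a minimal sketch, assuming the `freeze_layers`/`num_layers_freeze` options of recent `aitextgen` releases and a hypothetical pre-formatted dataset file; it is not the exact training script.

```python
# rough sketch of the first fine-tuning stage -- the dataset path is
# hypothetical, and argument names assume a recent aitextgen release
from aitextgen import aitextgen

ai = aitextgen(tf_gpt2="774M", to_gpu=True)  # GPT2-L

ai.train(
    "wizard_of_wikipedia_dialogue.txt",  # hypothetical pre-formatted corpus
    num_steps=40000,
    freeze_layers=True,
    num_layers_freeze=34,  # 34 of 36 transformer layers stay frozen
)
```

The Daily Dialogues stage would repeat this on the new corpus with `num_layers_freeze=35`.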
Designed for use with [ai-msgbot](https://github.com/pszemraj/ai-msgbot) to create an open-ended chatbot (of course, if other use cases arise, have at it).
## conversation data
The dataset was tokenized and fed to the model as a conversation between two speakers, whose names are given below. This is relevant when writing prompts and when filtering/extracting text from responses.
`script_speaker_name` = `person alpha`
`script_responder_name` = `person beta`
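For illustration, a prompt following this convention could be assembled as below. The exact separators used during training are defined by [ai-msgbot](https://github.com/pszemraj/ai-msgbot), so treat this layout as an assumption.

```python
# assumed two-speaker layout -- the exact formatting is illustrative
prompt = (
    "person alpha:\n"
    "what is the tallest mountain on earth?\n\n"
    "person beta:\n"  # ending on the responder's name cues a reply
)
```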
## examples
- the default inference API examples should work _okay_
- an ideal test would be explicitly adding `person beta` to the **end** of the prompt text. That way, the model is forced to respond to the prompt instead of continuing it (see the sketch below).
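A minimal sketch of that pattern with the `transformers` pipeline; the model id is a placeholder (substitute this repo's id), and the sampling settings are illustrative:

```python
# force a "person beta" reply, then extract it from the generated text
from transformers import pipeline

chat = pipeline("text-generation", model="<this-repo-id>")  # placeholder id

# ending the prompt with "person beta:" pushes the model to answer
# rather than extend the user's turn
prompt = "person alpha:\nhow tall can trees grow?\n\nperson beta:\n"

out = chat(prompt, max_new_tokens=48, do_sample=True, top_p=0.95)
text = out[0]["generated_text"]

# keep only the reply: drop the prompt, then cut at the next speaker
# marker in case the model starts another turn
reply = text[len(prompt):].split("person alpha:")[0].strip()
print(reply)
```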