Update README.md
README.md (changed):
@@ -23,15 +23,15 @@ model-index:
     metrics:
     - name: Precision
       type: precision
-      value:
+      value: 56.2
      verified: false
     - name: Recall
       type: recall
-      value:
+      value: 65.8
      verified: false
     - name: F1
       type: f1
-      value:
+      value: 60.6
      verified: false
   - task:
       type: text-generation
@@ -41,15 +41,15 @@ model-index:
     metrics:
     - name: Precision
       type: precision
-      value:
+      value: 42.1
      verified: false
     - name: Recall
       type: recall
-      value:
+      value: 47.5
      verified: false
     - name: F1
       type: f1
-      value:
+      value: 44.6
      verified: false
   - task:
       type: text-generation
@@ -59,15 +59,15 @@ model-index:
     metrics:
     - name: Precision
       type: precision
-      value:
+      value: 38.6
      verified: false
     - name: Recall
       type: recall
-      value:
+      value: 56.0
      verified: false
     - name: F1
       type: f1
-      value:
+      value: 45.7
      verified: false
   - task:
       type: text-generation
@@ -77,15 +77,15 @@ model-index:
     metrics:
     - name: Precision
       type: precision
-      value:
+      value: 52.8
      verified: false
     - name: Recall
       type: recall
-      value:
+      value: 49.8
      verified: false
     - name: F1
       type: f1
-      value:
+      value: 51.2
      verified: false
   - task:
       type: text-generation
@@ -131,7 +131,7 @@ model-index:
 ## Summary
 
 The model corrects spelling errors and typos in both Russian and English languages by bringing all the words in the text to the norm of the language.
-Corrector had been trained based on the model [
+The corrector is based on the [mT5-large](https://huggingface.co/google/mt5-large) architecture.
 An extensive dataset with “artificial” errors was taken as a training corpus: the corpus was assembled on the basis of the Russian-language Wikipedia and transcripts of Russian-language videos, then typos and spelling errors were automatically introduced into it using the library [SAGE](https://github.com/ai-forever/sage).
 
 ## Public references
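The training-corpus idea in the Summary hunk above — automatically injecting typos into clean text — can be sketched generically. This is an illustrative character-level corruptor only, not SAGE's actual API:

```python
import random

def corrupt(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly drop, duplicate, or transpose characters to simulate typos."""
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isalpha() and rng.random() < rate:
            op = rng.choice(("drop", "dup", "swap"))
            if op == "drop":                 # delete the character
                i += 1
                continue
            if op == "dup":                  # double the character
                out.append(ch)
            elif op == "swap" and i + 1 < len(text):
                out.append(text[i + 1])      # transpose with the next character
                out.append(ch)
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)

clean = "The quick brown fox jumps over the lazy dog."
print(corrupt(clean, rate=0.15))
```

Pairs of (corrupted, clean) sentences produced this way give the seq2seq corrector its supervision signal.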
@@ -164,7 +164,8 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
 **RUSpellRU**
 | Model | Precision | Recall | F1 |
 | --- | --- | --- | --- |
-| sage-mt5-large |
+| sage-mt5-large | 56.2 | 65.8 | 60.6 |
+| sage-mt5-large (ft.) | 88.4 | 71.6 | 79.1 |
 | sage-ai-service | 93.5 | 82.4 | 87.6 |
 | gpt-3.5-turbo | 39.6 | 62.3 | 48.5 |
 | gpt-4 | 69.5 | 81.0 | 74.8 |
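The F1 column in these benchmark tables is the harmonic mean of Precision and Recall, so the newly added rows can be sanity-checked (last-digit discrepancies can arise from rounding of the published precision and recall):

```python
def f1(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# sage-mt5-large on RUSpellRU: Precision 56.2, Recall 65.8
print(round(f1(56.2, 65.8), 1))  # -> 60.6
```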
@@ -172,7 +173,8 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
 **MultidomainGold**
 | Model | Precision | Recall | F1 |
 | --- | --- | --- | --- |
-| sage-mt5-large |
+| sage-mt5-large | 42.1 | 47.5 | 44.6 |
+| sage-mt5-large (ft.) | 65.3 | 62.7 | 63.9 |
 | sage-ai-service | 70.9 | 68.8 | 69.9 |
 | gpt-3.5-turbo | 17.8 | 56.1 | 27.0 |
 | gpt-4 | 31.1 | 78.1 | 44.5 |
@@ -180,20 +182,39 @@ RUSpellRU, MultidomainGold, MedSpellChecker, GitHubTypoCorpusRu are datasets for
 **MedSpellChecker**
 | Model | Precision | Recall | F1 |
 | --- | --- | --- | --- |
-| sage-mt5-large |
+| sage-mt5-large | 38.6 | 56.0 | 45.7 |
+| sage-mt5-large (ft.) | 77.7 | 77.5 | 77.6 |
 | sage-ai-service | 73.4 | 76.2 | 74.9 |
 | gpt-3.5-turbo | 15.1 | 53.6 | 23.5 |
 | gpt-4 | 48.9 | 88.7 | 63.1 |
 
-
 **GitHubTypoCorpusRu**
 | Model | Precision | Recall | F1 |
 | --- | --- | --- | --- |
-| sage-mt5-large |
+| sage-mt5-large | 52.8 | 49.8 | 51.2 |
+| sage-mt5-large (ft.) | 69.5 | 46.0 | 55.3 |
 | sage-ai-service | 76.1 | 51.2 | 61.2 |
 | gpt-3.5-turbo | 23.7 | 43.9 | 30.8 |
 | gpt-4 | 34.7 | 60.5 | 44.1 |
 
+**BEA60K**
+| Model | Precision | Recall | F1 |
+| --- | --- | --- | --- |
+| sage-mt5-large | 64.7 | 83.8 | 73.0 |
+| gpt-3.5-turbo | 66.9 | 84.1 | 74.5 |
+| gpt-4 | 68.6 | 85.2 | 76.0 |
+| [Bert](https://github.com/neuspell/neuspell) | 65.8 | 79.6 | 72.0 |
+| [SC-LSTM](https://github.com/neuspell/neuspell) | 62.2 | 80.3 | 72.0 |
+
+**JFLEG**
+| Model | Precision | Recall | F1 |
+| --- | --- | --- | --- |
+| sage-mt5-large | 74.9 | 88.4 | 81.1 |
+| gpt-3.5-turbo | 77.8 | 88.6 | 82.9 |
+| gpt-4 | 77.9 | 88.3 | 82.8 |
+| [Bert](https://github.com/neuspell/neuspell) | 78.5 | 85.4 | 81.8 |
+| [SC-LSTM](https://github.com/neuspell/neuspell) | 80.6 | 86.1 | 83.2 |
+
 
 ## How to use
 ```python
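# The "How to use" snippet is cut off in this diff view. The lines below are a
# hypothetical sketch, not the author's original code: they assume the standard
# Hugging Face transformers seq2seq API, and the checkpoint name
# "ai-forever/sage-mt5-large" is an assumption inferred from the model name
# used in the tables above.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

checkpoint = "ai-forever/sage-mt5-large"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Pass misspelled text through the model and decode the corrected text.
inputs = tokenizer("The quik brown fox jumpt over the lazy dog.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```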