remove `<|endoftext|>`
My newly submitted [code](https://github.com/ggml-org/whisper.cpp/pull/3555) removes the hard-coded vocabulary (glossary) mechanism so that custom vocabularies are supported. However, the new code fails the [Ruby test](https://github.com/Jaffe2718/whisper.cpp/actions/runs/20052653983). After troubleshooting, I found that the failure is not caused by the code itself. It is caused by the special token `<|endoftext|>` stored in the vocabulary of the English ggml models, which makes the [positioning](https://github.com/Jaffe2718/whisper.cpp/blob/master/src/whisper.cpp#L1621-L1631) calculation of the special-token IDs wrong.

Similar to https://github.com/ggml-org/whisper.cpp/pull/725, this commit removes that special token from the English models. The models were converted with my [modified script](https://github.com/Jaffe2718/whisper.cpp/blob/master/models/convert-h5-to-ggml.py), and in my testing they are compatible with the latest official whisper.cpp release, [v1.8.2](https://github.com/ggml-org/whisper.cpp/tree/v1.8.2). For code correctness, special tokens should not be stored in the vocabulary, so I hope your team will accept this pull request.
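To illustrate the kind of off-by-one described above, here is a minimal Python sketch. It assumes, hypothetically, that the loader places special-token IDs directly after the regular tokens read from the model file; `derive_special_ids`, `strip_special_tokens`, and `mock_english_vocab` are illustrative names, not functions from whisper.cpp or `convert-h5-to-ggml.py`. The English Whisper tokenizer has 50,256 regular GPT-2 tokens and gives `<|endoftext|>` the ID 50256, so one extra literal `<|endoftext|>` entry in the stored vocabulary shifts every derived special ID up by one.

```python
# Hypothetical sketch, NOT whisper.cpp's actual loader: it assumes special-token
# IDs are placed right after the regular tokens stored in the ggml vocabulary.

N_REGULAR_EN = 50256  # regular (byte-level BPE) tokens in the English GPT-2 vocabulary

# Assumption: the only special token that leaked into the English vocabularies.
SPECIAL_TOKENS = {"<|endoftext|>"}


def derive_special_ids(words):
    """Assume <|endoftext|> and <|startoftranscript|> follow the last stored token."""
    n = len(words)
    return {"eot": n, "sot": n + 1}


def strip_special_tokens(words):
    """Hypothetical conversion step: drop literal special-token entries before saving."""
    return [w for w in words if w not in SPECIAL_TOKENS]


def mock_english_vocab(include_eot_entry):
    # Placeholder strings stand in for the real BPE tokens.
    words = [f"tok{i}" for i in range(N_REGULAR_EN)]
    if include_eot_entry:
        words.append("<|endoftext|>")
    return words


if __name__ == "__main__":
    old = derive_special_ids(mock_english_vocab(include_eot_entry=True))
    new = derive_special_ids(strip_special_tokens(mock_english_vocab(include_eot_entry=True)))
    print(old)  # {'eot': 50257, 'sot': 50258}  (shifted by one)
    print(new)  # {'eot': 50256, 'sot': 50257}
```

Under this assumption, filtering the literal special-token entry at conversion time restores the expected IDs without touching the loader, which matches the approach of regenerating the English models rather than special-casing them in code.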
- ggml-base.en.bin +2 -2
- ggml-medium.en.bin +2 -2
- ggml-small.en.bin +2 -2
- ggml-tiny.en.bin +2 -2
Each model file is a Git LFS pointer. In every diff (`@@ -1,3 +1,3 @@`), the `version https://git-lfs.github.com/spec/v1` line is unchanged, and the `oid sha256:` and `size` lines are replaced with the following new values:

| File | New `oid sha256` | New `size` (bytes) |
|---|---|---|
| ggml-base.en.bin | `e111a865d56afc4adf1379a2028544b3275dd25839c460953ed9a126632dcda2` | 147964194 |
| ggml-medium.en.bin | `7eeec432c4a60becf9fc91c10a0a6f9803752546ca4bf5f75f686bb73c210495` | 1533774764 |
| ggml-small.en.bin | `cef314a500e45ace34b8f58529b983ab545b57d8d3c39b359bcabe8cb6c5a795` | 487614184 |
| ggml-tiny.en.bin | `8729634ed8e45db72893c34a6c671a2eef06f551eaf1056b8fa92ab45008b425` | 77704698 |
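The updated Git LFS pointers in this commit record the SHA-256 and byte size of each regenerated model, so a downloaded file can be checked against them. The Python sketch below parses a pointer and verifies a local file; the pointer text is the new `ggml-tiny.en.bin` pointer from this commit, while `parse_lfs_pointer`, `verify_file`, and the commented-out local path are illustrative placeholders, not part of whisper.cpp or Git LFS tooling.

```python
# Sketch: parse a Git LFS pointer ("key value" lines) and verify a local file
# against its sha256 oid and size. Function names and the path are hypothetical.
import hashlib
import os


def parse_lfs_pointer(text):
    """Return (sha256_hex, size_in_bytes) from a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def verify_file(path, oid_hex, size, chunk_size=1 << 20):
    """Return True if the file has the expected size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False  # cheap size check before hashing a large model
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == oid_hex


TINY_EN_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8729634ed8e45db72893c34a6c671a2eef06f551eaf1056b8fa92ab45008b425
size 77704698
"""

if __name__ == "__main__":
    oid, size = parse_lfs_pointer(TINY_EN_POINTER)
    print(oid, size)
    # verify_file("models/ggml-tiny.en.bin", oid, size)  # placeholder path
```

Checking the size first avoids hashing a multi-gigabyte file (such as `ggml-medium.en.bin`) when a truncated download is already obvious from its length.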