Update README.md
SEC-BERT is a family of BERT models for the financial domain, intended to assist financial NLP research and FinTech applications.
SEC-BERT consists of the following models:
* **SEC-BERT-BASE** (this model): Same architecture as BERT-BASE, trained on financial documents.
* [**SEC-BERT-NUM**](https://huggingface.co/nlpaueb/sec-bert-num): Same as SEC-BERT-BASE, but we replace every number token with a [NUM] pseudo-token, handling all numeric expressions in a uniform manner and preventing their fragmentation.
* [**SEC-BERT-SHAPE**](https://huggingface.co/nlpaueb/sec-bert-shape): Same as SEC-BERT-BASE, but we replace numbers with pseudo-tokens that represent the number's shape, so numeric expressions (of known shapes) are no longer fragmented, e.g., '53.2' becomes '[XX.X]' and '40,200.5' becomes '[XX,XXX.X]'.
</div>
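The [NUM] and shape replacements described above can be sketched as a simple text preprocessing step. This is a minimal illustration only, not the authors' actual pipeline: the function names and the regular expression used to detect numeric expressions are assumptions.

```python
import re

# Assumed pattern for a numeric expression: digits, optionally with
# ',' or '.' separators, starting and ending with a digit
# (e.g. '7', '53.2', '40,200.5').
NUM_RE = re.compile(r"\d[\d,.]*\d|\d")

def num_preprocess(text: str) -> str:
    """SEC-BERT-NUM style: replace every numeric expression with '[NUM]'."""
    return NUM_RE.sub("[NUM]", text)

def shape_preprocess(text: str) -> str:
    """SEC-BERT-SHAPE style: replace every numeric expression with its
    shape pseudo-token, mapping each digit to 'X' and keeping the
    separators, e.g. '53.2' -> '[XX.X]', '40,200.5' -> '[XX,XXX.X]'."""
    return NUM_RE.sub(lambda m: "[" + re.sub(r"\d", "X", m.group()) + "]", text)
```

For example, `shape_preprocess("revenue rose 53.2 percent")` yields `"revenue rose [XX.X] percent"`, so the whole number maps to a single pseudo-token instead of being fragmented by the subword tokenizer.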
## Pre-training corpus