Update README.md
README.md
CHANGED
|
@@ -32,7 +32,10 @@ Data is the same as the ColPali data described in the paper.
|
|
| 32 |
|
| 33 |
## Model Training
|
| 34 |
|
| 35 |
-
### Dataset
| 36 |
Our training dataset of 127,460 query-page pairs is comprised of train sets of openly available academic datasets (63%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (Claude-3 Sonnet) pseudo-questions (37%).
|
| 37 |
Our training set is fully English by design, enabling us to study zero-shot generalization to non-English languages. We explicitly verify that no multi-page PDF document is used both in [*ViDoRe*](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) and in the train set to prevent evaluation contamination.
|
| 38 |
A validation set is created with 2% of the samples to tune hyperparameters.
|
| 32 |
|
| 33 |
## Model Training
|
| 34 |
|
| 35 |
+
### Dataset
|
| 36 |
+
|
| 37 |
+
The audio retrieval capabilities are acquired zero-shot, as the entire training data is purely image-text matching. The audio and vision towers are frozen during training.
|
| 38 |
+
|
| 39 |
Our training dataset of 127,460 query-page pairs is comprised of train sets of openly available academic datasets (63%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (Claude-3 Sonnet) pseudo-questions (37%).
|
| 40 |
Our training set is fully English by design, enabling us to study zero-shot generalization to non-English languages. We explicitly verify that no multi-page PDF document is used both in [*ViDoRe*](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) and in the train set to prevent evaluation contamination.
|
| 41 |
A validation set is created with 2% of the samples to tune hyperparameters.
|
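
The README states that 2% of the 127,460 samples are held out for validation, but not how they are chosen. The following is a minimal sketch of one way such a split could be made, assuming a uniform random hold-out over sample indices. The function name `split_indices`, the seed, and the choice to round the validation size down are illustrative assumptions, not details from the source.

```python
import random

TOTAL_PAIRS = 127_460  # dataset size stated in the README
VAL_PERCENT = 2        # validation share stated in the README
SEED = 0               # hypothetical seed, not from the source


def split_indices(n, val_percent, seed):
    """Return (train_indices, val_indices) for a random hold-out split."""
    indices = list(range(n))
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids float rounding; the size is rounded down.
    n_val = n * val_percent // 100
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(TOTAL_PAIRS, VAL_PERCENT, SEED)
print(len(train_idx), len(val_idx))
```

Under these assumptions the split holds out 2,549 pairs and leaves 124,911 for training. The actual split may differ, for example if it was stratified by source dataset.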