manu committed
Commit 1c44a5e · verified · 1 Parent(s): 3f99f65

Update README.md

Files changed (1)
  1. README.md +4 -1
README.md CHANGED
@@ -32,7 +32,10 @@ Data is the same as the ColPali data described in the paper.
 
  ## Model Training
 
- ### Dataset (Fully Image)
+ ### Dataset
+
+ The audio retrieval capabilities are acquired in a 0-shot capacity, as the entire training data is purely image-text matching. The audio and vision towers are frozen during training.
+
  Our training dataset of 127,460 query-page pairs is comprised of train sets of openly available academic datasets (63%) and a synthetic dataset made up of pages from web-crawled PDF documents and augmented with VLM-generated (Claude-3 Sonnet) pseudo-questions (37%).
  Our training set is fully English by design, enabling us to study zero-shot generalization to non-English languages. We explicitly verify no multi-page PDF document is used both in [*ViDoRe*](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) and in the train set to prevent evaluation contamination.
  A validation set is created with 2% of the samples to tune hyperparameters.
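
The README text in this diff mentions holding out 2% of the 127,460 query-page pairs for validation but does not say how the split is drawn. As a minimal sketch only, assuming a seeded random shuffle (the function name `split_train_validation`, the seed, and the integer placeholder IDs are all hypothetical and not taken from the repository), such a split might look like:

```python
import random


def split_train_validation(pairs, validation_fraction=0.02, seed=0):
    """Shuffle a copy of `pairs` and hold out a fraction for validation.

    Assumption: a uniform random split; the README does not specify the method.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Placeholder integer IDs stand in for the 127,460 query-page pairs.
pairs = range(127_460)
train, validation = split_train_validation(pairs)
print(len(train), len(validation))
```

Fixing the seed keeps the held-out samples stable across runs, which matters if the validation set is reused to compare hyperparameter settings.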