Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -15,6 +15,46 @@ The dataset is **[available on Kaggle](https://www.kaggle.com/datasets/sovitrath
|
|
| 15 |
|
| 16 |
- The base model is **[sovitrath/Phi-3.5-vision-instruct](sovitrath/Phi-3.5-vision-instruct)**.
|
| 17 |
|
| 18 |
## How to Get Started with the Model
|
| 19 |
|
| 20 |
Use the code below to get started with the model.
|
|
@@ -173,44 +213,3 @@ training_args = transformers.TrainingArguments(
|
|
| 173 |
The current best validation loss is **0.377421**.
|
| 174 |
|
| 175 |
The CER on the test set is **0.355**. The Qwen2.5-3B VL test annotations were used as ground truth.
|
| 176 |
-
|
| 177 |
-
## Technical Specifications [optional]
|
| 178 |
-
|
| 179 |
-
### Compute Infrastructure
|
| 180 |
-
|
| 181 |
-
The model was trained on a system with 10GB RTX 3080 GPU, 10th generation i7 CPU, and 32GB RAM.
|
| 182 |
-
|
| 183 |
-
### Framework versions
|
| 184 |
-
|
| 185 |
-
```
|
| 186 |
-
torch==2.5.1
|
| 187 |
-
torchvision==0.20.1
|
| 188 |
-
torchaudio==2.5.1
|
| 189 |
-
flash-attn==2.7.2.post1
|
| 190 |
-
triton==3.1.0
|
| 191 |
-
transformers==4.51.3
|
| 192 |
-
accelerate==1.2.0
|
| 193 |
-
datasets==4.1.1
|
| 194 |
-
huggingface-hub==0.31.1
|
| 195 |
-
peft==0.15.2
|
| 196 |
-
trl==0.18.0
|
| 197 |
-
safetensors==0.4.5
|
| 198 |
-
sentencepiece==0.2.0
|
| 199 |
-
tiktoken==0.8.0
|
| 200 |
-
einops==0.8.0
|
| 201 |
-
opencv-python==4.10.0.84
|
| 202 |
-
pillow==10.2.0
|
| 203 |
-
numpy==2.2.0
|
| 204 |
-
scipy==1.14.1
|
| 205 |
-
tqdm==4.66.4
|
| 206 |
-
pandas==2.2.2
|
| 207 |
-
pyarrow==21.0.0
|
| 208 |
-
regex==2024.11.6
|
| 209 |
-
requests==2.32.3
|
| 210 |
-
python-dotenv==1.1.1
|
| 211 |
-
wandb==0.22.1
|
| 212 |
-
rich==13.9.4
|
| 213 |
-
jiwer==4.0.0
|
| 214 |
-
bitsandbytes==0.45.0
|
| 215 |
-
```
|
| 216 |
-
|
|
|
|
| 15 |
|
| 16 |
- The base model is **[sovitrath/Phi-3.5-vision-instruct](sovitrath/Phi-3.5-vision-instruct)**.
|
| 17 |
|
| 18 |
+
## Technical Specifications
|
| 19 |
+
|
| 20 |
+
### Compute Infrastructure
|
| 21 |
+
|
| 22 |
+
The model was trained on a system with 10GB RTX 3080 GPU, 10th generation i7 CPU, and 32GB RAM.
|
| 23 |
+
|
| 24 |
+
### Framework versions
|
| 25 |
+
|
| 26 |
+
```
|
| 27 |
+
torch==2.5.1
|
| 28 |
+
torchvision==0.20.1
|
| 29 |
+
torchaudio==2.5.1
|
| 30 |
+
flash-attn==2.7.2.post1
|
| 31 |
+
triton==3.1.0
|
| 32 |
+
transformers==4.51.3
|
| 33 |
+
accelerate==1.2.0
|
| 34 |
+
datasets==4.1.1
|
| 35 |
+
huggingface-hub==0.31.1
|
| 36 |
+
peft==0.15.2
|
| 37 |
+
trl==0.18.0
|
| 38 |
+
safetensors==0.4.5
|
| 39 |
+
sentencepiece==0.2.0
|
| 40 |
+
tiktoken==0.8.0
|
| 41 |
+
einops==0.8.0
|
| 42 |
+
opencv-python==4.10.0.84
|
| 43 |
+
pillow==10.2.0
|
| 44 |
+
numpy==2.2.0
|
| 45 |
+
scipy==1.14.1
|
| 46 |
+
tqdm==4.66.4
|
| 47 |
+
pandas==2.2.2
|
| 48 |
+
pyarrow==21.0.0
|
| 49 |
+
regex==2024.11.6
|
| 50 |
+
requests==2.32.3
|
| 51 |
+
python-dotenv==1.1.1
|
| 52 |
+
wandb==0.22.1
|
| 53 |
+
rich==13.9.4
|
| 54 |
+
jiwer==4.0.0
|
| 55 |
+
bitsandbytes==0.45.0
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
## How to Get Started with the Model
|
| 59 |
|
| 60 |
Use the code below to get started with the model.
|
|
|
|
| 213 |
The current best validation loss is **0.377421**.
|
| 214 |
|
| 215 |
The CER on the test set is **0.355**. The Qwen2.5-3B VL test annotations were used as ground truth.
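For readers unfamiliar with the metric, the character error rate (CER) reported above is conventionally the character-level edit distance between prediction and reference, divided by the number of reference characters. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation script used for this model. The function names `edit_distance` and `cer` are hypothetical, and the corpus-level aggregation (total edits over total reference characters) is an assumption. The pinned `jiwer` package also provides a `cer` function, though the card does not say which implementation produced the 0.355 figure.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total character edits / total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Toy strings for illustration only; these are not from the dataset.
    print(cer(["receipt total"], ["receipt tota1"]))
```

Because the ground truth here is itself model-generated (the Qwen2.5-3B VL annotations), a CER computed this way measures agreement with those annotations rather than accuracy against human-verified text.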