sovitrath/Phi-3.5-Vision-Instruct-OCR

sovitrath committed on Oct 11

Commit 31e985a (verified)

1 Parent(s): 790442c

Upload folder using huggingface_hub

Files changed (1)

README.md CHANGED

@@ -15,6 +15,46 @@ The dataset is **[available on Kaggle](https://www.kaggle.com/datasets/sovitrath
 - The base model is **[sovitrath/Phi-3.5-vision-instruct](sovitrath/Phi-3.5-vision-instruct)**.
 ## How to Get Started with the Model
 Use the code below to get started with the model.
@@ -173,44 +213,3 @@ training_args = transformers.TrainingArguments(
 The current best validation loss is **0.377421**.
 The CER on the test set is **0.355**. The Qwen2.5-3B VL test annotations were used as ground truth.
-## Technical Specifications [optional]
-### Compute Infrastructure
-The model was trained on a system with 10GB RTX 3080 GPU, 10th generation i7 CPU, and 32GB RAM.
-### Framework versions
-```
-torch==2.5.1
-torchvision==0.20.1
-torchaudio==2.5.1
-flash-attn==2.7.2.post1
-triton==3.1.0
-transformers==4.51.3
-accelerate==1.2.0
-datasets==4.1.1
-huggingface-hub==0.31.1
-peft==0.15.2
-trl==0.18.0
-safetensors==0.4.5
-sentencepiece==0.2.0
-tiktoken==0.8.0
-einops==0.8.0
-opencv-python==4.10.0.84
-pillow==10.2.0
-numpy==2.2.0
-scipy==1.14.1
-tqdm==4.66.4
-pandas==2.2.2
-pyarrow==21.0.0
-regex==2024.11.6
-requests==2.32.3
-python-dotenv==1.1.1
-wandb==0.22.1
-rich==13.9.4
-jiwer==4.0.0
-bitsandbytes==0.45.0
-```

 - The base model is **[sovitrath/Phi-3.5-vision-instruct](sovitrath/Phi-3.5-vision-instruct)**.
+## Technical Specifications
+### Compute Infrastructure
+The model was trained on a system with 10GB RTX 3080 GPU, 10th generation i7 CPU, and 32GB RAM.
+### Framework versions
+```
+torch==2.5.1
+torchvision==0.20.1
+torchaudio==2.5.1
+flash-attn==2.7.2.post1
+triton==3.1.0
+transformers==4.51.3
+accelerate==1.2.0
+datasets==4.1.1
+huggingface-hub==0.31.1
+peft==0.15.2
+trl==0.18.0
+safetensors==0.4.5
+sentencepiece==0.2.0
+tiktoken==0.8.0
+einops==0.8.0
+opencv-python==4.10.0.84
+pillow==10.2.0
+numpy==2.2.0
+scipy==1.14.1
+tqdm==4.66.4
+pandas==2.2.2
+pyarrow==21.0.0
+regex==2024.11.6
+requests==2.32.3
+python-dotenv==1.1.1
+wandb==0.22.1
+rich==13.9.4
+jiwer==4.0.0
+bitsandbytes==0.45.0
+```
 ## How to Get Started with the Model
 Use the code below to get started with the model.
 The current best validation loss is **0.377421**.
 The CER on the test set is **0.355**. The Qwen2.5-3B VL test annotations were used as ground truth.
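
For readers unfamiliar with the metric, character error rate (CER) is the character-level edit distance between the prediction and the ground truth, divided by the number of ground-truth characters. The README does not say how the 0.355 figure was computed, so the following is only a minimal pure-Python sketch of the general definition. The function names `levenshtein` and `corpus_cer` are hypothetical, and aggregating over the whole corpus (rather than averaging per-sample scores) is an assumption. The `jiwer` package pinned in the framework versions offers a comparable metric and may be what was actually used.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions, and
    substitutions needed to turn `hyp` into `ref`."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character of ref missing from hyp
                curr[j - 1] + 1,     # extra character in hyp
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def corpus_cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: scores are pooled over the corpus, not averaged per sample.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Illustrative strings only, not data from the test set.
    print(corpus_cer(["abcd"], ["abxd"]))
```

Under this definition, a CER of 0.355 means roughly 35 character edits per 100 ground-truth characters. Because the ground truth here is itself model-generated (the Qwen2.5-3B VL annotations), the score measures agreement with those annotations, not with human transcriptions.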