Update README.md
Add disclaimer about model size
README.md CHANGED

@@ -158,6 +158,12 @@ The architecture of granite-vision-3.1-2b-preview consists of the following comp
 
 We built upon LlaVA (https://llava-vl.github.io) to train our model. We use multi-layer encoder features and a denser grid resolution in AnyRes to enhance the model's ability to understand nuanced visual content, which is essential for accurately interpreting document images.
 
+_Note:_
+
+We denote our model as Granite-Vision-3.1-2B-Preview, where the version (3.1) and size (2B) of the base large language model
+are explicitly indicated. However, when considering the integrated vision encoder and projector, the total parameter count of our
+model increases to 3 billion parameters.
+
 
 **Training Data:**
 
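For readers who want to sanity-check the 3 billion figure in the note above, a minimal sketch follows: it loads the checkpoint with Hugging Face transformers and sums parameter counts overall and per top-level submodule. The repo id `ibm-granite/granite-vision-3.1-2b-preview` and the `AutoModelForVision2Seq` class are assumptions here, not taken from this diff; treat it as an illustration rather than the official usage snippet.

```python
# Hypothetical sketch: count parameters of the released checkpoint.
# Repo id and model class are assumptions based on the model card;
# a recent transformers release is assumed to ship the model type.
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "ibm-granite/granite-vision-3.1-2b-preview"  # assumed repo id
)

# Total parameter count, expected to land around 3B once the vision
# encoder and projector are included alongside the 2B language model.
total = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total / 1e9:.2f}B")

# Breakdown by top-level submodule (exact names depend on the model
# class; LLaVA-style models typically expose a vision tower, a
# projector, and the language model).
for name, module in model.named_children():
    n = sum(p.numel() for p in module.parameters())
    print(f"{name}: {n / 1e9:.2f}B")
```

Comparing the per-submodule numbers against the total makes the naming convention concrete: the 2B in the name refers only to the language-model component, while the full multimodal stack is larger.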