v0.38.0
See https://github.com/quic/ai-hub-models/releases/v0.38.0 for changelog.
README.md
CHANGED
@@ -31,8 +31,8 @@ More details on model performance across various devices, can be found
 - **Model Type:** Model_use_case.text_generation
 - **Model Stats:**
 - Input sequence length for Prompt Processor: 128
--
-- Precision: w4a16 + w8a16 (few layers)
+- Maximum context length: 4096
+- Precision: w4 + w8 (few layers) with fp16 activations and w4a16 + w8a16 (few layers) are supported
 - Num of key-value heads: 8
 - Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized
 - Prompt processor input: 128 tokens + position embeddings + attention mask + KV cache inputs
@@ -52,6 +52,7 @@ More details on model performance across various devices, can be found
 | Llama-v3.2-3B-Instruct | w4a16 | Snapdragon X Elite CRD | Snapdragon® X Elite | GENIE | 18.4176 | 0.12593600000000002 - 4.029952000000001 | -- | -- |
 | Llama-v3.2-3B-Instruct | w4a16 | SA8255P ADP | Qualcomm® SA8255P | GENIE | 14.02377 | 0.187414 - 5.997256999999999 | -- | -- |
 | Llama-v3.2-3B-Instruct | w4 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite Mobile | GENIE | 13.83 | 0.088195 - 2.82225 | -- | -- |
+| Llama-v3.2-3B-Instruct | w4 | SA8295P ADP | Qualcomm® SA8295P | GENIE | 3.523 | 0.37311700000000003 - 2.9849360000000003 | -- | -- |
 
 ## Deploying Llama 3.2 3B on-device
 
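
The range column in the performance rows can be checked against the two model stats in the first hunk. The sketch below is only arithmetic on numbers copied from the diff. It assumes, without confirmation from the diff (the table header lies outside both hunks), that the two endpoints of each range correspond to the shortest and longest prompt, so that their ratio approximates the number of 128-token prompt-processor chunks. The names `chunks_per_context` and `endpoint_ratios` are hypothetical helpers, not part of any library.

```python
# Values copied from the README diff; the interpretation is an assumption.
INPUT_SEQ_LEN = 128      # "Input sequence length for Prompt Processor"
MAX_CONTEXT_LEN = 4096   # "Maximum context length"

# (device, lower endpoint, upper endpoint) of the range column in each row.
ROWS = [
    ("Snapdragon X Elite CRD", 0.12593600000000002, 4.029952000000001),
    ("SA8255P ADP", 0.187414, 5.997256999999999),
    ("Snapdragon 8 Elite QRD", 0.088195, 2.82225),
    ("SA8295P ADP", 0.37311700000000003, 2.9849360000000003),
]


def chunks_per_context(context_len: int, seq_len: int) -> int:
    """Number of fixed-size prompt chunks that fit in the full context."""
    return context_len // seq_len


def endpoint_ratios(rows):
    """Ratio of upper to lower endpoint per device, rounded to an integer."""
    return {device: round(hi / lo) for device, lo, hi in rows}


CHUNKS = chunks_per_context(MAX_CONTEXT_LEN, INPUT_SEQ_LEN)
RATIOS = endpoint_ratios(ROWS)

if __name__ == "__main__":
    print("chunks per full context:", CHUNKS)
    for device, ratio in RATIOS.items():
        print(f"{device}: upper/lower = {ratio}")
```

Under that assumption, three of the rows give a ratio equal to 4096 / 128, while the newly added SA8295P ADP row gives a smaller ratio. The diff does not state the reason for the difference.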