qaihm-bot committed
Commit a4b5d80 (verified) · 1 Parent(s): 852ef75

See https://github.com/quic/ai-hub-models/releases/v0.38.0 for the changelog.

Files changed (1)
  1. README.md +3 -2

README.md CHANGED
@@ -31,8 +31,8 @@ More details on model performance across various devices, can be found
 - **Model Type:** Model_use_case.text_generation
 - **Model Stats:**
   - Input sequence length for Prompt Processor: 128
-  - Context length: 4096
-  - Precision: w4a16 + w8a16 (few layers)
+  - Maximum context length: 4096
+  - Precision: w4 + w8 (few layers) with fp16 activations and w4a16 + w8a16 (few layers) are supported
   - Num of key-value heads: 8
   - Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized
   - Prompt processor input: 128 tokens + position embeddings + attention mask + KV cache inputs
@@ -52,6 +52,7 @@ More details on model performance across various devices, can be found
 | Llama-v3.2-3B-Instruct | w4a16 | Snapdragon X Elite CRD | Snapdragon® X Elite | GENIE | 18.4176 | 0.12593600000000002 - 4.029952000000001 | -- | -- |
 | Llama-v3.2-3B-Instruct | w4a16 | SA8255P ADP | Qualcomm® SA8255P | GENIE | 14.02377 | 0.187414 - 5.997256999999999 | -- | -- |
 | Llama-v3.2-3B-Instruct | w4 | Snapdragon 8 Elite QRD | Snapdragon® 8 Elite Mobile | GENIE | 13.83 | 0.088195 - 2.82225 | -- | -- |
+| Llama-v3.2-3B-Instruct | w4 | SA8295P ADP | Qualcomm® SA8295P | GENIE | 3.523 | 0.37311700000000003 - 2.9849360000000003 | -- | -- |
 
 ## Deploying Llama 3.2 3B on-device
 
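For readers interpreting the benchmark rows touched by this diff: the two numeric columns appear to be the response rate (tokens per second) and a time-to-first-token range in seconds, following the column headers of the upstream README table (the header row is outside this hunk, so treat that mapping as an assumption). Below is a minimal Python sketch, under that assumption, that turns the rows into rough end-to-end latency estimates; the 256-token response length is a hypothetical choice for illustration.

```python
# Minimal sketch: rough response-latency estimates from the benchmark rows above.
# Assumption (not stated in this diff): column 6 is response rate in tokens/second,
# column 7 is the time-to-first-token (TTFT) range in seconds.

rows = [
    # (device, tokens_per_second, (ttft_low_s, ttft_high_s))
    ("Snapdragon X Elite CRD", 18.4176, (0.125936, 4.029952)),
    ("SA8255P ADP", 14.02377, (0.187414, 5.997257)),
    ("Snapdragon 8 Elite QRD", 13.83, (0.088195, 2.82225)),
    ("SA8295P ADP", 3.523, (0.373117, 2.984936)),
]


def estimated_response_time(tokens_per_second: float, ttft: float, num_tokens: int) -> float:
    """Wait for the first token, then stream the remaining tokens at the reported rate."""
    return ttft + (num_tokens - 1) / tokens_per_second


if __name__ == "__main__":
    num_tokens = 256  # hypothetical response length
    for device, tps, (ttft_lo, ttft_hi) in rows:
        lo = estimated_response_time(tps, ttft_lo, num_tokens)
        hi = estimated_response_time(tps, ttft_hi, num_tokens)
        print(f"{device}: ~{lo:.1f}-{hi:.1f} s for {num_tokens} generated tokens")
```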