litert-community/DeepSeek-R1-Distill-Qwen-1.5B

@@ -17,3 +17,10 @@ This model was converted to LiteRT (aka TFLite) format from [deepseek-ai/DeepSee
 ## Run the model on Android
 Please follow the [instructions](https://github.com/google-ai-edge/mediapipe-samples/blob/main/examples/llm_inference/android/README.md).

+## Benchmarking results
+| Model                                          | Params | Phase              | GGML tokens/s (CPU, 4 threads) | GGML tokens/s (CPU, 8 threads) | LiteRT tokens/s (XNNPACK, 4 threads) | LiteRT tokens/s (XNNPACK, 8 threads) |
+| ---------------------------------------------- | ------ | ------------------ | ------------------------------ | ------------------------------ | ------------------------------------ | ------------------------------------ |
+| DeepSeek-R1-Distill-Qwen-1.5B (Int8 quantized) | 1.78 B | Prefill 512 tokens | 64.66                          | 87.18                          | 260.95                               | 299.15                               |
+|                                                |        | Decode 128 tokens  | 23.85                          | 15.37                          | 23.126                               | 10.486                               |
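
To make the table easier to compare, the relative throughput of LiteRT versus GGML can be worked out per phase and thread count. The snippet below is only an illustrative sketch and not part of the model card: the numbers are copied from the table above, while the `RESULTS` dictionary and the `litert_vs_ggml` helper are hypothetical names introduced here.

```python
# Throughput figures in tokens/s, copied from the benchmark table above.
# The dictionary layout and the helper name are illustrative assumptions.
RESULTS = {
    ("prefill", 4): {"ggml": 64.66, "litert": 260.95},
    ("prefill", 8): {"ggml": 87.18, "litert": 299.15},
    ("decode", 4): {"ggml": 23.85, "litert": 23.126},
    ("decode", 8): {"ggml": 15.37, "litert": 10.486},
}


def litert_vs_ggml(phase: str, threads: int) -> float:
    """Return LiteRT throughput divided by GGML throughput for one table cell."""
    row = RESULTS[(phase, threads)]
    return row["litert"] / row["ggml"]


# Print one ratio per (phase, thread count) pair; values above 1.0 mean
# LiteRT processed more tokens per second than GGML in that configuration.
for phase, threads in RESULTS:
    ratio = litert_vs_ggml(phase, threads)
    print(f"{phase:<8} {threads} threads: {ratio:.2f}x")
```

On these figures LiteRT is roughly 3.4x to 4x faster for prefill, while decode is about on par at 4 threads and slower at 8 threads. Both runtimes also decode more slowly at 8 threads than at 4.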