Gemma 3n does not seem to work in the sample application for web
Hey guys,
I am encountering a problem. When I try to run Gemma 3n in the sample application for web (https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/llm_inference/js), it does not seem to work. I get this error: `Failed to initialize the task: Array buffer allocation failed`. I tried running gemma2-2b-it-gpu-int8.bin and that worked perfectly. So far, the only change I made to the files was the model filename (see the snippet below).
Machine: Apple M3 Pro
Browser: Chrome
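For reference, the change is just the model path passed to the task-initialization call, roughly like this (a minimal sketch following the sample's README; the WASM CDN URL and the local filename are placeholders for my setup):

```js
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

// Load the WASM runtime, then point the task at the downloaded model file.
const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
const llmInference = await LlmInference.createFromOptions(genai, {
  // The only line I changed: gemma2-2b-it-gpu-int8.bin -> the 3n file.
  baseOptions: {modelAssetPath: '/assets/gemma-3n-E2B-it-int4.task'},
});
```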
This is possibly because the model is too big to be loaded in the browser. Try gemma3-1b-it-int4.task and it will work. The other one you tried is also quantized at 8-bit, so it barely made it :)
litert-community/Gemma3-12B-IT runs just fine in the sample app, so model size is not the issue here.
It seems to be a bug in @mediapipe/tasks-genai's loading of the 3n models (or just of the zip format). The 3n .task files are actually a zip of TFL3-format files plus metadata:
```
(base) dev@Mac-mini gemma-3n-E2B-it-int4.task % ls
METADATA                    TF_LITE_VISION_ADAPTER
TF_LITE_EMBEDDER            TF_LITE_VISION_ENCODER
TF_LITE_PER_LAYER_EMBEDDER  TOKENIZER_MODEL
TF_LITE_PREFILL_DECODE
```
unlike the litert-community/Gemma3-12B-IT task file, which is a single TFL3-format file. If you want to verify which container a given .task file uses, the magic bytes are enough; see the sketch below.
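A quick way to check (a Node sketch; the offsets come from the zip and FlatBuffers file formats, not from anything MediaPipe-specific):

```js
import {openSync, readSync} from 'node:fs';

// Zip archives start with the bytes "PK\x03\x04" at offset 0;
// a bare LiteRT/TFLite flatbuffer carries its "TFL3" file
// identifier at offset 4 (after the 4-byte root-table offset).
const fd = openSync(process.argv[2], 'r');
const head = Buffer.alloc(8);
readSync(fd, head, 0, 8, 0);

if (head.subarray(0, 4).toString('latin1') === 'PK\x03\x04') {
  console.log('zip bundle (like the Gemma 3n .task files)');
} else if (head.subarray(4, 8).toString('latin1') === 'TFL3') {
  console.log('single TFL3 flatbuffer (like Gemma3-12B-IT)');
} else {
  console.log('unknown header:', head.toString('hex'));
}
```

Run it as `node check-task.js gemma-3n-E2B-it-int4.task`.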
Same issue here.
It does not work with MediaPipe; the model does not load.
So we're stuck for now because:
- @mediapipe/tasks-genai's loading of 3n is bugged
- TFLite doesn't support int4 quantization

@google please
The 3n preview is not supported on web just yet. Web LLM inference presently supports:
- all text-only Gemma 3 variants
- MedGemma-27B
- Gemma 2 2B
- the older architectures it initially launched with (Phi 2, Falcon 1B, Stable LM 3B, Gemma 1 2B & 7B)
The full multimodal Gemma 3n is now supported on web with the MediaPipe Web LLM Inference API. Additional demos will follow shortly, but model links and usage instructions are already available here.
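For anyone who wants to try it before the demos land, loading the 3n bundle looks like the text-only path above; the multimodal parts of this sketch (the `maxNumImages` option and the mixed text/image prompt) are my reading of the current docs, so treat them as assumptions and defer to the linked usage instructions:

```js
import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

const genai = await FilesetResolver.forGenAiTasks(
    'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');
const llmInference = await LlmInference.createFromOptions(genai, {
  baseOptions: {modelAssetPath: '/assets/gemma-3n-E2B-it-int4.task'},
  // Assumption: enables the vision modality per the multimodal docs.
  maxNumImages: 1,
});

// Assumption: mixed text/image prompts are passed as an array.
const image = await createImageBitmap(
    await (await fetch('/assets/photo.jpg')).blob());
const response = await llmInference.generateResponse(
    ['Describe this image: ', image]);
console.log(response);
```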