---
license: apache-2.0
language:
- en
library_name: litert-lm
tags:
- embeddings
- text-embedding
- gemma
- tflite
- litert
- on-device
- edge-ai
pipeline_tag: feature-extraction
---

# EmbeddingGemma 300M - LiteRT-LM Format

This is Google's **EmbeddingGemma 300M** model converted to the LiteRT-LM `.litertlm` format for use with Google's [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) runtime. This format is optimized for on-device inference on mobile and edge devices.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) |
| **Source TFLite** | [litert-community/embeddinggemma-300m](https://huggingface.co/litert-community/embeddinggemma-300m) |
| **Format** | LiteRT-LM (.litertlm) |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 tokens |
| **Precision** | Mixed (int8/fp16) |
| **Model Size** | ~171 MB |
| **Parameters** | ~300M |

## How This Model Was Created

### Conversion Process

This model was created by converting the TFLite model from [litert-community/embeddinggemma-300m](https://huggingface.co/litert-community/embeddinggemma-300m) to the LiteRT-LM `.litertlm` bundle format using Google's official tooling:

1. **Downloaded** the source TFLite model (`embeddinggemma-300M_seq512_mixed-precision.tflite`).

2. **Created a TOML configuration** specifying the model structure:

   ```toml
   [model]
   path = "models/embeddinggemma-300M_seq512_mixed-precision.tflite"
   spm_model_path = ""

   [model.start_tokens]
   model_input_name = "input_ids"

   [model.output_logits]
   model_output_name = "Identity"
   ```

3. **Converted** using the LiteRT-LM builder CLI:

   ```bash
   bazel run //schema/py:litertlm_builder_cli -- \
     toml --path embeddinggemma-300m.toml \
     output --path embeddinggemma-300m.litertlm
   ```

The `.litertlm` format bundles the TFLite model with the metadata required by the LiteRT-LM runtime.

## Node.js Native Bindings (node-gyp)

To use this model from Node.js, we created custom N-API bindings that wrap the LiteRT-LM C API. The bindings were built with:

- **node-gyp** for native addon compilation
- **N-API** (Node-API) for a stable ABI across Node versions
- **clang-20** with C++20 support
- linking against the prebuilt `liblibengine_napi` library from LiteRT-LM

### Building the Native Bridge

```bash
cd native-bridge
npm install
CC=/usr/lib/llvm-20/bin/clang CXX=/usr/lib/llvm-20/bin/clang++ npm run rebuild
```
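
After a successful rebuild, a quick smoke test confirms that the addon loads and the model opens. A minimal sketch, assuming the `@mcp-agent/litert-lm-native` package name and the `LiteRtEmbedder` API documented below:

```javascript
// smoke-test.js - verify the native addon loads and reports expected dimensions
const { LiteRtEmbedder } = require('@mcp-agent/litert-lm-native');

const embedder = new LiteRtEmbedder({ modelPath: 'embeddinggemma-300m.litertlm' });

console.log('valid:', embedder.isValid());           // expect: true
console.log('dim:', embedder.getEmbeddingDim());     // expect: 256
console.log('max seq:', embedder.getMaxSeqLength()); // expect: 512

embedder.close();
```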

### TypeScript Interface

```typescript
export interface EmbedderConfig {
  modelPath: string;
  embeddingDim?: number;   // default: 256
  maxSeqLength?: number;   // default: 512
  numThreads?: number;     // default: 4
}

export class LiteRtEmbedder {
  constructor(config: EmbedderConfig);
  embed(text: string): Float32Array;
  embedBatch(texts: string[]): Float32Array[];
  isValid(): boolean;
  getEmbeddingDim(): number;
  getMaxSeqLength(): number;
  close(): void;
}
```

### Usage Example

```javascript
const { LiteRtEmbedder } = require('@mcp-agent/litert-lm-native');

const embedder = new LiteRtEmbedder({
  modelPath: 'embeddinggemma-300m.litertlm',
  embeddingDim: 256,
  maxSeqLength: 512,
  numThreads: 4
});

// Single embedding
const embedding = embedder.embed("Hello world");
console.log('Dimension:', embedding.length); // 256

// Batch embedding
const embeddings = embedder.embedBatch([
  "First document",
  "Second document",
  "Third document"
]);

// Cleanup
embedder.close();
```
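
The returned vectors can be compared with cosine similarity, which is the basis of the semantic-search and tool-matching use cases listed below. A minimal sketch, assuming the `LiteRtEmbedder` API above; `cosineSimilarity` is an illustrative helper, not part of the package:

```javascript
const { LiteRtEmbedder } = require('@mcp-agent/litert-lm-native');

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const embedder = new LiteRtEmbedder({ modelPath: 'embeddinggemma-300m.litertlm' });

const docs = [
  'LiteRT-LM runs models on-device',
  'Cats sleep for most of the day'
];
const docVecs = embedder.embedBatch(docs);
const queryVec = embedder.embed('on-device inference runtime');

// Rank documents by similarity to the query, highest first.
const ranked = docs
  .map((text, i) => ({ text, score: cosineSimilarity(queryVec, docVecs[i]) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);
embedder.close();
```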

## Benchmarks (CPU Only)

Benchmarks were run on a **ThinkPad X1 Carbon 9th Gen** (Intel Core i7-1165G7 @ 2.80GHz, CPU only, no GPU acceleration).

> **Note**: The current benchmarks use a hash-based placeholder implementation for tokenization/inference, so the numbers below reflect API overhead only. Real TFLite model inference will be significantly slower (see Expected Real-World Performance below) and will vary with actual model execution.

### API Overhead Benchmarks

| Metric | Value |
|--------|-------|
| **Initialization** | <1ms |
| **Latency (short text)** | 0.002ms |
| **Latency (medium text)** | 0.003ms |
| **Latency (long text)** | 0.003ms |
| **Memory per embedding** | 0.32 KB |

### Batch Processing

| Batch Size | Time/Batch | Time/Item |
|------------|------------|-----------|
| 1 | 0.004ms | 0.004ms |
| 5 | 0.015ms | 0.003ms |
| 10 | 0.031ms | 0.003ms |
| 20 | 0.074ms | 0.004ms |
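
For reference, per-item numbers like those above can be reproduced with a small harness. A sketch using Node's built-in `perf_hooks` timer and the `LiteRtEmbedder` API above; averaging several runs per batch size would give more stable figures:

```javascript
const { performance } = require('perf_hooks');
const { LiteRtEmbedder } = require('@mcp-agent/litert-lm-native');

const embedder = new LiteRtEmbedder({ modelPath: 'embeddinggemma-300m.litertlm' });
const texts = Array.from({ length: 20 }, (_, i) => `benchmark document number ${i}`);

// Warm up once so one-time initialization cost is not counted.
embedder.embed('warmup');

for (const batchSize of [1, 5, 10, 20]) {
  const batch = texts.slice(0, batchSize);
  const start = performance.now();
  embedder.embedBatch(batch);
  const elapsed = performance.now() - start;
  console.log(
    `batch=${batchSize} total=${elapsed.toFixed(3)}ms ` +
    `per-item=${(elapsed / batchSize).toFixed(3)}ms`
  );
}

embedder.close();
```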

### Expected Real-World Performance

Based on similar embedding models running on comparable hardware:

| Scenario | Expected Latency |
|----------|------------------|
| Single embedding (CPU) | 10-50ms |
| Batch of 10 (CPU) | 50-200ms |
| With XNNPACK optimization | 5-20ms |

## C API Usage

For direct C/C++ integration:

```c
#include "c/embedder.h"

// Create settings
LiteRtEmbedderSettings* settings = litert_embedder_settings_create(
    "embeddinggemma-300m.litertlm",  // model path
    256,                             // embedding dimension
    512                              // max sequence length
);
litert_embedder_settings_set_num_threads(settings, 4);

// Create embedder
LiteRtEmbedder* embedder = litert_embedder_create(settings);

// Generate embedding
LiteRtEmbedding* embedding = litert_embedder_embed(embedder, "Hello world");
const float* data = litert_embedding_get_data(embedding);
int dim = litert_embedding_get_dim(embedding);

// Use the embedding for similarity search, etc.
// ...

// Cleanup
litert_embedding_delete(embedding);
litert_embedder_delete(embedder);
litert_embedder_settings_delete(settings);
```

## Use Cases

- **Semantic search** on mobile/edge devices
- **Document similarity** without cloud dependencies
- **RAG (Retrieval-Augmented Generation)** with local embeddings
- **MCP tool matching** for AI agents
- **Offline text classification**

## Limitations

1. **Tokenization**: Currently uses a simplified character-based tokenizer. For best results, integrate with SentencePiece using the Gemma tokenizer vocabulary; until then, long inputs should be pre-chunked (see the sketch after this list).

2. **Model Inference**: The current wrapper uses placeholder inference. Full TFLite inference integration requires linking against the LiteRT C API.

3. **Platform Support**: Currently tested on Linux x86_64. macOS and Windows support requires platform-specific builds.
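
As noted in the tokenization limitation, inputs are capped at 512 tokens. A minimal pre-chunking sketch; the 4-characters-per-token ratio is a rough assumption for illustration (not a property of the Gemma tokenizer), and mean-pooling the chunk embeddings is one simple aggregation choice:

```javascript
// Split long text into chunks that should stay under the 512-token limit,
// assuming a rough ~4 characters per token (illustrative heuristic only).
const MAX_TOKENS = 512;
const APPROX_CHARS_PER_TOKEN = 4; // assumption for sketch purposes
const MAX_CHARS = MAX_TOKENS * APPROX_CHARS_PER_TOKEN;

function chunkText(text) {
  const chunks = [];
  for (let start = 0; start < text.length; start += MAX_CHARS) {
    chunks.push(text.slice(start, start + MAX_CHARS));
  }
  return chunks;
}

// Embed each chunk and average the vectors into one document embedding.
function embedLongText(embedder, text) {
  const chunkVecs = embedder.embedBatch(chunkText(text));
  const dim = embedder.getEmbeddingDim();
  const mean = new Float32Array(dim);
  for (const vec of chunkVecs) {
    for (let i = 0; i < dim; i++) mean[i] += vec[i] / chunkVecs.length;
  }
  return mean;
}
```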

## Repository Structure

```
models/
├── embeddinggemma-300m.litertlm                       # This model
├── embeddinggemma-300m.toml                           # Conversion config
└── embeddinggemma-300M_seq512_mixed-precision.tflite  # Source TFLite

native-bridge/
├── src/litert_lm_binding.cc   # N-API bindings
├── binding.gyp                # Build configuration
└── lib/index.d.ts             # TypeScript definitions

deps/LiteRT-LM/c/
├── embedder.h    # C API header
└── embedder.cc   # C implementation
```

## License

This model conversion is provided under the Apache 2.0 license. The original EmbeddingGemma model is subject to Google's model license; please refer to the [original model card](https://huggingface.co/google/embeddinggemma-300m) for details.

## Acknowledgments

- **EmbeddingGemma** by Google Research
- **LiteRT-LM** by the Google AI Edge team
- **TFLite Community** for the pre-converted TFLite model

## Citation

If you use this model, please cite the original EmbeddingGemma paper:

```bibtex
@article{embeddinggemma2024,
  title={EmbeddingGemma: Efficient Text Embeddings from Gemma},
  author={Google Research},
  year={2024}
}
```