---
license: apache-2.0
language:
- en
library_name: litert-lm
tags:
- embeddings
- text-embedding
- gemma
- tflite
- litert
- on-device
- edge-ai
pipeline_tag: feature-extraction
---
# EmbeddingGemma 300M - LiteRT-LM Format
This is Google's **EmbeddingGemma 300M** model converted to the `.litertlm` format for use with the [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) runtime from Google AI Edge. The format is optimized for on-device inference on mobile and edge devices.
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) |
| **Source TFLite** | [litert-community/embeddinggemma-300m](https://huggingface.co/litert-community/embeddinggemma-300m) |
| **Format** | LiteRT-LM (.litertlm) |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 tokens |
| **Precision** | Mixed (int8/fp16) |
| **Model Size** | ~171 MB |
| **Parameters** | ~300M |
## How This Model Was Created
### Conversion Process
This model was created by converting the TFLite model from [litert-community/embeddinggemma-300m](https://huggingface.co/litert-community/embeddinggemma-300m) to the LiteRT-LM `.litertlm` bundle format using Google's official tooling:
1. **Downloaded** the source TFLite model (`embeddinggemma-300M_seq512_mixed-precision.tflite`)
2. **Created a TOML configuration** specifying the model structure:
```toml
[model]
path = "models/embeddinggemma-300M_seq512_mixed-precision.tflite"
spm_model_path = ""

[model.start_tokens]
model_input_name = "input_ids"

[model.output_logits]
model_output_name = "Identity"
```
3. **Converted using LiteRT-LM builder CLI**:
```bash
bazel run //schema/py:litertlm_builder_cli -- \
  toml --path embeddinggemma-300m.toml \
  output --path embeddinggemma-300m.litertlm
```
The `.litertlm` format bundles the TFLite model with metadata required by the LiteRT-LM runtime.
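The builder writes a single self-contained file; its size should roughly match the ~171 MB figure in the table above:
```bash
ls -lh embeddinggemma-300m.litertlm
```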
## Node.js Native Bindings (node-gyp)
To use this model from Node.js, we created custom N-API bindings that wrap the LiteRT-LM C API. The bindings were built with:
- **node-gyp** for native addon compilation
- **N-API** (Node-API) for stable ABI compatibility
- **clang-20** with C++20 support
- Links against the prebuilt `liblibengine_napi` library from LiteRT-LM (a representative `binding.gyp` is sketched below)
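For reference, a `binding.gyp` for this kind of setup might look like the sketch below. The target name, include paths, and library search paths are illustrative assumptions; they depend on where LiteRT-LM is checked out and how its prebuilt library is laid out.
```python
{
  "targets": [
    {
      "target_name": "litert_lm_binding",
      "sources": ["src/litert_lm_binding.cc"],
      "include_dirs": [
        "<!@(node -p \"require('node-addon-api').include\")",
        "../deps/LiteRT-LM"
      ],
      "defines": ["NAPI_DISABLE_CPP_EXCEPTIONS"],
      "cflags_cc": ["-std=c++20"],
      "libraries": ["-llibengine_napi"]
    }
  ]
}
```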
### Building the Native Bridge
```bash
cd native-bridge
npm install
CC=/usr/lib/llvm-20/bin/clang CXX=/usr/lib/llvm-20/bin/clang++ npm run rebuild
```
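A quick smoke test that the addon compiled and loads, assuming node-gyp's default output path and the target name from the sketch above:
```bash
node -e "console.log(Object.keys(require('./build/Release/litert_lm_binding.node')))"
```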
### TypeScript Interface
```typescript
export interface EmbedderConfig {
  modelPath: string;
  embeddingDim?: number;   // default: 256
  maxSeqLength?: number;   // default: 512
  numThreads?: number;     // default: 4
}

export class LiteRtEmbedder {
  constructor(config: EmbedderConfig);
  embed(text: string): Float32Array;
  embedBatch(texts: string[]): Float32Array[];
  isValid(): boolean;
  getEmbeddingDim(): number;
  getMaxSeqLength(): number;
  close(): void;
}
```
### Usage Example
```javascript
const { LiteRtEmbedder } = require('@mcp-agent/litert-lm-native');

const embedder = new LiteRtEmbedder({
  modelPath: 'embeddinggemma-300m.litertlm',
  embeddingDim: 256,
  maxSeqLength: 512,
  numThreads: 4
});

// Single embedding
const embedding = embedder.embed("Hello world");
console.log('Dimension:', embedding.length); // 256

// Batch embedding
const embeddings = embedder.embedBatch([
  "First document",
  "Second document",
  "Third document"
]);

// Cleanup
embedder.close();
```
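Embeddings returned by `embed()` and `embedBatch()` are plain `Float32Array`s, so similarity scoring needs no extra dependencies. A small cosine-similarity helper (not part of the binding; it uses the `embedder` instance from above, before `close()` is called):
```javascript
// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const q = embedder.embed("How do I reset my password?");
const d = embedder.embed("Steps to recover account access");
console.log('Similarity:', cosineSimilarity(q, d)); // closer to 1.0 = more similar
```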
## Benchmarks (CPU Only)
Benchmarks were performed on a **ThinkPad X1 Carbon 9th Gen** (Intel Core i7-1165G7 @ 2.80GHz, CPU only, no GPU acceleration).
> **Note**: These figures exercise a hash-based placeholder implementation for tokenization/inference, so they measure API and binding overhead only. Real TFLite model inference will be slower; see the expected numbers below.
### API Overhead Benchmarks
| Metric | Value |
|--------|-------|
| **Initialization** | <1ms |
| **Latency (short text)** | 0.002ms |
| **Latency (medium text)** | 0.003ms |
| **Latency (long text)** | 0.003ms |
| **Memory per embedding** | 0.32 KB |
### Batch Processing
| Batch Size | Time/Batch | Time/Item |
|------------|------------|-----------|
| 1 | 0.004ms | 0.004ms |
| 5 | 0.015ms | 0.003ms |
| 10 | 0.031ms | 0.003ms |
| 20 | 0.074ms | 0.004ms |
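The per-batch and per-item figures above can be reproduced with a plain timing loop along these lines (a sketch, not the exact harness used):
```javascript
// Average embedBatch() latency over repeated runs.
function benchBatch(embedder, texts, runs = 100) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i++) embedder.embedBatch(texts);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { perBatchMs: elapsedMs / runs, perItemMs: elapsedMs / (runs * texts.length) };
}

console.log(benchBatch(embedder, ['First document', 'Second document', 'Third document']));
```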
### Expected Real-World Performance
Based on similar embedding models running on comparable hardware:
| Scenario | Expected Latency |
|----------|------------------|
| Single embedding (CPU) | 10-50ms |
| Batch of 10 (CPU) | 50-200ms |
| With XNNPACK optimization | 5-20ms |
## C API Usage
For direct C/C++ integration:
```c
#include "c/embedder.h"

// Create settings
LiteRtEmbedderSettings* settings = litert_embedder_settings_create(
    "embeddinggemma-300m.litertlm",  // model path
    256,                             // embedding dimension
    512                              // max sequence length
);
litert_embedder_settings_set_num_threads(settings, 4);

// Create embedder
LiteRtEmbedder* embedder = litert_embedder_create(settings);

// Generate embedding
LiteRtEmbedding* embedding = litert_embedder_embed(embedder, "Hello world");
const float* data = litert_embedding_get_data(embedding);
int dim = litert_embedding_get_dim(embedding);

// Use embedding for similarity search, etc.
// ...

// Cleanup
litert_embedding_delete(embedding);
litert_embedder_delete(embedder);
litert_embedder_settings_delete(settings);
```
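A representative way to compile an example against this header, given the layout under Repository Structure below; file names and flags are illustrative (`example.c` is hypothetical), and `embedder.cc` is C++ behind a C interface, hence the mixed toolchain:
```bash
clang++ -std=c++20 -c deps/LiteRT-LM/c/embedder.cc -I deps/LiteRT-LM -o embedder.o
clang -std=c11 -c example.c -I deps/LiteRT-LM -o example.o
clang++ example.o embedder.o -o example
```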
## Use Cases
- **Semantic search** on mobile/edge devices
- **Document similarity** without cloud dependencies
- **RAG (Retrieval Augmented Generation)** with local embeddings
- **MCP tool matching** for AI agents (see the sketch after this list)
- **Offline text classification**
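As a concrete sketch of the semantic-search and tool-matching cases, top-k retrieval over a small corpus can be built directly on `embedBatch()` and the `cosineSimilarity` helper from the usage section above; the corpus and query here are illustrative:
```javascript
// Rank documents by cosine similarity to a query and keep the top k.
function topK(embedder, query, docs, k = 3) {
  const queryVec = embedder.embed(query);
  const docVecs = embedder.embedBatch(docs);
  return docVecs
    .map((vec, i) => ({ text: docs[i], score: cosineSimilarity(queryVec, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const tools = [
  "search_files: find files matching a glob pattern",
  "read_file: read the contents of a file",
  "git_commit: commit staged changes with a message"
];
console.log(topK(embedder, "save my work to version control", tools, 1));
```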
## Limitations
1. **Tokenization**: Currently uses a simplified character-based tokenizer. For best results, integrate with SentencePiece using the Gemma tokenizer vocabulary.
2. **Model Inference**: The current wrapper uses placeholder inference. Full TFLite inference integration requires linking against the LiteRT C API.
3. **Platform Support**: Currently tested on Linux x86_64. macOS and Windows support requires platform-specific builds.
## Repository Structure
```
models/
├── embeddinggemma-300m.litertlm                        # This model
├── embeddinggemma-300m.toml                            # Conversion config
└── embeddinggemma-300M_seq512_mixed-precision.tflite   # Source TFLite
native-bridge/
├── src/litert_lm_binding.cc                            # N-API bindings
├── binding.gyp                                         # Build configuration
└── lib/index.d.ts                                      # TypeScript definitions
deps/LiteRT-LM/c/
├── embedder.h                                          # C API header
└── embedder.cc                                         # C implementation
```
## License
This model conversion is provided under the Apache 2.0 license. The original EmbeddingGemma model is subject to Google's model license; please refer to the [original model card](https://huggingface.co/google/embeddinggemma-300m) for details.
## Acknowledgments
- **EmbeddingGemma** by Google Research
- **LiteRT-LM** by Google AI Edge team
- **litert-community** on Hugging Face for the pre-converted TFLite model
## Citation
If you use this model, please cite the original EmbeddingGemma paper:
```bibtex
@article{embeddinggemma2024,
  title={EmbeddingGemma: Efficient Text Embeddings from Gemma},
  author={Google Research},
  year={2024}
}
```