---
license: apache-2.0
language:
- en
library_name: litert-lm
tags:
- embeddings
- text-embedding
- gemma
- tflite
- litert
- on-device
- edge-ai
pipeline_tag: feature-extraction
---
|
|
|
|
|
# EmbeddingGemma 300M - LiteRT-LM Format

This is Google's **EmbeddingGemma 300M** model converted to the LiteRT-LM `.litertlm` format for use with the [LiteRT-LM](https://github.com/google-ai-edge/LiteRT-LM) runtime. The format is optimized for on-device inference on mobile and edge devices.
|
|
|
|
|
## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [google/embeddinggemma-300m](https://huggingface.co/google/embeddinggemma-300m) |
| **Source TFLite** | [litert-community/embeddinggemma-300m](https://huggingface.co/litert-community/embeddinggemma-300m) |
| **Format** | LiteRT-LM (`.litertlm`) |
| **Embedding Dimension** | 256 |
| **Max Sequence Length** | 512 tokens |
| **Precision** | Mixed (int8/fp16) |
| **Model Size** | ~171 MB |
| **Parameters** | ~300M |
|
|
|
|
|
## How This Model Was Created

### Conversion Process

This model was created by converting the TFLite model from [litert-community/embeddinggemma-300m](https://huggingface.co/litert-community/embeddinggemma-300m) to the LiteRT-LM `.litertlm` bundle format using Google's official tooling:

1. **Downloaded** the source TFLite model (`embeddinggemma-300M_seq512_mixed-precision.tflite`).

2. **Created a TOML configuration** specifying the model structure:

```toml
[model]
path = "models/embeddinggemma-300M_seq512_mixed-precision.tflite"
spm_model_path = ""

[model.start_tokens]
model_input_name = "input_ids"

[model.output_logits]
model_output_name = "Identity"
```

3. **Converted using the LiteRT-LM builder CLI**:

```bash
bazel run //schema/py:litertlm_builder_cli -- \
  toml --path embeddinggemma-300m.toml \
  output --path embeddinggemma-300m.litertlm
```

The `.litertlm` format bundles the TFLite model with the metadata required by the LiteRT-LM runtime.
|
|
|
|
|
## Node.js Native Bindings (node-gyp)

To use this model from Node.js, we created custom N-API bindings that wrap the LiteRT-LM C API. The binding was built with:

- **node-gyp** for native addon compilation
- **N-API** (Node-API) for stable ABI compatibility
- **clang-20** with C++20 support
- linking against the prebuilt `liblibengine_napi` library from LiteRT-LM

### Building the Native Bridge

```bash
cd native-bridge
npm install
CC=/usr/lib/llvm-20/bin/clang CXX=/usr/lib/llvm-20/bin/clang++ npm run rebuild
```
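
After a successful build, a quick smoke test confirms the addon loads and reports sane model parameters. A minimal sketch, assuming the addon is available locally under the `@mcp-agent/litert-lm-native` name used elsewhere in this card:

```typescript
// smoke-test.ts - verify the native addon loads and the model initializes.
import { LiteRtEmbedder } from '@mcp-agent/litert-lm-native';

const embedder = new LiteRtEmbedder({ modelPath: 'models/embeddinggemma-300m.litertlm' });

if (!embedder.isValid()) {
  throw new Error('Embedder failed to initialize - check the model path and native build');
}
console.log('dim:', embedder.getEmbeddingDim());         // expected: 256
console.log('max seq len:', embedder.getMaxSeqLength()); // expected: 512
embedder.close();
```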
|
|
|
|
|
### TypeScript Interface

```typescript
export interface EmbedderConfig {
  modelPath: string;
  embeddingDim?: number;  // default: 256
  maxSeqLength?: number;  // default: 512
  numThreads?: number;    // default: 4
}

export class LiteRtEmbedder {
  constructor(config: EmbedderConfig);
  embed(text: string): Float32Array;
  embedBatch(texts: string[]): Float32Array[];
  isValid(): boolean;
  getEmbeddingDim(): number;
  getMaxSeqLength(): number;
  close(): void;
}
```
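
Because the embedder holds native resources, `close()` should run even when embedding throws. One way to make that hard to forget is a small scope helper; a sketch under that assumption (the `withEmbedder` function is hypothetical, not part of the binding):

```typescript
import { LiteRtEmbedder, EmbedderConfig } from '@mcp-agent/litert-lm-native';

// Hypothetical convenience wrapper: run `fn` with a live embedder,
// then release the native handle even if `fn` throws.
export function withEmbedder<T>(config: EmbedderConfig, fn: (e: LiteRtEmbedder) => T): T {
  const embedder = new LiteRtEmbedder(config);
  try {
    return fn(embedder);
  } finally {
    embedder.close();
  }
}

// Usage:
const vec = withEmbedder({ modelPath: 'models/embeddinggemma-300m.litertlm' },
  (e) => e.embed('Hello world'));
```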
|
|
|
|
|
### Usage Example

```javascript
const { LiteRtEmbedder } = require('@mcp-agent/litert-lm-native');

const embedder = new LiteRtEmbedder({
  modelPath: 'embeddinggemma-300m.litertlm',
  embeddingDim: 256,
  maxSeqLength: 512,
  numThreads: 4
});

// Single embedding
const embedding = embedder.embed("Hello world");
console.log('Dimension:', embedding.length); // 256

// Batch embedding
const embeddings = embedder.embedBatch([
  "First document",
  "Second document",
  "Third document"
]);

// Cleanup
embedder.close();
```
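
Downstream tasks compare these vectors with cosine similarity. A minimal sketch in TypeScript; nothing is assumed beyond the documented `Float32Array` output of `embed()`:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1); // guard zero vectors
}

// Call before `embedder.close()`:
const sim = cosineSimilarity(embedder.embed('cat'), embedder.embed('kitten'));
console.log('similarity:', sim.toFixed(3)); // values near 1.0 mean more similar
```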
|
|
|
|
|
## Benchmarks (CPU Only)

Benchmarks were performed on a **ThinkPad X1 Carbon 9th Gen** (Intel Core i7-1165G7 @ 2.80GHz, CPU only, no GPU acceleration).

> **Note**: The current benchmarks exercise a hash-based placeholder implementation for tokenization/inference, so they measure binding overhead rather than model compute. Real TFLite inference will be substantially slower; see the estimates further below.
|
|
|
|
|
### API Overhead Benchmarks

| Metric | Value |
|--------|-------|
| **Initialization** | <1ms |
| **Latency (short text)** | 0.002ms |
| **Latency (medium text)** | 0.003ms |
| **Latency (long text)** | 0.003ms |
| **Memory per embedding** | 0.32 KB |
|
|
|
|
|
### Batch Processing

| Batch Size | Time/Batch | Time/Item |
|------------|------------|-----------|
| 1 | 0.004ms | 0.004ms |
| 5 | 0.015ms | 0.003ms |
| 10 | 0.031ms | 0.003ms |
| 20 | 0.074ms | 0.004ms |
|
|
|
|
### Expected Real-World Performance

Based on similar embedding models running on comparable hardware:

| Scenario | Expected Latency |
|----------|------------------|
| Single embedding (CPU) | 10-50ms |
| Batch of 10 (CPU) | 50-200ms |
| With XNNPACK optimization | 5-20ms |
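
Once real inference is wired up, per-call latency on your own hardware is easy to check from Node.js. A minimal sketch using the standard `perf_hooks` timer, with the same model path and binding API as above:

```typescript
import { performance } from 'node:perf_hooks';
import { LiteRtEmbedder } from '@mcp-agent/litert-lm-native';

const embedder = new LiteRtEmbedder({ modelPath: 'embeddinggemma-300m.litertlm' });

// Warm up once so one-time setup cost is excluded from the measurement.
embedder.embed('warmup');

const runs = 100;
const start = performance.now();
for (let i = 0; i < runs; i++) {
  embedder.embed('The quick brown fox jumps over the lazy dog');
}
const perCall = (performance.now() - start) / runs;
console.log(`average latency: ${perCall.toFixed(3)} ms/embedding`);
embedder.close();
```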
|
|
|
|
|
## C API Usage

For direct C/C++ integration:

```c
#include "c/embedder.h"

// Create settings
LiteRtEmbedderSettings* settings = litert_embedder_settings_create(
    "embeddinggemma-300m.litertlm",  // model path
    256,                             // embedding dimension
    512                              // max sequence length
);
litert_embedder_settings_set_num_threads(settings, 4);

// Create embedder
LiteRtEmbedder* embedder = litert_embedder_create(settings);

// Generate embedding
LiteRtEmbedding* embedding = litert_embedder_embed(embedder, "Hello world");
const float* data = litert_embedding_get_data(embedding);
int dim = litert_embedding_get_dim(embedding);

// Use embedding for similarity search, etc.
// ...

// Cleanup
litert_embedding_delete(embedding);
litert_embedder_delete(embedder);
litert_embedder_settings_delete(settings);
```
|
|
|
|
|
## Use Cases

- **Semantic search** on mobile/edge devices
- **Document similarity** without cloud dependencies
- **RAG (Retrieval-Augmented Generation)** with local embeddings
- **MCP tool matching** for AI agents (see the sketch below)
- **Offline text classification**
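
To make the search-style use cases concrete, here is a hedged sketch of ranking a small corpus against a query with `embedBatch` and the `cosineSimilarity` helper defined earlier (the `topK` function is illustrative, not part of the binding):

```typescript
// Rank documents (or MCP tool descriptions) by similarity to a query.
function topK(embedder: LiteRtEmbedder, query: string, docs: string[], k = 3) {
  const queryVec = embedder.embed(query);
  const docVecs = embedder.embedBatch(docs);
  return docs
    .map((text, i) => ({ text, score: cosineSimilarity(queryVec, docVecs[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const hits = topK(embedder, 'resize an image', [
  'Tool: convert image formats and resize images',
  'Tool: query a SQL database',
  'Tool: send email notifications',
]);
console.log(hits[0].text); // most similar tool description
```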
|
|
|
|
|
## Limitations

1. **Tokenization**: The wrapper currently uses a simplified character-based tokenizer. For best results, integrate SentencePiece with the Gemma tokenizer vocabulary.

2. **Model Inference**: The current wrapper uses placeholder inference. Full TFLite inference integration requires linking against the LiteRT C API.

3. **Platform Support**: Currently tested on Linux x86_64 only. macOS and Windows support requires platform-specific builds.
|
|
|
|
|
## Repository Structure

```
models/
├── embeddinggemma-300m.litertlm                       # This model
├── embeddinggemma-300m.toml                           # Conversion config
└── embeddinggemma-300M_seq512_mixed-precision.tflite  # Source TFLite

native-bridge/
├── src/litert_lm_binding.cc   # N-API bindings
├── binding.gyp                # Build configuration
└── lib/index.d.ts             # TypeScript definitions

deps/LiteRT-LM/c/
├── embedder.h    # C API header
└── embedder.cc   # C implementation
```
|
|
|
|
|
## License

This model conversion is provided under the Apache 2.0 license. The original EmbeddingGemma model is subject to Google's Gemma Terms of Use; please refer to the [original model card](https://huggingface.co/google/embeddinggemma-300m) for details.
|
|
|
|
|
## Acknowledgments

- **EmbeddingGemma** by Google DeepMind
- **LiteRT-LM** by the Google AI Edge team
- the **LiteRT community** for the pre-converted TFLite model
|
|
|
|
|
## Citation

If you use this model, please cite the original EmbeddingGemma report:

```bibtex
@misc{embeddinggemma2025,
  title={EmbeddingGemma: Powerful and Lightweight Text Representations},
  author={Google DeepMind},
  year={2025}
}
```
|
|
|