---
license: apache-2.0
language:
- en
- fr
- de
- es
- it
- nl
- pt
- sv
- da
base_model: lightonai/LightOnOCR-1B-1025
library_name: vllm
tags:
- ocr
- document-understanding
- vision-language
- pdf
- tables
- forms
---
# LightOnOCR-0.9B-16k-1025

The smallest-vocabulary variant, with only 16k tokens, ideal for European languages (English/French).

**LightOnOCR-1B** is a compact, end-to-end vision–language model for Optical Character Recognition (OCR) and document understanding. It achieves state-of-the-art accuracy in its weight class while being several times faster and cheaper than larger general-purpose VLMs.

📝 **[Read the full blog post](https://huggingface.co/blog/lightonai/lightonocr/)** | 🚀 **[Try the demo](https://huggingface.co/spaces/lightonai/LightOnOCR-1B-Demo)**

**Highlights**

* ⚡ **Speed:** 5× faster than dots.ocr, 2× faster than PaddleOCR-VL-0.9B, 1.73× faster than DeepSeekOCR
* 💸 **Efficiency:** Processes 5.71 pages/s on a single H100 (~493k pages/day) for **<$0.01 per 1,000 pages**
* 🧠 **End-to-End:** Fully differentiable, no external OCR pipeline
* 🧾 **Versatile:** Handles tables, receipts, forms, multi-column layouts, and math notation
* 🌍 **Compact variants:** 32k and 16k vocab options for European languages

---

## Model Overview

**LightOnOCR** combines a Vision Transformer encoder (Pixtral-based) with a lightweight text decoder (Qwen3-based) distilled from high-quality open VLMs. It is optimized for document parsing tasks, producing accurate, layout-aware text extraction from high-resolution pages.

---

## Benchmarks

| Model | ArXiv | Old Scans | Math | Tables | Multi-Column | Tiny Text | Base | Overall |
| :----------------- | :---: | :-------: | :--: | :----: | :----------: | :-------: | :--: | :-----: |
| [LightOnOCR-1B-1025](https://huggingface.co/lightonai/LightOnOCR-1B-1025) (151k vocab) | 81.4 | 71.6 | 76.4 | 35.2 | 80.0 | 88.7 | 99.5 | **76.1** |
| [LightOnOCR-1B-32k](https://huggingface.co/lightonai/LightOnOCR-0.9B-32k-1025) (32k vocab) | 80.6 | 66.2 | 73.5 | 33.5 | 71.2 | 87.6 | 99.5 | **73.1** |
| [LightOnOCR-1B-16k](https://huggingface.co/lightonai/LightOnOCR-0.9B-16k-1025) (16k vocab) | 82.3 | 72.9 | 75.3 | 33.5 | 78.6 | 85.1 | 99.8 | **75.4** |

All benchmarks evaluated using **vLLM**.

---

## Installation

```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install -U vllm \
  --torch-backend=auto \
  --extra-index-url https://wheels.vllm.ai/nightly \
  --prerelease=allow

# if this fails, try adding the triton-kernels package:
# 'triton-kernels @ git+https://github.com/triton-lang/triton.git@v3.5.0#subdirectory=python/triton_kernels'

uv pip install pypdfium2 pillow requests
```

## Start Server

```bash
vllm serve lightonai/LightOnOCR-0.9B-16k-1025 \
  --limit-mm-per-prompt '{"image": 1}' \
  --async-scheduling
```

## PDF Inference

```python
import base64
import requests
import pypdfium2 as pdfium
import io

ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/LightOnOCR-0.9B-16k-1025"

# Download PDF from arXiv
pdf_url = "https://arxiv.org/pdf/2412.13663"
pdf_data = requests.get(pdf_url).content

# Open PDF and convert first page to image
pdf = pdfium.PdfDocument(pdf_data)
page = pdf[0]

# Render at 200 DPI (scale factor = 200/72 ≈ 2.77)
pil_image = page.render(scale=2.77).to_pil()

# Convert to base64
buffer = io.BytesIO()
pil_image.save(buffer, format="PNG")
image_base64 = base64.b64encode(buffer.getvalue()).decode('utf-8')

# Make request
payload = {
    "model": MODEL,
    "messages": [{
        "role": "user",
        "content": [{
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{image_base64}"}
        }]
    }],
    "max_tokens": 4096,
    "temperature": 0.2,
    "top_p": 0.9,
}

response = requests.post(ENDPOINT, json=payload)
text = response.json()['choices'][0]['message']['content']
print(text)
```

---

## Rendering and Preprocessing Tips

* Render PDFs to **PNG** or **JPEG** at a target longest dimension of **1280–1300 px** (applied in the sketch below)
* Maintain aspect ratio to preserve text geometry
* LightOnOCR is robust to moderate skew; heavy rotation correction is optional
* Use one image per page; batching is supported by vLLM
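The snippet below is a minimal multi-page sketch combining these tips with the server set up above: it renders each page so its longest side is roughly 1280 px and sends one request per page. The `render_page` and `ocr_image` helpers, the 1280 px target, and the `document.pdf` path are illustrative choices, not part of an official API.

```python
import base64
import io

import pypdfium2 as pdfium
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "lightonai/LightOnOCR-0.9B-16k-1025"


def render_page(page, target_longest_px=1280):
    # Page size is in PDF points (1/72 inch); choose the scale that puts
    # the longest side at about target_longest_px, keeping the aspect ratio.
    width, height = page.get_size()
    scale = target_longest_px / max(width, height)
    return page.render(scale=scale).to_pil()


def ocr_image(pil_image):
    # Encode one rendered page as base64 PNG and ask the server for its text.
    buffer = io.BytesIO()
    pil_image.save(buffer, format="PNG")
    image_base64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
    payload = {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_base64}"},
            }],
        }],
        "max_tokens": 4096,
        "temperature": 0.2,
        "top_p": 0.9,
    }
    response = requests.post(ENDPOINT, json=payload)
    return response.json()["choices"][0]["message"]["content"]


pdf = pdfium.PdfDocument("document.pdf")  # any local multi-page PDF
pages_text = [ocr_image(render_page(pdf[i])) for i in range(len(pdf))]
print("\n\n".join(pages_text))
```

Since vLLM batches concurrent requests on the server side, submitting pages in parallel (for example with a thread pool) should give higher throughput than this sequential loop.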
---

## Variants

| Variant | Description |
| :--- | :--- |
| **[LightOnOCR-1B-1025](https://huggingface.co/lightonai/LightOnOCR-1B-1025)** | Full multilingual model (default) |
| **[LightOnOCR-1B-32k](https://huggingface.co/lightonai/LightOnOCR-0.9B-32k-1025)** | Fastest pruned-vocabulary version (32k tokens), optimized for European languages |
| **[LightOnOCR-1B-16k](https://huggingface.co/lightonai/LightOnOCR-0.9B-16k-1025)** | Most compact variant, with the smallest vocabulary (16k tokens) |

---

## Fine-tuning

**Transformers integration for training is coming soon.** LightOnOCR is fully differentiable and supports:

* LoRA fine-tuning
* Domain adaptation (receipts, scientific articles, forms, etc.)
* Multilingual fine-tuning with task-specific corpora

Example fine-tuning configurations will be released alongside the dataset.

---

## Data

Trained on a diverse, large-scale PDF corpus covering:

* Scientific papers, books, receipts, invoices, tables, forms, and handwritten text
* Multiple languages (Latin alphabet dominant)
* Real and synthetic document scans

The dataset will be released under an open license.

---

## License

Apache License 2.0

---

## Citation

```bibtex
@misc{lightonocr2025,
  title = {LightOnOCR-1B: End-to-End and Efficient Domain-Specific Vision-Language Models for OCR},
  author = {Said Taghadouini and Baptiste Aubertin and Adrien Cavaillès},
  year = {2025},
  howpublished = {\url{https://huggingface.co/blog/lightonai/lightonocr}}
}
```