Typhoon-OCR-1.5-3B-QAT
A quantization-aware trained (QAT) version of Typhoon OCR v1.5, designed for robust and efficient on-device vision-language OCR in English and Thai.
This release maintains strong accuracy while significantly improving performance when running under low-bit quantization (e.g., 4-bit), making it ideal for lightweight environments.
This model is released in bfloat16 and is intended to be used as the pre-quantization base before converting to low-bit formats.
QAT is applied on top of Qwen2.5-VL-3B, enabling improved stability and reduced degradation when deployed below 16-bit precision.
- 4-bit Ollama build: https://ollama.com/scb10x/typhoon-ocr1.5-3b
- Base model (bfloat16): https://huggingface.co/scb10x/typhoon-ocr1.5-2b
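As a rough illustration of what QAT simulates (this is not the actual training code), low-bit quantization snaps each weight onto a small integer grid; QAT inserts a quantize-dequantize step during training so the model learns weights that tolerate that rounding error. A minimal symmetric fake-quantization sketch:

```python
def fake_quant(values, bits=4):
    """Quantize a list of floats to a signed `bits`-bit grid, then dequantize.

    Illustrative only: QAT applies this kind of quantize-dequantize step to
    weights during training so accuracy survives deployment at low precision.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 levels per side for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) * scale for v in values]

weights = [0.50, -1.00, 0.25, 0.01]
quantized = fake_quant(weights)  # each value snapped to the nearest 4-bit grid point
```

Real QAT pipelines use per-channel scales and straight-through gradient estimators; this sketch only shows the rounding the model is trained to absorb.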
Highlights
- Quantization-Aware Training (QAT): Maintains strong OCR accuracy even under aggressive quantization.
- Optimized for On-Device Inference: Faster and more consistent performance on low-resource hardware.
- Enhanced Handwriting & Form Parsing: Retains the v1.5 improvements in handling handwritten notes, forms, irregular layouts, and structured documents.
- Supports Text-Rich & Image-Rich Documents: Effective on tables, diagrams, annotated pages, charts, receipts, and dense reports.
- Thai + English Multilingual OCR: Trained for reliable extraction across bilingual real-world documents.
Intended Use
This is a task-specific OCR model and is intended to be used only with the provided prompt format.
It does not include general VQA or safety guardrails.
Some hallucination may still occur, and users should validate outputs for production scenarios.
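One lightweight way to validate outputs before production use is a sanity check on the returned Markdown: the prompt below asks for paired tags, so an unbalanced count usually indicates truncated or malformed generation. A minimal sketch (the helper name and thresholds are illustrative, not part of the release):

```python
import re

def validate_ocr_output(markdown: str) -> list[str]:
    """Return a list of problems found in a Markdown page returned by the model."""
    problems = []
    # The prompt requests paired tags; a mismatch suggests truncated output.
    for tag in ("table", "figure", "page_number"):
        opens = len(re.findall(rf"<{tag}\b", markdown))
        closes = markdown.count(f"</{tag}>")
        if opens != closes:
            problems.append(f"unbalanced <{tag}> tags ({opens} open, {closes} close)")
    if not markdown.strip():
        problems.append("empty output")
    return problems
```

An empty list means the page passed these basic checks; it does not rule out content-level hallucination.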
Quick Links
- Demo: https://ocr.opentyphoon.ai
- Code / Examples: https://github.com/scb-10x/typhoon-ocr
- Release Blog: https://opentyphoon.ai/blog/en/typhoon-ocr-release
Prompting
```python
prompt = """Extract all text from the image.
Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.
Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:
<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>
- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""
```
Quickstart (Ollama)
```bash
ollama run scb10x/typhoon-ocr1.5-3b
```
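Beyond the CLI, a running Ollama server exposes a local REST API (`POST /api/generate` on port 11434) that accepts base64-encoded images. A minimal sketch of driving the model from Python with only the standard library (the image bytes below are placeholders; substitute a real page scan):

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, image_bytes: bytes) -> dict:
    """Build an Ollama /api/generate payload with one base64-encoded image."""
    return {
        "model": "scb10x/typhoon-ocr1.5-3b",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def run_ocr(prompt: str, image_bytes: bytes) -> str:
    """Send the request to a locally running Ollama server, return the text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Use the full prompt from the Prompting section above; this model is trained for that prompt format only.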
Support & Community
- Twitter: https://twitter.com/opentyphoon
- Discord: https://discord.gg/us5gAYmrxw
Citation
If you use Typhoon OCR or Typhoon models, please cite:
```bibtex
@misc{typhoon2,
  title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
  author={Kunat Pipatanakul et al.},
  year={2024},
  eprint={2412.13702},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{nonesung2025thaiocrbench,
  title={ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai},
  author={Surapon Nonesung et al.},
  year={2025},
  eprint={2511.04479},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```