Typhoon-OCR-1.5-3B-QAT
A quantization-aware trained (QAT) version of Typhoon OCR v1.5, designed for robust and efficient on-device vision-language OCR in English and Thai.
This release maintains strong accuracy while significantly improving performance when running under low-bit quantization (e.g., 4-bit), making it ideal for lightweight environments.
This model is released in bfloat16 and is intended to be used as the pre-quantization base before converting to low-bit formats.
QAT is applied on top of Qwen2.5-VL-3B, enabling improved stability and reduced degradation when deployed below 16-bit precision.
- 4-bit Ollama build: https://ollama.com/scb10x/typhoon-ocr1.5-3b
- Base model (bfloat16): https://huggingface.co/scb10x/typhoon-ocr1.5-2b
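As a rough illustration of what QAT simulates (this is not the actual training code), low-bit quantization snaps each weight onto a small integer grid; QAT inserts a quantize-dequantize step during training so the model learns weights that tolerate that rounding error. A minimal symmetric fake-quantization sketch:

```python
def fake_quant(values, bits=4):
    """Quantize a list of floats to a signed `bits`-bit grid, then dequantize.

    Illustrative only: QAT applies this kind of quantize-dequantize step to
    weights during training so accuracy survives deployment at low precision.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 levels per side for 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) * scale for v in values]

weights = [0.50, -1.00, 0.25, 0.01]
quantized = fake_quant(weights)  # each value snapped to the nearest 4-bit grid point
```

Real QAT pipelines use per-channel scales and straight-through gradient estimators; this sketch only shows the rounding the model is trained to absorb.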
Highlights
- Quantization-Aware Training (QAT): Maintains strong OCR accuracy even under aggressive quantization.
- Optimized for On-Device Inference: Faster and more consistent performance on low-resource hardware.
- Enhanced Handwriting & Form Parsing: Retains the v1.5 improvements in handling handwritten notes, forms, irregular layouts, and structured documents.
- Supports Text-Rich & Image-Rich Documents: Effective on tables, diagrams, annotated pages, charts, receipts, and dense reports.
- Thai + English Multilingual OCR: Trained for reliable extraction across bilingual real-world documents.
Intended Use
This is a task-specific OCR model and is intended to be used only with the provided prompt format.
It does not include general VQA or safety guardrails.
Some hallucination may still occur, and users should validate outputs for production scenarios.
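One lightweight way to validate outputs before production use is a sanity check on the returned Markdown: the prompt below asks for paired tags, so an unbalanced count usually indicates truncated or malformed generation. A minimal sketch (the helper name and thresholds are illustrative, not part of the release):

```python
import re

def validate_ocr_output(markdown: str) -> list[str]:
    """Return a list of problems found in a Markdown page returned by the model."""
    problems = []
    # The prompt requests paired tags; a mismatch suggests truncated output.
    for tag in ("table", "figure", "page_number"):
        opens = len(re.findall(rf"<{tag}\b", markdown))
        closes = markdown.count(f"</{tag}>")
        if opens != closes:
            problems.append(f"unbalanced <{tag}> tags ({opens} open, {closes} close)")
    if not markdown.strip():
        problems.append("empty output")
    return problems
```

An empty list means the page passed these basic checks; it does not rule out content-level hallucination.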
Quick Links
- Demo: https://ocr.opentyphoon.ai
- Code / Examples: https://github.com/scb-10x/typhoon-ocr
- Release Blog: https://opentyphoon.ai/blog/en/typhoon-ocr-release
Prompting
```python
prompt = """Extract all text from the image.
Instructions:
- Only return the clean Markdown.
- Do not include any explanation or extra text.
- You must include all information on the page.
Formatting Rules:
- Tables: Render tables using <table>...</table> in clean HTML format.
- Equations: Render equations using LaTeX syntax with inline ($...$) and block ($$...$$).
- Images/Charts/Diagrams: Wrap any clearly defined visual areas (e.g. charts, diagrams, pictures) in:
<figure>
Describe the image's main elements (people, objects, text), note any contextual clues (place, event, culture), mention visible text and its meaning, provide deeper analysis when relevant (especially for financial charts, graphs, or documents), comment on style or architecture if relevant, then give a concise overall summary. Describe in Thai.
</figure>
- Page Numbers: Wrap page numbers in <page_number>...</page_number> (e.g., <page_number>14</page_number>).
- Checkboxes: Use ☐ for unchecked and ☑ for checked boxes."""
```
Quickstart (Ollama)
```bash
ollama run scb10x/typhoon-ocr1.5-3b
```
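Beyond the CLI, a running Ollama server exposes a local REST API (`POST /api/generate` on port 11434) that accepts base64-encoded images. A minimal sketch of driving the model from Python with only the standard library (the image bytes below are placeholders; substitute a real page scan):

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, image_bytes: bytes) -> dict:
    """Build an Ollama /api/generate payload with one base64-encoded image."""
    return {
        "model": "scb10x/typhoon-ocr1.5-3b",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def run_ocr(prompt: str, image_bytes: bytes) -> str:
    """Send the request to a locally running Ollama server, return the text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt, image_bytes)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Use the full prompt from the Prompting section above; this model is trained for that prompt format only.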
Support & Community
- Twitter: https://twitter.com/opentyphoon
- Discord: https://discord.gg/us5gAYmrxw
Citation
If you use Typhoon OCR or Typhoon models, please cite:
```bibtex
@misc{typhoon2,
  title={Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models},
  author={Kunat Pipatanakul et al.},
  year={2024},
  eprint={2412.13702},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{nonesung2025thaiocrbench,
  title={ThaiOCRBench: A Task-Diverse Benchmark for Vision-Language Understanding in Thai},
  author={Surapon Nonesung et al.},
  year={2025},
  eprint={2511.04479},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```