Manga OCR (ONNX)

This is an ONNX version of the Manga OCR model, designed for optical character recognition of Japanese text, with a primary focus on manga.

This model is based on the original work by kha-white (kha-white/manga-ocr and kha-white/manga-ocr-base), with modifications by jzhang533 in manga-ocr-base-2025. The models in this repository were exported to the ONNX format using Hugging Face Optimum.
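For reference, an export like this can typically be reproduced with the Optimum CLI. The command below is a sketch, assuming the source checkpoint id and output directory shown (substitute your own):

```shell
# Install Optimum with ONNX Runtime support, then export the
# PyTorch checkpoint to ONNX (encoder, decoder, and decoder-with-past).
pip install "optimum[onnxruntime]"
optimum-cli export onnx --model jzhang533/manga-ocr-base-2025 manga-ocr-onnx/
```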

Original Model Information

Manga OCR uses the Vision Encoder Decoder framework. It is designed to be a high-quality text recognition tool, robust to scenarios specific to manga:

  • Both vertical and horizontal text
  • Text with furigana
  • Text overlaid on images
  • A wide variety of fonts and font styles
  • Low-quality images

The original training data included manga109-s and synthetic data.

Using the ONNX Models

To run inference with these ONNX models, you need the Optimum library with the ONNX Runtime extra. You can install it as follows:

pip install optimum[onnxruntime]

Here is an example of how to run inference with the ONNX models:

from transformers import TrOCRProcessor
from optimum.onnxruntime import ORTModelForVision2Seq
from PIL import Image

# Load the processor and model
processor = TrOCRProcessor.from_pretrained("l0wgear/manga-ocr-2025-onnx")
model = ORTModelForVision2Seq.from_pretrained("l0wgear/manga-ocr-2025-onnx")

# Load an image
image = Image.open("path/to/your/manga/image.jpg").convert("RGB")

# Process the image and generate text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
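Note that the model recognizes text in a single cropped region; it does not locate text on a full page, so pages are usually segmented into text blocks first. A minimal sketch of cropping a region with Pillow before passing it to the processor (the page and bounding-box coordinates below are placeholders for illustration):

```python
from PIL import Image

# Stand-in for a scanned manga page; in practice, use Image.open(...)
page = Image.new("RGB", (800, 1200), "white")

# (left, upper, right, lower) bounding box of a speech bubble,
# e.g. from a separate text-detection step
box = (100, 50, 300, 400)
region = page.crop(box).convert("RGB")

print(region.size)  # (200, 350)
```

The cropped `region` can then be fed to the processor in place of the full image in the example above.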

Acknowledgements

  • Original Author: kha-white for creating the original Manga OCR.
  • Fine-tuning: jzhang533 for training the manga-ocr-base-2025 model.