# Manga OCR (ONNX)
This is an ONNX version of the Manga OCR model, designed for optical character recognition of Japanese text, with a primary focus on manga.
This model is based on the original work in kha-white/manga-ocr and kha-white/manga-ocr-base, with modifications from jzhang533/manga-ocr-base-2025. The models in this repository were exported to the ONNX format using Hugging Face Optimum.
## Original Model Information
Manga OCR uses the Vision Encoder Decoder framework. It is designed to be a high-quality text recognition tool, robust to scenarios common in manga:
- Both vertical and horizontal text
- Text with furigana
- Text overlaid on images
- A wide variety of fonts and font styles
- Low-quality images
The original training data included manga109-s and synthetic data.
## Using the ONNX Models
To use these ONNX models for inference, you will need the `optimum` library with the ONNX Runtime backend. Install it with:

```bash
pip install optimum[onnxruntime]
```
Here is an example of how to run inference with the ONNX models:
```python
from transformers import TrOCRProcessor
from optimum.onnxruntime import ORTModelForVision2Seq
from PIL import Image

# Load the processor and the ONNX model from the Hugging Face Hub
processor = TrOCRProcessor.from_pretrained("l0wgear/manga-ocr-2025-onnx")
model = ORTModelForVision2Seq.from_pretrained("l0wgear/manga-ocr-2025-onnx")

# Load an image containing Japanese text
image = Image.open("path/to/your/manga/image.jpg").convert("RGB")

# Preprocess the image and generate text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
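If you are processing many text regions (e.g. one crop per speech bubble), the same steps can be batched into a single `generate` call, which is usually faster than looping one image at a time. A minimal sketch, assuming the `processor` and `model` objects loaded as above (the `ocr_batch` helper name is ours, not part of the library):

```python
def ocr_batch(processor, model, images):
    """Run OCR on a list of PIL images in one batched generate call.

    `processor` and `model` are the TrOCRProcessor / ORTModelForVision2Seq
    objects loaded in the example above; `images` is a list of RGB PIL images.
    Returns one decoded string per input image.
    """
    # The processor accepts a list of images and stacks them into one batch
    pixel_values = processor(images=images, return_tensors="pt").pixel_values
    # One generate call over the whole batch
    generated_ids = model.generate(pixel_values)
    # batch_decode returns a list of strings, in input order
    return processor.batch_decode(generated_ids, skip_special_tokens=True)
```

Usage would look like `texts = ocr_batch(processor, model, [crop1, crop2, crop3])`.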
## Acknowledgements
This work builds directly on kha-white's original Manga OCR project and on jzhang533's updated base model.