DeepSeek-OCR Google Colab Notebook

A ready-to-use Google Colab notebook for running DeepSeek-OCR, a state-of-the-art optical character recognition model that converts images and documents to markdown format with high accuracy.

πŸš€ Quick Start

Open in Google Colab

Click the badge below to open the notebook directly in Google Colab:

Open In Colab

Or download the notebook from this repository and upload to Google Colab manually.

Steps:

  1. Click the "Open in Colab" badge above
  2. Select Runtime β†’ Change runtime type β†’ GPU (T4 or better recommended)
  3. Run all cells sequentially
  4. Upload your image when prompted
  5. Get markdown-formatted text output

✨ Features

  • Easy Setup: One-click deployment on Google Colab
  • GPU Acceleration: Optimized for NVIDIA GPUs (T4, L4, A100, V100)
  • Flexible Processing: Single image or batch processing support
  • High-Quality OCR: Converts documents to markdown with text detection and grounding
  • Multiple Resolution Modes: Tiny, Small, Base, Large, and Gundam (cropped) modes
  • Real-time Preview: View uploaded images before processing

πŸ“‹ Requirements

For Google Colab:

  • GPU Runtime (T4 or better recommended)
  • ~15-20 minutes setup time
  • ~23GB GPU memory (L4 or equivalent)

For Local Setup:

  • NVIDIA GPU with CUDA 12.1+ support
  • Python 3.8+
  • PyTorch 2.0+
  • 22GB+ GPU VRAM
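
To quickly confirm a local machine meets the requirements above, you can check the CUDA build and available VRAM before loading the model. This is a minimal sketch; adjust the 22GB threshold to the mode you plan to use:

# Check that a CUDA-capable GPU with enough VRAM is available
import torch

assert torch.cuda.is_available(), "No CUDA GPU detected"
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name} | VRAM: {vram_gb:.1f} GB | CUDA: {torch.version.cuda}")
if vram_gb < 22:
    print("Warning: less than 22GB VRAM; consider Tiny/Small modes or a larger GPU")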

πŸ’‘ Usage

Single Image Processing

  1. Upload your image in the designated cell
  2. Run the inference cell to process the image
  3. Download results from the output directory

Example prompt:

prompt = "<image>\n<|grounding|>Convert the document to markdown."
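
For reference, a complete single-image inference cell looks roughly like the sketch below. The model.infer call and its keyword arguments follow the DeepSeek-OCR model card; the exact code in the notebook may differ slightly, and the file paths here are placeholders:

# Single-image inference sketch (verify argument names against the notebook / model card)
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True, use_safetensors=True)
model = model.eval().cuda().to(torch.bfloat16)

prompt = "<image>\n<|grounding|>Convert the document to markdown."
model.infer(
    tokenizer,
    prompt=prompt,
    image_file="input.png",      # placeholder: path to the uploaded image
    output_path="output/",       # placeholder: results are written here
    base_size=1024,
    image_size=640,
    crop_mode=True,              # Gundam mode (recommended default)
    save_results=True,
    test_compress=True,          # prints compression ratio metrics
)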

Batch Processing

Process multiple images at once with automatic iteration through uploaded files. Results are saved to the output directory.
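
A minimal batch cell can loop over the uploaded files and reuse the same inference call. The sketch below assumes the model, tokenizer, and prompt from the single-image cell are already defined, and that input/ and output/ are placeholder directory names:

# Run OCR over every image in the input directory
import glob
import os

os.makedirs("output", exist_ok=True)
image_files = sorted(glob.glob("input/*.png") + glob.glob("input/*.jpg"))
for image_file in image_files:
    print(f"Processing {image_file} ...")
    model.infer(
        tokenizer,
        prompt=prompt,
        image_file=image_file,
        output_path="output/",
        base_size=1024,
        image_size=640,
        crop_mode=True,
        save_results=True,
    )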

βš™οΈ Model Configuration

The notebook supports different processing modes:

Mode     base_size   image_size   crop_mode   Use Case
Tiny     512         512          False       Quick processing, lower quality
Small    640         640          False       Balanced speed/quality
Base     1024        1024         False       Standard quality
Large    1280        1280         False       High quality, slower
Gundam   1024        640          True        Recommended (cropped processing)

Default Configuration (Recommended):

base_size = 1024
image_size = 640
crop_mode = True
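
One convenient way to switch between modes is to keep the table's presets in a dictionary and unpack the chosen entry into the inference call. This is purely illustrative; the mode names are just dictionary keys, not arguments the model itself understands:

# Resolution presets matching the mode table above
MODES = {
    "tiny":   dict(base_size=512,  image_size=512,  crop_mode=False),
    "small":  dict(base_size=640,  image_size=640,  crop_mode=False),
    "base":   dict(base_size=1024, image_size=1024, crop_mode=False),
    "large":  dict(base_size=1280, image_size=1280, crop_mode=False),
    "gundam": dict(base_size=1024, image_size=640,  crop_mode=True),  # recommended default
}

config = MODES["gundam"]
# model.infer(tokenizer, prompt=prompt, image_file="input.png", output_path="output/", **config)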

πŸ“€ Output Format

The model outputs:

  • Markdown formatted text with proper heading structure
  • Bounding box coordinates for detected elements (<|det|>)
  • Element references (<|ref|>) for text, titles, tables, etc.
  • Tables converted to markdown format
  • Compression ratio metrics for analysis

Example output structure:

<|ref|>text<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
Extracted text content here...

<|ref|>sub_title<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
## Heading Text
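
If you want to post-process the raw output programmatically, the grounding tags can be parsed with a small regular expression. This is a sketch that assumes the coordinates in real output are numeric lists (the x1/y1 placeholders above would not parse):

# Extract (label, bounding boxes, text) records from the raw grounded output
import ast
import re

TAG_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(\[\[.*?\]\])<\|/det\|>\s*(.*?)(?=<\|ref\|>|\Z)",
    re.S,
)

def parse_grounded_output(raw: str):
    elements = []
    for label, boxes, text in TAG_PATTERN.findall(raw):
        elements.append({
            "label": label,                    # e.g. "text", "sub_title", "table"
            "boxes": ast.literal_eval(boxes),  # [[x1, y1, x2, y2], ...] as numbers
            "text": text.strip(),
        })
    return elements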

πŸ› οΈ Troubleshooting

Out of Memory (OOM)

  • Use a higher-tier GPU (A100, V100)
  • Reduce image resolution before processing (see the resize sketch after this list)
  • Use smaller processing modes (Tiny or Small)
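
Downscaling the input before inference is often the quickest fix; a minimal resize sketch with Pillow (the 1280-pixel cap is an arbitrary example):

# Downscale so the longest side is at most 1280 px before running OCR
from PIL import Image

img = Image.open("input.png").convert("RGB")
img.thumbnail((1280, 1280), Image.LANCZOS)  # resizes in place, preserving aspect ratio
img.save("input_resized.png")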

Flash Attention Installation Fails

  • The notebook removes attn_implementation='flash_attention_2' by default
  • Standard attention mechanism is used as fallback (see the loading sketch below)
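
In practice this just means loading the model without the flash-attention argument, as in the short sketch below; with it omitted, the standard attention implementation is used:

# Load DeepSeek-OCR without requesting flash attention
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-OCR",
    trust_remote_code=True,
    use_safetensors=True,
)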

Model Download Slow

  • First download takes 10-15 minutes (normal)
  • Model is cached after first download
  • Check your Colab internet connection

Image Format Issues

# Ensure RGB format
from PIL import Image
img = Image.open('image.png').convert('RGB')

πŸ“Š Performance Tips

  1. Image Resolution: Use native resolutions (512, 640, 1024, 1280) for best results
  2. Batch Processing: More efficient for multiple images
  3. GPU Selection: L4 or better recommended for faster processing
  4. Compression: Enable test_compress=True to see compression metrics

πŸ“ Repository Files

  • DeepSeek_OCR_Colab.ipynb - Main Google Colab notebook
  • requirements.txt - Python dependencies
  • LICENSE - MIT License
  • README.md - This documentation

πŸ™ Credits

Based on the official DeepSeek-OCR repository by DeepSeek AI.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

The DeepSeek-OCR model itself is subject to its own license terms from DeepSeek AI.

πŸ“– Citation

If you use this notebook in your research, please cite the original DeepSeek-OCR paper:

@article{deepseek2024ocr,
  title={DeepSeek-OCR: High-Accuracy Document OCR},
  author={DeepSeek AI},
  year={2024}
}

Note: This is a community-contributed notebook wrapper for the DeepSeek-OCR model. For the official model and implementation, please visit the DeepSeek-OCR repository.
