DeepSeek-OCR Google Colab Notebook

A ready-to-use Google Colab notebook for running DeepSeek-OCR, an optical character recognition model that converts images and documents to markdown.

🚀 Quick Start

Open in Google Colab

Click the badge below to open the notebook directly in Google Colab:

Alternatively, download the notebook from this repository and upload it to Google Colab manually.

Steps:

Click the "Open in Colab" badge above
Select Runtime → Change runtime type → GPU (T4 or better recommended)
Run all cells sequentially
Upload your image when prompted
Get markdown-formatted text output

✨ Features

Easy Setup: One-click deployment on Google Colab
GPU Acceleration: Optimized for NVIDIA GPUs (T4, L4, A100, V100)
Flexible Processing: Single image or batch processing support
High Quality OCR: Converts documents to markdown with text detection and grounding
Multiple Resolution Modes: Tiny, Small, Base, Large, and Gundam (cropped) modes
Real-time Preview: View uploaded images before processing

📋 Requirements

For Google Colab:

GPU Runtime (T4 or better recommended)
~15-20 minutes setup time
~23GB GPU memory (L4 or equivalent)

For Local Setup:

NVIDIA GPU with CUDA support (12.1+)
Python 3.8+
PyTorch 2.0+
22GB+ GPU VRAM
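
The memory figures above can be checked before loading the model. The following is a minimal sketch, assuming PyTorch is installed. The helper names `gpu_memory_gib` and `meets_requirement` are hypothetical, and the 22 GiB threshold simply mirrors the local-setup figure listed above.

```python
def gpu_memory_gib():
    """Return total memory of GPU 0 in GiB, or None without PyTorch or CUDA."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 1024**3


def meets_requirement(total_gib, required_gib=22.0):
    """True if a GPU was found and it has at least required_gib of memory."""
    return total_gib is not None and total_gib >= required_gib


total = gpu_memory_gib()
if meets_requirement(total):
    print(f"GPU OK: {total:.1f} GiB available")
else:
    print("No CUDA GPU with enough memory detected")
```

In Colab, running this in the first cell gives an early warning that the selected runtime may be too small, before the model download starts.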

💡 Usage

Single Image Processing

Upload your image in the designated cell
Run the inference cell to process the image
Download results from the output directory

Example prompt:

prompt = "<image>\n<|grounding|>Convert the document to markdown."

Batch Processing

Process multiple images at once with automatic iteration through uploaded files. Results are saved to the output directory.
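
The iteration described above could look roughly like the sketch below. It is not the notebook's actual cell: `run_ocr` is a hypothetical callable standing in for the inference step, the extension list is an assumption, and writing each result to its own subfolder is a design choice made here for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your uploads.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff"}


def find_images(input_dir):
    """Return the image files in input_dir, sorted by name."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def process_batch(input_dir, output_dir, run_ocr):
    """Call run_ocr(image_path, result_dir) for every image found.

    run_ocr is a hypothetical wrapper around the notebook's inference cell.
    Returns (done, failed): processed file names and (name, error) pairs.
    """
    output_dir = Path(output_dir)
    done, failed = [], []
    for image_path in find_images(input_dir):
        result_dir = output_dir / image_path.stem  # one subfolder per image
        result_dir.mkdir(parents=True, exist_ok=True)
        try:
            run_ocr(image_path, result_dir)
            done.append(image_path.name)
        except Exception as exc:  # keep going if a single image fails
            failed.append((image_path.name, str(exc)))
    return done, failed
```

Collecting failures instead of stopping at the first error means one corrupt upload does not discard the rest of a long batch.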

⚙️ Model Configuration

The notebook supports different processing modes:

Mode	base_size	image_size	crop_mode	Use Case
Tiny	512	512	False	Quick processing, lower quality
Small	640	640	False	Balanced speed/quality
Base	1024	1024	False	Standard quality
Large	1280	1280	False	High quality, slower
Gundam	1024	640	True	Recommended (cropped processing)

Default Configuration (Recommended):

base_size = 1024
image_size = 640
crop_mode = True
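
To switch between modes without editing three variables by hand, the table above can be kept as a lookup. This is a sketch only: the `MODES` dictionary and `mode_settings` helper are hypothetical names, and the commented `model.infer(...)` call reflects an assumption about the upstream model's interface rather than something this README confirms.

```python
# Values copied from the mode table above.
MODES = {
    "tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}


def mode_settings(name="gundam"):
    """Return a copy of the settings for a mode (case-insensitive)."""
    key = name.lower()
    if key not in MODES:
        raise ValueError(f"Unknown mode {name!r}; choose from {sorted(MODES)}")
    return dict(MODES[key])  # copy so callers can modify it safely


settings = mode_settings("gundam")
print(settings)

# Assumed usage (not confirmed here): pass the settings as keyword arguments.
# model.infer(tokenizer, prompt=prompt, image_file="image.png",
#             output_path="output", **settings)
```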

📤 Output Format

The model outputs:

Markdown formatted text with proper heading structure
Bounding box coordinates for detected elements (<|det|>)
Element references (<|ref|>) for text, titles, tables, etc.
Tables converted to markdown format
Compression ratio metrics for analysis

Example output structure:

<|ref|>text<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
Extracted text content here...

<|ref|>sub_title<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
## Heading Text
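
Output in this shape can be split into labelled blocks with a regular expression. The sketch below assumes every block follows the `<|ref|>…<|/ref|><|det|>…<|/det|>` pattern shown above; `parse_grounded_output` is a hypothetical helper, and the coordinates in `sample` are invented for illustration.

```python
import ast
import re

# One block = ref tag, det tag, then content up to the next ref tag or the end.
BLOCK_RE = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|>"
    r"<\|det\|>(?P<boxes>.*?)<\|/det\|>"
    r"(?P<content>.*?)(?=<\|ref\|>|\Z)",
    re.DOTALL,
)


def parse_grounded_output(raw):
    """Return a list of {'label', 'boxes', 'content'} dicts from model output."""
    blocks = []
    for match in BLOCK_RE.finditer(raw):
        blocks.append({
            "label": match.group("label").strip(),
            # literal_eval safely turns "[[x1, y1, x2, y2]]" into nested lists;
            # it raises on malformed input instead of guessing.
            "boxes": ast.literal_eval(match.group("boxes").strip()),
            "content": match.group("content").strip(),
        })
    return blocks


# Illustrative sample; the numbers are made up.
sample = (
    "<|ref|>sub_title<|/ref|><|det|>[[40, 30, 560, 70]]<|/det|>\n"
    "## Heading Text\n\n"
    "<|ref|>text<|/ref|><|det|>[[40, 90, 560, 300]]<|/det|>\n"
    "Extracted text content here...\n"
)

for block in parse_grounded_output(sample):
    print(block["label"], block["boxes"], block["content"])
```

Joining the `content` fields of the parsed blocks yields the markdown without the grounding tags.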

🛠️ Troubleshooting

Out of Memory (OOM)

Use a higher-tier GPU (A100, V100)
Reduce image resolution before processing
Use smaller processing modes (Tiny or Small)
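
Reducing the resolution before processing can be done with Pillow. This is a minimal sketch: the helper names and the default `max_side=1024` are assumptions chosen for illustration, not values taken from the notebook.

```python
def scaled_size(width, height, max_side=1024):
    """Return (width, height) scaled so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale_image(src_path, dst_path, max_side=1024):
    """Save an RGB copy of src_path whose longest side is at most max_side."""
    from PIL import Image  # requires Pillow

    with Image.open(src_path) as img:
        img = img.convert("RGB")
        new_size = scaled_size(*img.size, max_side=max_side)
        if new_size != img.size:
            img = img.resize(new_size, Image.LANCZOS)
        img.save(dst_path)
    return new_size
```

Preserving the aspect ratio keeps text proportions intact, which matters more for OCR than hitting an exact pixel count.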

Flash Attention Installation Fails

The notebook removes attn_implementation='flash_attention_2' by default, so Flash Attention does not need to be installed
The standard attention mechanism is used as a fallback
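
The same fallback can be made conditional, so that Flash Attention is requested only when it is actually installed. The sketch below is an assumption about how one might do this, not the notebook's code: `attention_kwargs` is a hypothetical helper, the argument name follows the spelling used above, and the commented loading call assumes the Hugging Face transformers `AutoModel.from_pretrained` interface with `trust_remote_code=True`.

```python
import importlib.util


def attention_kwargs():
    """Extra keyword arguments for model loading.

    Requests Flash Attention only when the flash_attn package is importable;
    otherwise returns {} so the argument is omitted and the default is used.
    """
    if importlib.util.find_spec("flash_attn") is not None:
        return {"attn_implementation": "flash_attention_2"}
    return {}


load_kwargs = {"trust_remote_code": True, **attention_kwargs()}
print(load_kwargs)

# Assumed usage (needs the transformers package and a model download):
# from transformers import AutoModel
# model = AutoModel.from_pretrained("deepseek-ai/DeepSeek-OCR", **load_kwargs)
```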

Model Download Slow

First download takes 10-15 minutes (normal)
Model is cached after first download
Check your Colab internet connection

Image Format Issues

# Ensure RGB format
from PIL import Image
img = Image.open('image.png').convert('RGB')

📊 Performance Tips

Image Resolution: Use native resolutions (512, 640, 1024, 1280) for best results
Batch Processing: More efficient for multiple images
GPU Selection: L4 or better recommended for faster processing
Compression: Enable test_compress=True to see compression metrics

📁 Repository Files

DeepSeek_OCR_Colab.ipynb - Main Google Colab notebook
requirements.txt - Python dependencies
LICENSE - MIT License
README.md - This documentation

🙏 Credits

Based on the official DeepSeek-OCR repository:

Repository: deepseek-ai/DeepSeek-OCR
Model: deepseek-ai/DeepSeek-OCR on HuggingFace
Paper: DeepSeek-OCR: Contexts Optical Compression

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

The DeepSeek-OCR model itself is subject to its own license terms from DeepSeek AI.

🔗 Links

Hugging Face Model: https://huggingface.co/ahczhg/DeepSeek-OCR-Colab
Hugging Face Space: https://huggingface.co/spaces/ahczhg/DeepSeek-OCR-Colab

📖 Citation

If you use this notebook in your research, please cite the original DeepSeek-OCR paper:

@article{wei2025deepseek,
  title={DeepSeek-OCR: Contexts Optical Compression},
  author={Wei, Haoran and Sun, Yaofeng and Li, Yukun},
  journal={arXiv preprint arXiv:2510.18234},
  year={2025}
}

Note: This is a community-contributed notebook wrapper for the DeepSeek-OCR model. For the official model and implementation, please visit the DeepSeek-OCR repository.
