---
license: mit
tags:
  - ocr
  - document-processing
  - computer-vision
  - deepseek
  - colab
  - jupyter
  - optical-character-recognition
  - text-detection
  - document-to-markdown
  - notebook
library_name: transformers
pipeline_tag: image-to-text
---

# DeepSeek-OCR Google Colab Notebook

A ready-to-use Google Colab notebook for running DeepSeek-OCR, a state-of-the-art optical character recognition model that converts images and documents to markdown format with high accuracy.

## 🚀 Quick Start

### Open in Google Colab

Click the badge below to open the notebook directly in Google Colab:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahczhg/DeepSeek-OCR-Colab/blob/main/DeepSeek_OCR_Colab.ipynb)

**Or download the notebook from this repository and upload it to Google Colab manually.**

### Steps:

1. Click the "Open in Colab" badge above
2. Select **Runtime → Change runtime type → GPU** (T4 or better recommended)
3. Run all cells sequentially
4. Upload your image when prompted
5. Get markdown-formatted text output

## ✨ Features

- **Easy Setup**: One-click deployment on Google Colab
- **GPU Acceleration**: Optimized for NVIDIA GPUs (T4, L4, A100, V100)
- **Flexible Processing**: Single-image or batch processing support
- **High-Quality OCR**: Converts documents to markdown with text detection and grounding
- **Multiple Resolution Modes**: Tiny, Small, Base, Large, and Gundam (cropped) modes
- **Real-time Preview**: View uploaded images before processing

## 📋 Requirements

### For Google Colab:

- GPU runtime (T4 or better recommended)
- ~15-20 minutes setup time
- ~23 GB GPU memory (L4 or equivalent)

### For Local Setup:

- NVIDIA GPU with CUDA support (12.1+)
- Python 3.8+
- PyTorch 2.0+
- 22 GB+ GPU VRAM

## 💡 Usage

### Single Image Processing

1. **Upload your image** in the designated cell
2. **Run the inference cell** to process the image
3.
   **Download results** from the output directory

Example prompt:

```python
prompt = "\n<|grounding|>Convert the document to markdown."
```

### Batch Processing

Process multiple images at once with automatic iteration through uploaded files. Results are saved to the output directory.

## ⚙️ Model Configuration

The notebook supports different processing modes:

| Mode | base_size | image_size | crop_mode | Use Case |
|------|-----------|------------|-----------|----------|
| Tiny | 512 | 512 | False | Quick processing, lower quality |
| Small | 640 | 640 | False | Balanced speed/quality |
| Base | 1024 | 1024 | False | Standard quality |
| Large | 1280 | 1280 | False | High quality, slower |
| Gundam | 1024 | 640 | True | Recommended (cropped processing) |

**Default Configuration (Recommended):**

```python
base_size = 1024
image_size = 640
crop_mode = True
```

## 📤 Output Format

The model outputs:

- **Markdown-formatted text** with proper heading structure
- **Bounding box coordinates** for detected elements (`<|det|>`)
- **Element references** (`<|ref|>`) for text, titles, tables, etc.
- **Tables** converted to markdown format
- **Compression ratio** metrics for analysis

Example output structure:

```
<|ref|>text<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
Extracted text content here...
<|ref|>sub_title<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
## Heading Text
```

## 🛠️ Troubleshooting

### Out of Memory (OOM)

- Use a higher-tier GPU (A100, V100)
- Reduce image resolution before processing
- Use smaller processing modes (Tiny or Small)

### Flash Attention Installation Fails

- The notebook removes `attn_implementation='flash_attention_2'` by default
- The standard attention mechanism is used as a fallback

### Model Download Is Slow

- The first download takes 10-15 minutes (this is normal)
- The model is cached after the first download
- Check your Colab internet connection

### Image Format Issues

```python
# Ensure RGB format
from PIL import Image

img = Image.open('image.png').convert('RGB')
```

## 📊 Performance Tips

1. **Image Resolution**: Use native resolutions (512, 640, 1024, 1280) for best results
2. **Batch Processing**: More efficient for multiple images
3. **GPU Selection**: L4 or better recommended for faster processing
4. **Compression**: Enable `test_compress=True` to see compression metrics

## 📁 Repository Files

- `DeepSeek_OCR_Colab.ipynb` - Main Google Colab notebook
- `requirements.txt` - Python dependencies
- `LICENSE` - MIT License
- `README.md` - This documentation

## 🙏 Credits

Based on the official DeepSeek-OCR repository:

- **Repository**: [deepseek-ai/DeepSeek-OCR](https://github.com/deepseek-ai/DeepSeek-OCR)
- **Model**: [deepseek-ai/DeepSeek-OCR on HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-OCR)
- **Paper**: DeepSeek-OCR: High-Accuracy Document OCR

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

The DeepSeek-OCR model itself is subject to its own license terms from DeepSeek AI.
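## 🔎 Parsing Grounded Output

The grounded output format shown earlier (`<|ref|>…<|/ref|>` followed by `<|det|>…<|/det|>`) is plain text, so it can be post-processed with a regular expression. Below is a minimal sketch; `parse_grounding` is a hypothetical helper for illustration, not part of the notebook or the official repository:

```python
import re

# Matches one grounded element: <|ref|>LABEL<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
PATTERN = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|>"
    r"<\|det\|>\[\[(?P<box>[^\]]+)\]\]<\|/det\|>"
)

def parse_grounding(text):
    """Extract (label, [x1, y1, x2, y2]) pairs from grounded OCR output."""
    elements = []
    for m in PATTERN.finditer(text):
        box = [int(v) for v in m.group("box").split(",")]
        elements.append((m.group("label"), box))
    return elements

sample = "<|ref|>sub_title<|/ref|><|det|>[[10, 20, 300, 60]]<|/det|>\n## Heading Text"
print(parse_grounding(sample))  # [('sub_title', [10, 20, 300, 60])]
```

The bounding boxes extracted this way can be used to overlay detections on the source image or to filter elements by type (e.g. keep only `table` references).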
## 🔗 Links

- **Hugging Face Model**: [https://huggingface.co/ahczhg/DeepSeek-OCR-Colab](https://huggingface.co/ahczhg/DeepSeek-OCR-Colab)
- **Hugging Face Space**: [https://huggingface.co/spaces/ahczhg/DeepSeek-OCR-Colab](https://huggingface.co/spaces/ahczhg/DeepSeek-OCR-Colab)
- **Open in Colab**: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ahczhg/DeepSeek-OCR-Colab/blob/main/DeepSeek_OCR_Colab.ipynb)

## 📖 Citation

If you use this notebook in your research, please cite the original DeepSeek-OCR paper:

```bibtex
@article{deepseek2024ocr,
  title={DeepSeek-OCR: High-Accuracy Document OCR},
  author={DeepSeek AI},
  year={2024}
}
```

---

**Note**: This is a community-contributed notebook wrapper for the DeepSeek-OCR model. For the official model and implementation, please visit the [DeepSeek-OCR repository](https://github.com/deepseek-ai/DeepSeek-OCR).
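## 📎 Appendix: Mode Settings as Code

The mode table in "Model Configuration" can be captured directly as a lookup, which is handy when switching modes in the batch-processing cell. This is a sketch of a hypothetical helper, not code from the notebook itself:

```python
# Settings from the "Model Configuration" table in this README.
MODES = {
    "Tiny":   {"base_size": 512,  "image_size": 512,  "crop_mode": False},
    "Small":  {"base_size": 640,  "image_size": 640,  "crop_mode": False},
    "Base":   {"base_size": 1024, "image_size": 1024, "crop_mode": False},
    "Large":  {"base_size": 1280, "image_size": 1280, "crop_mode": False},
    "Gundam": {"base_size": 1024, "image_size": 640,  "crop_mode": True},
}

def mode_config(name):
    """Return the base_size / image_size / crop_mode settings for a named mode."""
    return MODES[name]

print(mode_config("Gundam"))  # {'base_size': 1024, 'image_size': 640, 'crop_mode': True}
```

The returned dictionary mirrors the recommended defaults (`base_size=1024`, `image_size=640`, `crop_mode=True`) when `"Gundam"` is selected.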