Update to Gradio-based PDF comparison tool with advanced features

- Replace Flask app with modern Gradio interface
- Add comprehensive PDF analysis features:
  - Visual difference detection with bounding boxes
  - OCR and spell checking capabilities
  - Barcode/QR code detection and validation
  - CMYK color analysis for print workflows
- Update dependencies for compatibility
- Add detailed README for Hugging Face Space
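
The diff for `pdf_comparator.py` is not rendered on this page, so the actual detection code cannot be quoted. As a rough illustration of what "visual difference detection with bounding boxes" can mean, here is a minimal pure-Python sketch. The function name `diff_bounding_box`, the grayscale-grid input, and the `threshold` default are all assumptions made for this example and are not taken from the commit.

```python
from typing import Optional, Sequence, Tuple

# (left, top, right, bottom), all inclusive pixel coordinates
BBox = Tuple[int, int, int, int]


def diff_bounding_box(
    page_a: Sequence[Sequence[int]],
    page_b: Sequence[Sequence[int]],
    threshold: int = 16,  # hypothetical default, not from the commit
) -> Optional[BBox]:
    """Return one box enclosing every pixel whose grayscale values
    differ by more than `threshold`, or None if the pages match."""
    same_height = len(page_a) == len(page_b)
    if not same_height or any(len(ra) != len(rb) for ra, rb in zip(page_a, page_b)):
        raise ValueError("pages must have identical dimensions")

    left = top = right = bottom = None
    for y, (row_a, row_b) in enumerate(zip(page_a, page_b)):
        for x, (a, b) in enumerate(zip(row_a, row_b)):
            if abs(a - b) > threshold:
                left = x if left is None else min(left, x)
                right = x if right is None else max(right, x)
                if top is None:
                    top = y  # rows are visited top to bottom
                bottom = y
    if left is None:
        return None
    return (left, top, right, bottom)
```

A real implementation would more likely rasterize each page (for example with pdf2image), compute the difference with NumPy or scikit-image, and emit one box per connected region rather than a single enclosing box.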

Files changed (3)

ProofCheck/README.md +47 -95
ProofCheck/pdf_comparator.py +0 -0
ProofCheck/requirements.txt +8 -20

ProofCheck/README.md CHANGED

@@ -1,117 +1,69 @@
 ---
 title: PDF Comparison Tool
-emoji: 📄
 colorFrom: blue
 colorTo: purple
-sdk: docker
 pinned: false
 license: mit
 ---
-# PDF Comparison Tool
-A comprehensive web-based tool for comparing PDF documents with advanced features including OCR validation, color difference detection, spelling verification, and barcode/QR code detection.
-## 🚀 Live Demo
-This tool is deployed on Hugging Face Spaces and available for immediate use!
-## ✨ Features
-- **PDF Validation**: Ensures uploaded PDFs contain "50 Carroll" using OCR
-- **Color Difference Detection**: Identifies visual differences between PDFs and highlights them with red boxes
-- **Spelling Verification**: Checks text against both English and French dictionaries
-- **Barcode/QR Code Detection**: Automatically detects and reads barcodes and QR codes
-- **Visual Comparison**: Side-by-side comparison with annotated differences
-- **Modern Web Interface**: Responsive design with Bootstrap and custom styling
-## 📋 Requirements
-- Both PDF files must contain the text "50 Carroll" for validation
-- Maximum file size: 16MB per PDF
-- Supported format: PDF only
-## 🎯 How to Use
-1. **Upload PDFs**: Select two PDF files for comparison
-2. **Validation**: The tool automatically checks for "50 Carroll" in both documents
-3. **Processing**: Wait for the analysis to complete (may take a few minutes)
-4. **Results**: View findings in three organized tabs:
-   - **Visual Comparison**: Side-by-side view with red boxes highlighting differences
-   - **Spelling Issues**: Table of spelling errors with suggestions from English and French dictionaries
-   - **Barcodes & QR Codes**: List of detected barcodes with their data and positions
-## 🔧 Technical Details
-### Backend Technologies
-- **Python Flask**: Web framework
-- **OpenCV**: Image processing and comparison
-- **Tesseract OCR**: Text extraction from PDFs
-- **scikit-image**: Structural similarity analysis
-- **pyspellchecker**: Spelling verification
-- **pyzbar**: Barcode and QR code detection
-### Frontend Technologies
-- **HTML5/CSS3**: Modern responsive design
-- **JavaScript**: Dynamic content and AJAX requests
-- **Bootstrap**: UI framework for professional appearance
-### Comparison Algorithms
-- **Color Difference**: Uses Structural Similarity Index (SSIM) for pixel-level comparison
-- **Text Analysis**: OCR-based text extraction with multi-language spell checking
-- **Barcode Detection**: Automatic recognition of various barcode and QR code formats
-## 🛠️ Local Development
-If you want to run this tool locally:
-```bash
-# Clone the repository
-git clone https://huggingface.co/spaces/Digitaljoint/ProofCheck
-# Install dependencies
-pip install -r requirements.txt
-# Install Tesseract OCR
-# macOS: brew install tesseract
-# Ubuntu: sudo apt-get install tesseract-ocr
-# Run the application
-python app.py
-```
-## 📊 Output Examples
-### Visual Comparison
-- Red rectangles highlight color differences between PDFs
-- Side-by-side view for easy comparison
-- Page-by-page analysis
-### Spelling Issues
-- Word-by-word analysis against English and French dictionaries
-- Spelling suggestions for both languages
-- Organized table format with original text and corrections
-### Barcode/QR Code Detection
-- Automatic detection of various barcode formats
-- Extracted data display
-- Position information for each detected code
-## 🔒 Privacy & Security
-- All processing happens locally on the server
-- No data is stored permanently
-- Files are automatically cleaned up after processing
-- No external API calls or data sharing
-## 🤝 Contributing
-This tool is open source and contributions are welcome! Please feel free to submit issues or pull requests.
-## 📄 License
-This project is available under the MIT License.
----
-**Note**: This tool is specifically designed to validate PDFs containing "50 Carroll" and will reject files that don't contain this text. This ensures that only relevant documents are processed for comparison.

 ---
 title: PDF Comparison Tool
+emoji: 🔍
 colorFrom: blue
 colorTo: purple
+sdk: gradio
+sdk_version: 5.44.1
+app_file: pdf_comparator.py
 pinned: false
 license: mit
+short_description: Advanced PDF comparison tool with visual differences, OCR, barcodes, and CMYK analysis
 ---
+# 🔍 Advanced PDF Comparison Tool
+Upload two PDF files to get comprehensive analysis including:
+- **Visual differences** with bounding boxes
+- **OCR and spell checking**
+- **Barcode/QR code detection**
+- **CMYK color analysis**
+## Features
+### Visual Analysis
+- Pixel-level difference detection
+- Bounding box visualization for changes
+- Red overlay highlighting differences
+### OCR & Text Analysis
+- Automatic text extraction from PDFs
+- Spell checking with multi-language support
+- Misspelling detection with visual indicators
+### Barcode Detection
+- QR code and barcode recognition
+- Multiple symbology support (EAN, UPC, DataBar, etc.)
+- Validation and data extraction
+### Print Workflow Support
+- CMYK color analysis for print workflows
+- Color difference quantification
+- Print-ready color breakdowns
+## Usage
+1. Upload two PDF files using the file inputs
+2. Click "Compare PDF Files" to start analysis
+3. View results with comprehensive visualizations
+4. Check barcode detection results in the data tables
+## Color Legend
+- **🔴 Red boxes:** Visual differences between files
+- **🔵 Cyan boxes:** Potential spelling errors (OCR)
+- **🟢 Green boxes:** Detected barcodes/QR codes
+- **📊 Side panel:** CMYK color analysis for print workflows
+## Technical Details
+Built with:
+- Gradio for the web interface
+- OpenCV and PIL for image processing
+- Tesseract for OCR
+- PyZbar for barcode detection
+- Scikit-image for advanced image analysis
+## License
+MIT License - feel free to use and modify for your needs.
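
The new README mentions CMYK color analysis but does not say how the values are derived. The sketch below shows the common naive RGB-to-CMYK formula as one possible reading. It is an assumption for illustration only: it ignores ICC profiles, so it is not colorimetrically accurate for real print work, and the name `rgb_to_cmyk` is hypothetical rather than taken from `pdf_comparator.py`.

```python
from typing import Tuple


def rgb_to_cmyk(r: int, g: int, b: int) -> Tuple[float, float, float, float]:
    """Naive conversion from 8-bit RGB to CMYK fractions in 0.0..1.0.

    No ICC profile is applied, so treat the result as a rough estimate.
    """
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("RGB channels must be in 0..255")

    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)  # black is set by the brightest channel
    if k == 1:
        # Pure black: the general formula would divide by zero.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return (c, m, y, k)
```

Averaging these values over a changed region of each PDF, then comparing the two averages channel by channel, would be one simple way to quantify a color difference.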

ProofCheck/pdf_comparator.py CHANGED

The diff for this file is too large to render.

ProofCheck/requirements.txt CHANGED

@@ -1,20 +1,8 @@
-Flask==2.3.3
-Werkzeug==2.3.7
-PyPDF2==3.0.1
-pdf2image==1.16.3
-Pillow==10.0.1
-opencv-python==4.8.1.78
-pytesseract==0.3.10
-pyzbar==0.1.9
-pyspellchecker==0.7.2
-nltk==3.8.1
-numpy==1.24.3
-scikit-image==0.21.0
-matplotlib==3.7.2
-pandas==2.0.3
-reportlab==4.0.4
-python-barcode==0.15.1
-zxing-cpp==2.0.0
-dbr==9.6.30
-PyMuPDF==1.23.8
-regex==2023.10.3

+gradio>=4.0.0
+pdf2image>=1.16.0
+Pillow>=9.0.0
+pytesseract>=0.3.10
+pyzbar>=0.1.9
+pyspellchecker>=0.7.0
+numpy>=1.21.0
+scikit-image>=0.19.0
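
pyzbar stays in the trimmed requirements, and the commit message lists barcode "validation" alongside detection. What that validation consists of is not visible here. One plausible piece is a check-digit test on decoded EAN-13 payloads, sketched below. The function name `is_valid_ean13` is hypothetical, and the sketch assumes the payload has already been decoded to a string.

```python
def is_valid_ean13(code: str) -> bool:
    """Check the length, digits, and check digit of an EAN-13 string."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # The first 12 digits are weighted 1, 3, 1, 3, ... from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[12]
```

Other symbologies named in the README, such as UPC and DataBar, use their own length and check-digit rules and would need separate handling.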