🔍 YOLOv12x-DINOv3 Watermark Detection Model

Watermark detection with YOLOv12x enhanced by a DINOv3 ViT-B/16 Vision Transformer

📊 Model Performance

| Metric | Value |
|---|---|
| mAP@0.5 | 84.1% |
| mAP@0.5:0.95 | 59.9% |
| Precision | 91.8% |
| Recall | 74.2% |
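
Precision and recall can be combined into a single F1 score, F1 = 2PR / (P + R). The card does not report F1, so the number below is plain arithmetic on the two values in the table, not a separately measured result.

```python
# Precision and recall as reported in the performance table
precision = 0.918
recall = 0.742

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.821
```

Precision is well above recall, which suggests that at the evaluated threshold the model misses more watermarks than it falsely reports.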

🏗️ Architecture

  • Base Model: YOLOv12x (Extra-Large variant)
  • Enhancement: DINOv3 ViT-B/16 backbone integration
  • Configuration: Dual P0/P3 feature enhancement
  • Input Size: 1024×1024

Key Features

  • 🧠 DINOv3 Vision Transformer integration at P4 level (40×40×512)
  • 🔄 Dual-scale feature fusion for improved small/medium object detection
  • ⚡ Optimized for watermark detection with high precision
  • 🛡️ Production-ready with comprehensive error handling
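
To relate the feature-map shape quoted above to the input size, here is a minimal sketch that assumes the usual YOLO convention that pyramid level Pn has stride 2^n. The helper names are hypothetical and not part of the model code. Under that assumption, a 40×40 map at P4 corresponds to a 640×640 input, while a 1024×1024 input would give 64×64 at P4. The patch-grid function shows how many 16×16 patches a ViT-B/16 would see if it were fed the full-resolution image, which the card does not confirm.

```python
def feature_map_sizes(image_size: int, levels=(3, 4, 5)) -> dict:
    """Side length of each pyramid level, assuming level Pn has stride 2**n."""
    return {f"P{n}": image_size // (2 ** n) for n in levels}


def vit_patch_grid(image_size: int, patch_size: int = 16) -> tuple:
    """Patches per side and total patch count for a square input."""
    side = image_size // patch_size
    return side, side * side


sizes_1024 = feature_map_sizes(1024)
sizes_640 = feature_map_sizes(640)
grid_side, num_patches = vit_patch_grid(1024)

print(sizes_1024)              # {'P3': 128, 'P4': 64, 'P5': 32}
print(sizes_640)               # {'P3': 80, 'P4': 40, 'P5': 20}
print(grid_side, num_patches)  # 64 4096
```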

🚀 Quick Start

Installation

pip install ultralytics

Inference

from ultralytics import YOLO

# Load model from Hugging Face
model = YOLO('hf://YOUR_USERNAME/yolov12x-dino3-watermark-detection')

# Or load a local checkpoint instead (this replaces the model loaded above)
model = YOLO('best.pt')

# Run inference
results = model('image.jpg')

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        print(f"Watermark detected: confidence {box.conf[0]:.2f}")
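
The loop above prints every detection. In practice you may want to keep only boxes above a confidence threshold. The helper below is a sketch on plain Python lists: `filter_detections` and the 0.5 threshold are hypothetical choices, not part of the model or of Ultralytics. The commented lines show how it could be fed from `result.boxes.xyxy` and `result.boxes.conf`.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, confidence)
Detection = Tuple[float, float, float, float, float]


def filter_detections(
    xyxy: Sequence[Sequence[float]],
    conf: Sequence[float],
    min_conf: float = 0.5,  # hypothetical threshold, tune for your data
) -> List[Detection]:
    """Keep boxes with confidence >= min_conf, highest confidence first."""
    kept = [
        (*map(float, box), float(score))
        for box, score in zip(xyxy, conf)
        if score >= min_conf
    ]
    return sorted(kept, key=lambda d: d[4], reverse=True)


# With a real prediction (not run here):
#   result = model('image.jpg')[0]
#   detections = filter_detections(result.boxes.xyxy.tolist(),
#                                  result.boxes.conf.tolist())

# Synthetic values, for illustration only
demo_xyxy = [[10, 20, 110, 60], [300, 400, 420, 450], [5, 5, 15, 15]]
demo_conf = [0.91, 0.34, 0.67]
detections = filter_detections(demo_xyxy, demo_conf, min_conf=0.5)
for x1, y1, x2, y2, score in detections:
    print(f"box=({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f}) confidence={score:.2f}")
```

Ultralytics also accepts a `conf` argument at prediction time, for example `model('image.jpg', conf=0.5)`, which applies the threshold before results are returned.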

Batch Processing

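# Process multiple images in one call
results = model(['image1.jpg', 'image2.jpg', 'image3.jpg'])

# Process video; stream=True returns a generator, so inference runs frame by frame as you iterate
results = model('video.mp4', stream=True)
for result in results:
    print(f"{len(result.boxes)} watermark(s) in frame")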

📋 Training Details

| Parameter | Value |
|---|---|
| Epochs | 100 |
| Batch Size | 2 |
| Image Size | 1024×1024 |
| Optimizer | SGD |
| Learning Rate | 0.01 → 0.0001 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Augmentation | RandAugment, MixUp (0.2), Mosaic |
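
As a rough guide to how these settings could map onto Ultralytics training arguments, the sketch below collects them in a dictionary and computes the learning rate at a few epochs. Several things here are assumptions: the dataset file name is hypothetical, the card gives only the start and end learning rates, and the linear decay shape mirrors the Ultralytics default rather than anything the card states. In Ultralytics the final rate is `lr0 * lrf`, so `lrf=0.01` reproduces the 0.01 → 0.0001 range.

```python
# Training settings from the table, expressed as Ultralytics-style arguments
train_args = {
    "data": "watermark.yaml",  # hypothetical dataset config, not named in the card
    "epochs": 100,
    "batch": 2,
    "imgsz": 1024,
    "optimizer": "SGD",
    "lr0": 0.01,               # initial learning rate
    "lrf": 0.01,               # final LR factor: 0.01 * 0.01 = 0.0001
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "mixup": 0.2,
}


def linear_lr(epoch: int, epochs: int, lr0: float, lrf: float) -> float:
    """Learning rate under linear decay from lr0 to lr0 * lrf (assumed shape)."""
    factor = max(1 - epoch / epochs, 0.0) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 50, 100):
    lr = linear_lr(epoch, train_args["epochs"], train_args["lr0"], train_args["lrf"])
    print(f"epoch {epoch:3d}: lr = {lr:.5f}")

# With the model loaded (not run here):
#   model.train(**train_args)
```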

📈 Training Progress

The model was trained for 100 epochs with consistent improvement:

  • Epoch 1: mAP50 = 0.3%
  • Epoch 25: mAP50 = 53.7%
  • Epoch 50: mAP50 = 77.2%
  • Epoch 75: mAP50 = 79.6%
  • Epoch 100: mAP50 = 84.1%
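
To see where the improvement is concentrated, the short sketch below computes the gain between consecutive milestones from the figures listed above. It uses no data beyond those five values.

```python
# mAP50 (%) at the milestone epochs listed above
milestones = {1: 0.3, 25: 53.7, 50: 77.2, 75: 79.6, 100: 84.1}

epochs = sorted(milestones)
gains = []
for start, end in zip(epochs, epochs[1:]):
    gain = round(milestones[end] - milestones[start], 1)  # percentage points
    per_epoch = round(gain / (end - start), 3)             # average points per epoch
    gains.append((start, end, gain, per_epoch))
    print(f"epochs {start:3d} to {end:3d}: +{gain} points ({per_epoch} per epoch)")
```

Most of the gain arrives in the first 50 epochs (about 77 of the 84 points); the second half of training adds roughly 7 more.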

🔧 Model Configuration

The model uses the yolov12x-dino3-vitb16-dual.yaml configuration:

# YOLOv12x with DINOv3 ViT-B/16 Dual Enhancement
backbone:
  - YOLOv12x backbone layers
  - DINOv3 ViT-B/16 integration at P4

head:
  - Multi-scale detection head
  - Dual P0/P3 feature enhancement

📁 Files Included

  • best.pt - Best model checkpoint (highest mAP)
  • last.pt - Final epoch checkpoint
  • args.yaml - Training configuration
  • results.csv - Training metrics log
  • *.png - Training curves and confusion matrices
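
The metrics log can be inspected without any plotting tools. The sketch below finds the epoch with the highest mAP50 in a `results.csv`-style file. The column names `epoch` and `metrics/mAP50(B)` are assumed from the usual Ultralytics log format and should be checked against the actual file; the sample rows are illustrative and reuse the milestone values reported in this card.

```python
import csv
import io


def best_epoch(csv_text: str, metric: str = "metrics/mAP50(B)") -> tuple:
    """Return (epoch, value) for the row with the highest value of `metric`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Some log versions pad column names and values with spaces, so strip both
    rows = [{k.strip(): v.strip() for k, v in row.items()} for row in reader]
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])


# Illustrative sample in the assumed format (not the real log)
sample = """   epoch,   metrics/mAP50(B)
      50,   0.772
      75,   0.796
     100,   0.841
"""

print(best_epoch(sample))  # (100, 0.841)

# With the real file (not run here):
#   with open("results.csv") as f:
#       print(best_epoch(f.read()))
```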

⚠️ Limitations

  • Optimized specifically for watermark detection
  • Requires GPU with 8GB+ VRAM for inference at 1024×1024
  • Best performance on images similar to training distribution

📄 License

This model is released under the AGPL-3.0 License.

🙏 Acknowledgments

📞 Citation

If you use this model, please cite the original YOLOv12-DINOv3 repository:

@software{sompote_yolov12_dinov3_2024,
  title={YOLOv12 + DINOv3 Vision Transformers Integration},
  author={Sompote},
  year={2024},
  url={https://github.com/Sompote/DINOV3-YOLOV12}
}