🔍 YOLOv12x-DINOv3 Watermark Detection Model

Watermark detection model combining YOLOv12 with DINOv3 Vision Transformer features

📊 Model Performance

Metric	Value
mAP@0.5	84.1%
mAP@0.5:0.95	59.9%
Precision	91.8%
Recall	74.2%
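The metrics in the table follow standard object-detection definitions. As a quick reference, the sketch below (plain Python; the function names are illustrative, not taken from any library) computes the box IoU that decides whether a detection counts as correct, lists the ten IoU thresholds that mAP@0.5:0.95 averages over (the COCO convention, assumed to apply here), and derives the F1 score implied by the reported precision and recall.

```python
from typing import List, Sequence

Box = Sequence[float]  # (x1, y1, x2, y2) in pixels


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


# mAP@0.5 counts a detection as a true positive when IoU >= 0.5;
# mAP@0.5:0.95 averages AP over these ten thresholds (COCO convention).
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(f"F1 implied by P=0.918, R=0.742: {f1_score(0.918, 0.742):.3f}")  # -> 0.821
```

Assuming precision and recall were measured at the same confidence threshold, the resulting F1 of about 0.82 summarizes the gap between the high precision and the lower recall in a single number.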

🏗️ Architecture

Base Model: YOLOv12x (Extra-Large variant)
Enhancement: DINOv3 ViT-B/16 backbone integration
Configuration: Dual P0/P3 feature enhancement
Input Size: 1024×1024

Key Features

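🧠 DINOv3 Vision Transformer integration at the P4 level (stride 16; the stated 40×40×512 map matches a 640×640 input, whereas a 1024×1024 input yields a 64×64 map)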
🔄 Dual-scale feature fusion for improved small/medium object detection
⚡ Tuned for watermark detection, with high precision (91.8%) and moderate recall (74.2%)
🛡️ Production-ready with comprehensive error handling
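The P-level notation follows the usual YOLO convention in which level Pk has stride 2^k relative to the input (P0 is the image itself). Assuming that convention, the sketch below computes feature-map sizes and the number of DINOv3 ViT-B/16 patch tokens for a given input size, which shows why a 40×40 P4 map corresponds to a 640×640 input. It is a plain arithmetic aid and does not reproduce the exact layer wiring of the dual configuration.

```python
from typing import Dict

# Assumed convention: level Pk has stride 2**k; P0 is the input image itself.
LEVEL_STRIDES: Dict[str, int] = {"P0": 1, "P3": 8, "P4": 16, "P5": 32}


def grid_side(input_px: int, stride: int) -> int:
    """Side length of the feature map for a square input at a given stride."""
    if input_px % stride:
        raise ValueError(f"{input_px} is not a multiple of stride {stride}")
    return input_px // stride


def level_grids(input_px: int) -> Dict[str, int]:
    """Feature-map side length at each P level."""
    return {lvl: grid_side(input_px, s) for lvl, s in LEVEL_STRIDES.items()}


def vit_patch_tokens(input_px: int, patch: int = 16) -> int:
    """Patch tokens of a ViT-B/16 (class/register tokens not counted)."""
    side = grid_side(input_px, patch)
    return side * side


for size in (640, 1024):
    print(size, level_grids(size), vit_patch_tokens(size))
# 640 {'P0': 640, 'P3': 80, 'P4': 40, 'P5': 20} 1600
# 1024 {'P0': 1024, 'P3': 128, 'P4': 64, 'P5': 32} 4096
```

Because a ViT-B/16 uses 16×16 patches, its patch grid has the same side length as a stride-16 (P4) map, which is what makes fusion at that level straightforward.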

🚀 Quick Start

Installation

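pip install ultralytics huggingface_hub

Stock ultralytics may not define the custom DINOv3 integration layers. If loading best.pt fails with a missing-module or missing-attribute error, install the code from the Sompote/DINOV3-YOLOV12 repository, which provides the YOLOv12 + DINOv3 integration.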

Inference

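from huggingface_hub import hf_hub_download  # pip install huggingface_hub
from ultralytics import YOLO

# Load model from Hugging Face (replace YOUR_USERNAME with the repository owner)
weights = hf_hub_download(
    repo_id='YOUR_USERNAME/yolov12x-dino3-watermark-detection',
    filename='best.pt',
)
model = YOLO(weights)

# Or load a local checkpoint instead
# model = YOLO('best.pt')

# Run inference at the training resolution
results = model('image.jpg', imgsz=1024)

# Process results (one Results object per image)
for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"Watermark detected: confidence {float(box.conf):.2f}, "
              f"box ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")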
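To hand detections to downstream code (for example a masking or cropping step), it can help to convert them into plain records. The helper below is a hypothetical sketch: `Detection`, `to_detections`, `to_json`, and the 0.5 default threshold are illustrative choices, and the functions expect plain Python lists such as those returned by `result.boxes.xyxy.tolist()` and `result.boxes.conf.tolist()`.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Sequence


@dataclass
class Detection:
    """One watermark box in pixel coordinates (hypothetical record type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    confidence: float


def to_detections(xyxy: Sequence[Sequence[float]], confs: Sequence[float],
                  min_conf: float = 0.5) -> List[Detection]:
    """Pair boxes with confidences, drop those below min_conf, sort best first."""
    if len(xyxy) != len(confs):
        raise ValueError("xyxy and confs must have the same length")
    dets = [
        Detection(*(float(v) for v in box), confidence=float(c))
        for box, c in zip(xyxy, confs)
        if c >= min_conf
    ]
    return sorted(dets, key=lambda d: d.confidence, reverse=True)


def to_json(dets: List[Detection]) -> str:
    """Serialize detections as a JSON list of objects."""
    return json.dumps([asdict(d) for d in dets])


# Usage with an Ultralytics result (assumes `result` from the loop above):
# dets = to_detections(result.boxes.xyxy.tolist(), result.boxes.conf.tolist())
# print(to_json(dets))
```

The threshold here is an extra post-filter on top of whatever confidence threshold the prediction call already applied.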

Batch Processing

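# Process multiple images (returns a list of Results)
results = model(['image1.jpg', 'image2.jpg', 'image3.jpg'])

# Process a video: stream=True returns a generator, so iterate over it
for result in model('video.mp4', stream=True):
    print(f"{len(result.boxes)} watermark(s) in this frame")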
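For video, a per-frame count is often less useful than the time ranges in which a watermark appears. The sketch below merges consecutive frames that contain at least one detection into (start, end) intervals in seconds. `detection_segments` and its `min_frames` parameter are hypothetical, and the frame rate must be supplied by the caller (for instance from OpenCV's `cv2.VideoCapture(path).get(cv2.CAP_PROP_FPS)`).

```python
from typing import Iterable, List, Optional, Tuple


def detection_segments(flags: Iterable[bool], fps: float,
                       min_frames: int = 1) -> List[Tuple[float, float]]:
    """Merge runs of frames with detections into (start_s, end_s) intervals.

    flags: one bool per frame, True when the frame has at least one watermark box.
    Runs shorter than min_frames frames are dropped (e.g. to suppress
    single-frame false positives).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # index of the first frame in the current run
    n_frames = 0
    for i, hit in enumerate(flags):
        n_frames = i + 1
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            if i - start >= min_frames:
                segments.append((start / fps, i / fps))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and n_frames - start >= min_frames:
        segments.append((start / fps, n_frames / fps))
    return segments


# Usage (assumes `model` is loaded as above and fps is known):
# flags = [len(r.boxes) > 0 for r in model('video.mp4', stream=True)]
# print(detection_segments(flags, fps=25.0, min_frames=3))
```

Collecting only one boolean per frame keeps memory flat even for long videos, since the streamed Results objects are discarded after each step.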

📋 Training Details

Parameter	Value
Epochs	100
Batch Size	2
Image Size	1024×1024
Optimizer	SGD
Learning Rate	0.01 → 0.0001
Momentum	0.937
Weight Decay	0.0005
Augmentation	RandAugment, MixUp (0.2), Mosaic
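The hyperparameters in the table map onto Ultralytics `train()` arguments roughly as follows. This is a sketch, not the author's training script: the dataset YAML passed to `train` is a placeholder, the mosaic probability is an assumed value, RandAugment is left out because the table does not say how it was applied, and `linear_lr` assumes the linear (non-cosine) decay that Ultralytics uses by default, with warmup ignored.

```python
from typing import Any, Dict

# Training Details table expressed as Ultralytics train() keyword arguments.
TRAIN_ARGS: Dict[str, Any] = {
    "epochs": 100,
    "batch": 2,
    "imgsz": 1024,
    "optimizer": "SGD",
    "lr0": 0.01,            # initial learning rate
    "lrf": 0.01,            # final LR factor: final LR = lr0 * lrf = 0.0001
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "mixup": 0.2,
    "mosaic": 1.0,          # assumption: probability not stated in the table
    # RandAugment omitted: the table does not say how it was configured.
}


def linear_lr(epoch: int, epochs: int = 100, lr0: float = 0.01, lrf: float = 0.01) -> float:
    """LR after `epoch` epochs under linear decay from lr0 to lr0 * lrf (warmup ignored)."""
    frac = max(1 - epoch / epochs, 0.0)
    return lr0 * (frac * (1.0 - lrf) + lrf)


def train(data_yaml: str, model_cfg: str = "yolov12x-dino3-vitb16-dual.yaml"):
    """Hypothetical launcher; needs ultralytics plus the custom DINOv3 modules."""
    from ultralytics import YOLO  # imported lazily so the sketch runs without it
    return YOLO(model_cfg).train(data=data_yaml, **TRAIN_ARGS)


for e in (0, 50, 100):
    print(e, linear_lr(e))
```

Under these assumptions the schedule starts at 0.01, is 0.00505 halfway through, and ends at 0.0001, matching the "0.01 → 0.0001" range in the table.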

📈 Training Progress

The model was trained for 100 epochs. mAP50 rose steeply through epoch 50 and more gradually afterwards:

Epoch 1: mAP50 = 0.3%
Epoch 25: mAP50 = 53.7%
Epoch 50: mAP50 = 77.2%
Epoch 75: mAP50 = 79.6%
Epoch 100: mAP50 = 84.1%
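The progress figures above can be pulled from the included results.csv. The sketch below assumes the column names written by recent Ultralytics versions (`epoch` and `metrics/mAP50(B)`); some older versions pad header names with spaces, which the code strips. Check the header of your file before relying on it.

```python
import csv
import io
from typing import Dict

# Column written by recent Ultralytics versions for box mAP@0.5; check your header.
MAP50_COL = "metrics/mAP50(B)"


def map50_by_epoch(csv_text: str) -> Dict[int, float]:
    """Return {epoch: mAP50 in percent} parsed from the text of results.csv."""
    out: Dict[int, float] = {}
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Strip padding that some versions add to header names and values.
        row = {k.strip(): v.strip() for k, v in raw.items()}
        epoch = int(float(row["epoch"]))
        out[epoch] = round(100 * float(row[MAP50_COL]), 1)
    return out


# Usage:
# with open("results.csv") as f:
#     progress = map50_by_epoch(f.read())
# for epoch in (1, 25, 50, 75, 100):
#     print(f"Epoch {epoch}: mAP50 = {progress[epoch]}%")
```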

🔧 Model Configuration

The model uses the yolov12x-dino3-vitb16-dual.yaml configuration, outlined below (a schematic summary, not a loadable config file):

# YOLOv12x with DINOv3 ViT-B/16 Dual Enhancement
backbone:
  - YOLOv12x backbone layers
  - DINOv3 ViT-B/16 integration at P4

head:
  - Multi-scale detection head
  - Dual P0/P3 feature enhancement

📁 Files Included

best.pt - Best model checkpoint (highest mAP)
last.pt - Final epoch checkpoint
args.yaml - Training configuration
results.csv - Training metrics log
*.png - Training curves and confusion matrices

⚠️ Limitations

Optimized specifically for watermark detection
Requires a GPU with at least 8 GB of VRAM for inference at 1024×1024
Performs best on images similar to the training distribution

📄 License

This model is released under the AGPL-3.0 License.

🙏 Acknowledgments

Sompote/DINOV3-YOLOV12 - Original YOLOv12 + DINOv3 integration
Ultralytics - YOLO framework
Meta AI - DINOv2/v3 Vision Transformers
PyTorch - Deep learning framework

📞 Citation

If you use this model, please cite the original YOLOv12-DINOv3 repository:

@software{sompote_yolov12_dinov3_2024,
  title={YOLOv12 + DINOv3 Vision Transformers Integration},
  author={Sompote},
  year={2024},
  url={https://github.com/Sompote/DINOV3-YOLOV12}
}
