# YOLOv12x-DINOv3 Watermark Detection Model

## Model Performance
| Metric | Value |
|---|---|
| mAP@0.5 | 84.1% |
| mAP@0.5:0.95 | 59.9% |
| Precision | 91.8% |
| Recall | 74.2% |
## Architecture
- Base Model: YOLOv12x (Extra-Large variant)
- Enhancement: DINOv3 ViT-B/16 backbone integration
- Configuration: Dual P0/P3 feature enhancement
- Input Size: 1024×1024

### Key Features

- DINOv3 Vision Transformer integration at the P4 level (40×40×512)
- Dual-scale feature fusion for improved small/medium object detection (see the sketch after this list)
- Optimized for watermark detection with high precision
- Production-ready with comprehensive error handling
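The actual fusion module lives in the upstream Sompote/DINOV3-YOLOV12 code and is not reproduced here. The snippet below is only a conceptual sketch, assuming a generic adapter (the `ViTFeatureAdapter` name and all shapes are illustrative): ViT-B/16 patch tokens are reshaped to a grid, projected from 768 to 512 channels, resampled to the P4 resolution, and fused with the CNN feature map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViTFeatureAdapter(nn.Module):
    """Conceptual sketch: project ViT patch tokens onto a CNN grid and fuse with P4."""

    def __init__(self, vit_dim: int = 768, cnn_channels: int = 512):
        super().__init__()
        self.project = nn.Conv2d(vit_dim, cnn_channels, kernel_size=1)
        self.fuse = nn.Conv2d(cnn_channels * 2, cnn_channels, kernel_size=1)

    def forward(self, vit_tokens: torch.Tensor, cnn_feat: torch.Tensor) -> torch.Tensor:
        # vit_tokens: (B, N, 768) patch tokens from ViT-B/16, CLS token already removed
        # cnn_feat:   (B, 512, H, W) YOLO P4 feature map, e.g. 40x40
        b, n, c = vit_tokens.shape
        side = int(n ** 0.5)                      # 64x64 token grid for a 1024 input with patch size 16
        grid = vit_tokens.transpose(1, 2).reshape(b, c, side, side)
        grid = self.project(grid)                 # 768 -> 512 channels
        grid = F.interpolate(grid, size=cnn_feat.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([cnn_feat, grid], dim=1))

# Shape check only: 1024/16 = 64, so 64*64 = 4096 patch tokens, fused onto a 40x40 P4 map
adapter = ViTFeatureAdapter()
tokens = torch.randn(1, 4096, 768)
p4 = torch.randn(1, 512, 40, 40)
print(adapter(tokens, p4).shape)  # torch.Size([1, 512, 40, 40])
```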
## Quick Start

### Installation

```bash
pip install ultralytics
```

### Inference

```python
from ultralytics import YOLO

# Load model from Hugging Face
model = YOLO('hf://YOUR_USERNAME/yolov12x-dino3-watermark-detection')

# Or load locally
model = YOLO('best.pt')

# Run inference
results = model('image.jpg')

# Process results
for result in results:
    boxes = result.boxes
    for box in boxes:
        print(f"Watermark detected: confidence {box.conf[0]:.2f}")
```
### Batch Processing

```python
# Process multiple images
results = model(['image1.jpg', 'image2.jpg', 'image3.jpg'])

# Process video
results = model('video.mp4', stream=True)
```
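With `stream=True` the call returns a generator that yields results one frame at a time, so long videos can be processed without holding every frame's results in memory. A minimal consumption loop might look like this (the per-frame reporting is illustrative):

```python
# Iterate the generator returned by stream=True, one frame at a time
for frame_idx, result in enumerate(model('video.mp4', stream=True)):
    n = len(result.boxes)
    if n:
        print(f"frame {frame_idx}: {n} watermark(s) detected")
```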
## Training Details
| Parameter | Value |
|---|---|
| Epochs | 100 |
| Batch Size | 2 |
| Image Size | 1024×1024 |
| Optimizer | SGD |
| Learning Rate | 0.01 → 0.0001 |
| Momentum | 0.937 |
| Weight Decay | 0.0005 |
| Augmentation | RandAugment, MixUp (0.2), Mosaic |
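The original training script is not part of this repo, so the call below is only a sketch of how an equivalent run could be launched through the Ultralytics Python API. It assumes the custom architecture modules from the upstream Sompote/DINOV3-YOLOV12 repo are available, and `watermark.yaml` is a placeholder dataset definition (RandAugment is handled by the upstream pipeline and not shown). Note that with `lr0=0.01` and `lrf=0.01` the final learning rate is 0.01 × 0.01 = 0.0001, matching the schedule above.

```python
from ultralytics import YOLO

# Custom architecture config from the DINOV3-YOLOV12 repo (path is a placeholder)
model = YOLO('yolov12x-dino3-vitb16-dual.yaml')

model.train(
    data='watermark.yaml',   # dataset definition (placeholder)
    epochs=100,
    batch=2,
    imgsz=1024,
    optimizer='SGD',
    lr0=0.01,                # initial learning rate
    lrf=0.01,                # final LR = lr0 * lrf = 0.0001
    momentum=0.937,
    weight_decay=0.0005,
    mosaic=1.0,              # Mosaic augmentation
    mixup=0.2,               # MixUp augmentation
)
```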
## Training Progress
The model was trained for 100 epochs with consistent improvement (the full curves can be re-plotted from the included `results.csv`, as sketched after this list):
- Epoch 1: mAP50 = 0.3%
- Epoch 25: mAP50 = 53.7%
- Epoch 50: mAP50 = 77.2%
- Epoch 75: mAP50 = 79.6%
- Epoch 100: mAP50 = 84.1%
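A minimal sketch for regenerating the mAP curve from `results.csv`, assuming the standard Ultralytics column names (`epoch`, `metrics/mAP50(B)`, `metrics/mAP50-95(B)`); some logger versions pad the headers with spaces, hence the strip:

```python
import pandas as pd
import matplotlib.pyplot as plt

# One row per epoch; strip whitespace that some Ultralytics versions add to headers
df = pd.read_csv('results.csv')
df.columns = df.columns.str.strip()

plt.plot(df['epoch'], df['metrics/mAP50(B)'], label='mAP@0.5')
plt.plot(df['epoch'], df['metrics/mAP50-95(B)'], label='mAP@0.5:0.95')
plt.xlabel('epoch')
plt.ylabel('mAP')
plt.legend()
plt.savefig('map_curve.png')
```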
## Model Configuration
The model uses the `yolov12x-dino3-vitb16-dual.yaml` configuration, outlined schematically below:
```yaml
# YOLOv12x with DINOv3 ViT-B/16 Dual Enhancement
backbone:
  - YOLOv12x backbone layers
  - DINOv3 ViT-B/16 integration at P4
head:
  - Multi-scale detection head
  - Dual P0/P3 feature enhancement
```
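To confirm that a downloaded checkpoint actually carries this architecture, the layer summary can be printed; this is stock Ultralytics behavior, not specific to this model:

```python
from ultralytics import YOLO

model = YOLO('best.pt')
model.info(verbose=True)   # prints the layer list, parameter count, and GFLOPs
```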
## Files Included
- `best.pt` - Best model checkpoint (highest mAP)
- `last.pt` - Final epoch checkpoint
- `args.yaml` - Training configuration
- `results.csv` - Training metrics log
- `*.png` - Training curves and confusion matrices
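Individual files can also be fetched programmatically with `huggingface_hub` (install it with `pip install huggingface_hub`); the repo id below reuses the placeholder from the Quick Start section:

```python
from huggingface_hub import hf_hub_download
from ultralytics import YOLO

# Download the checkpoint from the model repo (repo_id is a placeholder)
weights = hf_hub_download(
    repo_id='YOUR_USERNAME/yolov12x-dino3-watermark-detection',
    filename='best.pt',
)
model = YOLO(weights)
```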
## Limitations
- Optimized specifically for watermark detection
- Requires GPU with 8GB+ VRAM for inference at 1024×1024 (a lower-memory configuration is sketched after this list)
- Best performance on images similar to training distribution
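If VRAM is the constraint, inference can be run at a lower resolution and in half precision; the settings below are illustrative, and a smaller `imgsz` may reduce recall on small watermarks:

```python
from ultralytics import YOLO

model = YOLO('best.pt')
# Reduce resolution and enable FP16 to cut VRAM use (illustrative values)
results = model('image.jpg', imgsz=640, half=True, device=0)
```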
## License
This model is released under the AGPL-3.0 License.
## Acknowledgments
- Sompote/DINOV3-YOLOV12 - Original YOLOv12 + DINOv3 integration
- Ultralytics - YOLO framework
- Meta AI - DINOv2/v3 Vision Transformers
- PyTorch - Deep learning framework
## Citation
If you use this model, please cite the original YOLOv12-DINOv3 repository:
```bibtex
@software{sompote_yolov12_dinov3_2024,
  title={YOLOv12 + DINOv3 Vision Transformers Integration},
  author={Sompote},
  year={2024},
  url={https://github.com/Sompote/DINOV3-YOLOV12}
}
```