---
license: cc-by-nc-4.0
tags:
- depth-estimation
- computer-vision
- monocular-depth
- multi-view-geometry
- pose-estimation
library_name: depth-anything-3
pipeline_tag: depth-estimation
---
# Depth Anything 3: DA3-LARGE
<div align="center">

[Project Page](https://depth-anything-3.github.io) · [Paper](https://arxiv.org/abs/) · [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)

</div>
## Model Description
DA3-Large is a foundation model for multi-view depth estimation and camera pose estimation, built on a unified depth-ray representation.
| Property | Value |
|----------|-------|
| **Model Series** | Any-view Model |
| **Parameters** | 0.35B |
| **License** | CC BY-NC 4.0 |
## Capabilities
- ✅ Relative Depth
- ✅ Pose Estimation
- ✅ Pose Conditioning
## Quick Start
### Installation
```bash
git clone https://github.com/ByteDance-Seed/depth-anything-3
cd depth-anything-3
pip install -e .
```
### Basic Example
```python
import torch
from depth_anything_3.api import DepthAnything3
# Load model from Hugging Face Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DepthAnything3.from_pretrained("depth-anything/da3-large")
model = model.to(device=device)
# Run inference on images
images = ["image1.jpg", "image2.jpg"] # List of image paths, PIL Images, or numpy arrays
prediction = model.inference(
    images,
    export_dir="output",
    export_format="glb",  # Options: glb, npz, ply, mini_npz, gs_ply, gs_video
)
# Access results
print(prediction.depth.shape) # Depth maps: [N, H, W] float32
print(prediction.conf.shape) # Confidence maps: [N, H, W] float32
print(prediction.extrinsics.shape) # Camera poses (w2c): [N, 3, 4] float32
print(prediction.intrinsics.shape) # Camera intrinsics: [N, 3, 3] float32
```
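The fields above are enough to lift the predicted depth maps into a world-space point cloud. The sketch below is illustrative only and is not part of the Depth Anything 3 API; it assumes the documented shapes (`depth` as [N, H, W], `intrinsics` as [N, 3, 3], `extrinsics` as world-to-camera [N, 3, 4]) and uses plain NumPy.

```python
import numpy as np

def backproject_to_world(depth, intrinsics, extrinsics):
    """Lift per-view depth maps into a single world-space point cloud.

    Assumes depth: [N, H, W], intrinsics: [N, 3, 3],
    extrinsics: world-to-camera [N, 3, 4] (as documented above).
    """
    n_views, h, w = depth.shape
    # Homogeneous pixel grid (u, v, 1) flattened to [H*W, 3]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)

    points = []
    for i in range(n_views):
        K_inv = np.linalg.inv(intrinsics[i])
        # Camera-space points: depth * K^-1 [u, v, 1]^T
        cam = (pix @ K_inv.T) * depth[i].reshape(-1, 1)
        # Invert world-to-camera: X_world = R^T (X_cam - t)
        R, t = extrinsics[i][:, :3], extrinsics[i][:, 3]
        points.append((cam - t) @ R)
    return np.concatenate(points, axis=0)  # [N*H*W, 3]

# Example usage with the prediction above
# (convert tensors to NumPy first if the fields are torch tensors):
# cloud = backproject_to_world(prediction.depth, prediction.intrinsics, prediction.extrinsics)
```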
### Command Line Interface
```bash
# Process images with auto mode
da3 auto path/to/images \
--export-format glb \
--export-dir output \
--model-dir depth-anything/da3-large
# Use backend for faster repeated inference
da3 backend --model-dir depth-anything/da3-large
da3 auto path/to/images --export-format glb --use-backend
```
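When processing several scenes in a row, the backend mode avoids reloading the model on every call. A minimal shell sketch, assuming one sub-directory of images per scene under `scenes/` (directory layout is an assumption) and using only the flags shown above:

```bash
# Start the backend once so the model stays resident in memory
da3 backend --model-dir depth-anything/da3-large &

# Run each scene directory against the running backend
for scene in scenes/*/; do
    da3 auto "$scene" \
        --export-format glb \
        --export-dir "output/$(basename "$scene")" \
        --use-backend
done
```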
## Model Details
- **Developed by:** ByteDance Seed Team
- **Model Type:** Vision Transformer for Visual Geometry
- **Architecture:** Plain transformer with unified depth-ray representation
- **Training Data:** Public academic datasets only
### Key Insights
- A **single plain transformer** (e.g., a vanilla DINO encoder) is sufficient as a backbone, without architectural specialization.
- A single **depth-ray representation** obviates the need for complex multi-task learning.
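As a rough intuition for why one target suffices (an illustrative formulation, not necessarily the exact parameterization used in the paper): a per-pixel ray direction together with a depth along that ray pins down a 3D point, so geometry and camera information can be read off the same prediction.

```latex
% Illustrative only: for pixel (u, v) with intrinsics K and predicted depth d(u, v),
% the viewing ray and the recovered camera-space point are
\mathbf{r}(u, v) = K^{-1} \begin{pmatrix} u \\ v \\ 1 \end{pmatrix},
\qquad
\mathbf{X}_{\mathrm{cam}}(u, v) = d(u, v)\,\mathbf{r}(u, v).
```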
## Performance
Depth Anything 3 significantly outperforms:
- **Depth Anything 2** for monocular depth estimation
- **VGGT** for multi-view depth estimation and pose estimation
For detailed benchmarks, please refer to our [paper](https://depth-anything-3.github.io).
## Limitations
- The model is trained only on public academic datasets and may underperform on certain domain-specific images
- Performance may vary depending on image quality, lighting conditions, and scene complexity
## Citation
If you find Depth Anything 3 useful in your research or projects, please cite:
```bibtex
@article{depthanything3,
title={Depth Anything 3: Recovering the visual space from any views},
author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
journal={arXiv preprint arXiv:XXXX.XXXXX},
year={2025}
}
```
## Links
- 🌐 [Project Page](https://depth-anything-3.github.io)
- 📄 [Paper](https://arxiv.org/abs/)
- 💻 [GitHub Repository](https://github.com/ByteDance-Seed/depth-anything-3)
- 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
- 📖 [Documentation](https://github.com/ByteDance-Seed/depth-anything-3#-useful-documentation)
## Authors
[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Jun Hao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)