Update pipeline tag, fix paper links, correct BibTeX, and add abstract (#1)
Update pipeline tag, fix paper links, correct BibTeX, and add abstract (ee22d50d2aeb9a58c06b2079d2d27bc220e801aa)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,13 +1,12 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - depth-estimation
 - computer-vision
 - monocular-depth
 - multi-view-geometry
 - pose-estimation
-library_name: depth-anything-3
-pipeline_tag: depth-estimation
 ---
 
 # Depth Anything 3: DA3-BASE
@@ -15,12 +14,16 @@ pipeline_tag: depth-estimation
 <div align="center">
 
 [](https://depth-anything-3.github.io)
-[](https://arxiv.org/abs/)
+[](https://arxiv.org/abs/2511.10647)
 [](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
 <!-- Benchmark badge removed as per request -->
 
 </div>
 
+## Abstract
+
+We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 yields two key insights: a single plain transformer (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization, and a singular depth-ray prediction target obviates the need for complex multi-task learning. Through our teacher-student training paradigm, the model achieves a level of detail and generalization on par with Depth Anything 2 (DA2). We establish a new visual geometry benchmark covering camera pose estimation, any-view geometry and visual rendering. On this benchmark, DA3 sets a new state-of-the-art across all tasks, surpassing prior SOTA VGGT by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy. Moreover, it outperforms DA2 in monocular depth estimation. All models are trained exclusively on public academic datasets.
+
 ## Model Description
 
 DA3 Base model for multi-view depth estimation and camera pose estimation. Compact foundation model with unified depth-ray representation.
@@ -108,7 +111,7 @@ da3 auto path/to/images --export-format glb --use-backend
 - **Depth Anything 2** for monocular depth estimation
 - **VGGT** for multi-view depth estimation and pose estimation
 
-For detailed benchmarks, please refer to our [paper](https://
+For detailed benchmarks, please refer to our [paper](https://arxiv.org/abs/2511.10647).
 
 ## Limitations
 
@@ -124,7 +127,7 @@ If you find Depth Anything 3 useful in your research or projects, please cite:
 @article{depthanything3,
 title={Depth Anything 3: Recovering the visual space from any views},
 author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
-journal={arXiv preprint arXiv:
+journal={arXiv preprint arXiv:2511.10647},
 year={2025}
 }
 ```
@@ -132,11 +135,11 @@ If you find Depth Anything 3 useful in your research or projects, please cite:
 ## Links
 
 - 🏠 [Project Page](https://depth-anything-3.github.io)
-- 📄 [Paper](https://arxiv.org/abs/)
+- 📄 [Paper](https://arxiv.org/abs/2511.10647)
 - 💻 [GitHub Repository](https://github.com/ByteDance-Seed/depth-anything-3)
 - 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
 - 📚 [Documentation](https://github.com/ByteDance-Seed/depth-anything-3#-useful-documentation)
 
 ## Authors
 
-[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
+[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
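For reference, the `@@ -108,7 +111,7 @@` hunk header above carries the README's CLI example as context: `da3 auto path/to/images --export-format glb --use-backend`. A minimal usage sketch follows, assuming the `da3` command is provided by the GitHub repository linked in the README; the install step and the comments on flag behavior are assumptions, not part of this commit.

```bash
# Hypothetical end-to-end run built around the one command quoted in the diff
# context; only the `da3 auto ...` line itself comes from the README.

# Assumed install path: the package from the repository linked in the README.
pip install "git+https://github.com/ByteDance-Seed/depth-anything-3.git"

# Run DA3 on a folder of images. `--export-format glb` is assumed to write a
# 3D export of the reconstructed scene, and `--use-backend` to enable the
# backend/viewer; both readings are guesses from the flag names.
da3 auto path/to/images --export-format glb --use-backend
```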