Committed by haotongl and nielsr (HF Staff)
Commit f4a6c9b (verified)
1 Parent(s): 6544606

Update pipeline tag, fix paper links, correct BibTeX, and add abstract (#1)


- Update pipeline tag, fix paper links, correct BibTeX, and add abstract (ee22d50d2aeb9a58c06b2079d2d27bc220e801aa)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1):
  1. README.md (+10 -7)
README.md CHANGED
@@ -1,13 +1,12 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - depth-estimation
 - computer-vision
 - monocular-depth
 - multi-view-geometry
 - pose-estimation
-library_name: depth-anything-3
-pipeline_tag: depth-estimation
 ---
 
 # Depth Anything 3: DA3-BASE
@@ -15,12 +14,16 @@ pipeline_tag: depth-estimation
 <div align="center">
 
 [![Project Page](https://img.shields.io/badge/Project_Page-Depth_Anything_3-green)](https://depth-anything-3.github.io)
-[![Paper](https://img.shields.io/badge/arXiv-Depth_Anything_3-red)](https://arxiv.org/abs/)
+[![Paper](https://img.shields.io/badge/arXiv-Depth_Anything_3-red)](https://arxiv.org/abs/2511.10647)
 [![Demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue)](https://huggingface.co/spaces/depth-anything/Depth-Anything-3) # noqa: E501
 <!-- Benchmark badge removed as per request -->
 
 </div>
 
+## Abstract
+
+We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 yields two key insights: a single plain transformer (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization, and a singular depth-ray prediction target obviates the need for complex multi-task learning. Through our teacher-student training paradigm, the model achieves a level of detail and generalization on par with Depth Anything 2 (DA2). We establish a new visual geometry benchmark covering camera pose estimation, any-view geometry and visual rendering. On this benchmark, DA3 sets a new state-of-the-art across all tasks, surpassing prior SOTA VGGT by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy. Moreover, it outperforms DA2 in monocular depth estimation. All models are trained exclusively on public academic datasets.
+
 ## Model Description
 
 DA3 Base model for multi-view depth estimation and camera pose estimation. Compact foundation model with unified depth-ray representation.
@@ -108,7 +111,7 @@ da3 auto path/to/images --export-format glb --use-backend
 - **Depth Anything 2** for monocular depth estimation
 - **VGGT** for multi-view depth estimation and pose estimation
 
-For detailed benchmarks, please refer to our [paper](https://depth-anything-3.github.io). # noqa: E501
+For detailed benchmarks, please refer to our [paper](https://arxiv.org/abs/2511.10647). # noqa: E501
 
 ## Limitations
 
@@ -124,7 +127,7 @@ If you find Depth Anything 3 useful in your research or projects, please cite:
 @article{depthanything3,
 title={Depth Anything 3: Recovering the visual space from any views},
 author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang}, # noqa: E501
-journal={arXiv preprint arXiv:XXXX.XXXXX},
+journal={arXiv preprint arXiv:2511.10647},
 year={2025}
 }
 ```
@@ -132,11 +135,11 @@ If you find Depth Anything 3 useful in your research or projects, please cite:
 ## Links
 
 - 🏠 [Project Page](https://depth-anything-3.github.io)
-- 📄 [Paper](https://arxiv.org/abs/)
+- 📄 [Paper](https://arxiv.org/abs/2511.10647)
 - 💻 [GitHub Repository](https://github.com/ByteDance-Seed/depth-anything-3)
 - 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
 - 📚 [Documentation](https://github.com/ByteDance-Seed/depth-anything-3#-useful-documentation)
 
 ## Authors
 
-[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/) # noqa: E501
+[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/) # noqa: E501
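For quick reference, this is the model card's YAML metadata block as it stands after this commit, reconstructed from the first hunk above (`library_name: depth-anything-3` is dropped and the pipeline tag changes from `depth-estimation` to `image-to-3d`); it adds nothing beyond what the diff already shows:

```yaml
---
license: apache-2.0
pipeline_tag: image-to-3d
tags:
- depth-estimation
- computer-vision
- monocular-depth
- multi-view-geometry
- pose-estimation
---
```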