Update pipeline tag, fix paper links, correct BibTeX, and add abstract (#1)
Update pipeline tag, fix paper links, correct BibTeX, and add abstract (ee22d50d2aeb9a58c06b2079d2d27bc220e801aa)
Co-authored-by: Niels Rogge <[email protected]>
README.md CHANGED
@@ -1,13 +1,12 @@
 ---
 license: apache-2.0
+pipeline_tag: image-to-3d
 tags:
 - depth-estimation
 - computer-vision
 - monocular-depth
 - multi-view-geometry
 - pose-estimation
-library_name: depth-anything-3
-pipeline_tag: depth-estimation
 ---
 
 # Depth Anything 3: DA3-BASE
@@ -15,12 +14,16 @@ pipeline_tag: depth-estimation
 <div align="center">
 
 [](https://depth-anything-3.github.io)
-[](https://arxiv.org/abs/)
+[](https://arxiv.org/abs/2511.10647)
 [](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
 <!-- Benchmark badge removed as per request -->
 
 </div>
 
+## Abstract
+
+We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 yields two key insights: a single plain transformer (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization, and a singular depth-ray prediction target obviates the need for complex multi-task learning. Through our teacher-student training paradigm, the model achieves a level of detail and generalization on par with Depth Anything 2 (DA2). We establish a new visual geometry benchmark covering camera pose estimation, any-view geometry and visual rendering. On this benchmark, DA3 sets a new state-of-the-art across all tasks, surpassing prior SOTA VGGT by an average of 44.3% in camera pose accuracy and 25.1% in geometric accuracy. Moreover, it outperforms DA2 in monocular depth estimation. All models are trained exclusively on public academic datasets.
+
 ## Model Description
 
 DA3 Base model for multi-view depth estimation and camera pose estimation. Compact foundation model with unified depth-ray representation.
@@ -108,7 +111,7 @@ da3 auto path/to/images --export-format glb --use-backend
 - **Depth Anything 2** for monocular depth estimation
 - **VGGT** for multi-view depth estimation and pose estimation
 
-For detailed benchmarks, please refer to our [paper](https://
+For detailed benchmarks, please refer to our [paper](https://arxiv.org/abs/2511.10647).
 
 ## Limitations
 
@@ -124,7 +127,7 @@ If you find Depth Anything 3 useful in your research or projects, please cite:
 @article{depthanything3,
 title={Depth Anything 3: Recovering the visual space from any views},
 author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
-journal={arXiv preprint arXiv:
+journal={arXiv preprint arXiv:2511.10647},
 year={2025}
 }
 ```
@@ -132,11 +135,11 @@ If you find Depth Anything 3 useful in your research or projects, please cite:
 ## Links
 
 - 🏠 [Project Page](https://depth-anything-3.github.io)
-- 📄 [Paper](https://arxiv.org/abs/)
+- 📄 [Paper](https://arxiv.org/abs/2511.10647)
 - 💻 [GitHub Repository](https://github.com/ByteDance-Seed/depth-anything-3)
 - 🤗 [Hugging Face Demo](https://huggingface.co/spaces/depth-anything/Depth-Anything-3)
 - 📚 [Documentation](https://github.com/ByteDance-Seed/depth-anything-3#-useful-documentation)
 
 ## Authors
 
-[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
+[Haotong Lin](https://haotongl.github.io/) · [Sili Chen](https://github.com/SiliChen321) · [Junhao Liew](https://liewjunhao.github.io/) · [Donny Y. Chen](https://donydchen.github.io) · [Zhenyu Li](https://zhyever.github.io/) · [Guang Shi](https://scholar.google.com/citations?user=MjXxWbUAAAAJ&hl=en) · [Jiashi Feng](https://scholar.google.com.sg/citations?user=Q8iay0gAAAAJ&hl=en) · [Bingyi Kang](https://bingykang.github.io/)
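For reference, the `@@ -108,7 +111,7 @@` hunk header above carries the README's CLI example as context: `da3 auto path/to/images --export-format glb --use-backend`. A minimal usage sketch follows, assuming the `da3` command is provided by the GitHub repository linked in the README; the install step and the comments on flag behavior are assumptions, not part of this commit.

```bash
# Hypothetical end-to-end run built around the one command quoted in the diff
# context; only the `da3 auto ...` line itself comes from the README.

# Assumed install path: the package from the repository linked in the README.
pip install "git+https://github.com/ByteDance-Seed/depth-anything-3.git"

# Run DA3 on a folder of images. `--export-format glb` is assumed to write a
# 3D export of the reconstructed scene, and `--use-backend` to enable the
# backend/viewer; both readings are guesses from the flag names.
da3 auto path/to/images --export-format glb --use-backend
```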