Replace Depth Anything V1 with V2

Files changed (6)

README.md +44 -77
depth_anything_v2_vitl.pth +3 -0
v1/README.md +98 -0
config.json → v1/config.json +0 -0
model.safetensors → v1/model.safetensors +0 -0
preprocessor_config.json → v1/preprocessor_config.json +0 -0

README.md CHANGED

@@ -1,98 +1,65 @@
 ---
-license: apache-2.0
-tags:
-  - vision
 pipeline_tag: depth-estimation
-widget:
-  - inference: false
 ---
-# Depth Anything (large-sized model, Transformers version)
-Depth Anything model. It was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
-[Online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything) is also provided.
-Disclaimer: The team releasing Depth Anything did not write a model card for this model so this model card has been written by the Hugging Face team.
-## Model description
-Depth Anything leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
-The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.
-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
-alt="drawing" width="600"/>
-<small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>
-## Intended uses & limitations
-You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
-other versions on a task that interests you.
-### How to use
-Here is how to use this model to perform zero-shot depth estimation:
-```python
-from transformers import pipeline
-from PIL import Image
-import requests
-# load pipe
-pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-large-hf")
-# load image
-url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
-image = Image.open(requests.get(url, stream=True).raw)
-# inference
-depth = pipe(image)["depth"]
 ```
-Alternatively, one can use the classes themselves:
 ```python
-from transformers import AutoImageProcessor, AutoModelForDepthEstimation
 import torch
-import numpy as np
-from PIL import Image
-import requests
-url = "http://images.cocodataset.org/val2017/000000039769.jpg"
-image = Image.open(requests.get(url, stream=True).raw)
-image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
-model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")
-# prepare image for the model
-inputs = image_processor(images=image, return_tensors="pt")
-with torch.no_grad():
-    outputs = model(**inputs)
-    predicted_depth = outputs.predicted_depth
-# interpolate to original size
-prediction = torch.nn.functional.interpolate(
-    predicted_depth.unsqueeze(1),
-    size=image.size[::-1],
-    mode="bicubic",
-    align_corners=False,
-)
 ```
-For more code examples, we refer to the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).
-### BibTeX entry and citation info
 ```bibtex
-@misc{yang2024depth,
-      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
-      author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
-      year={2024},
-      eprint={2401.10891},
-      archivePrefix={arXiv},
-      primaryClass={cs.CV}
 }
-```

 ---
+license: cc-by-nc-4.0
+language:
+- en
 pipeline_tag: depth-estimation
+library_name: depth-anything-v2
+tags:
+- depth
+- relative depth
 ---
+# Depth-Anything-V2-Large
+## Introduction
+Depth Anything V2 is trained from 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model with the following features:
+- more fine-grained details than Depth Anything V1
+- more robust than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard)
+- more efficient (10x faster) and more lightweight than SD-based models
+- impressive fine-tuned performance with our pre-trained models
+## Installation
+```bash
+git clone https://huggingface.co/spaces/depth-anything/Depth-Anything-V2
+cd Depth-Anything-V2
+pip install -r requirements.txt
 ```
+## Usage
+Download the [model](https://huggingface.co/depth-anything/Depth-Anything-V2-Large/resolve/main/depth_anything_v2_vitl.pth?download=true) first and put it under the `checkpoints` directory.
 ```python
+import cv2
 import torch
+from depth_anything_v2.dpt import DepthAnythingV2
+model = DepthAnythingV2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024])
+model.load_state_dict(torch.load('checkpoints/depth_anything_v2_vitl.pth', map_location='cpu'))
+model.eval()
+raw_img = cv2.imread('your/image/path')
+depth = model.infer_image(raw_img) # HxW raw depth map
 ```
+## Citation
+If you find this project useful, please consider citing:
 ```bibtex
+@article{depth_anything_v2,
+  title={Depth Anything V2},
+  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+  journal={arXiv:2406.09414},
+  year={2024}
 }
+@inproceedings{depth_anything_v1,
+  title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
+  booktitle={CVPR},
+  year={2024}
+}
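The new model card stops at `depth = model.infer_image(raw_img)`, which it describes as an HxW raw depth map. A minimal sketch of one common way to turn such a map into an 8-bit grayscale image for viewing is shown below. The helper name `depth_to_uint8` and the min-max scaling are assumptions made for illustration, not something the model card prescribes, and a small synthetic array stands in for real model output.

```python
import numpy as np


def depth_to_uint8(depth):
    """Min-max scale a float HxW depth map into the 0..255 range as uint8.

    Hypothetical helper: the model card does not define a visualization step.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # A constant map has no contrast to stretch; return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo) * 255.0
    return scaled.astype(np.uint8)


# Synthetic stand-in for the array returned by model.infer_image(raw_img).
fake_depth = np.array([[0.0, 1.0], [2.0, 4.0]], dtype=np.float32)
gray = depth_to_uint8(fake_depth)
```

With OpenCV already imported as in the usage snippet, the resulting array could then be written out with `cv2.imwrite('depth.png', gray)`. Because the model card tags the output as relative depth, the scaled values are only meaningful within a single image.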

depth_anything_v2_vitl.pth ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a7ea19fa0ed99244e67b624c72b8580b7e9553043245905be58796a608eb9345
+size 1341395338
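The three added lines are a Git LFS pointer: they record the SHA-256 digest and byte size of the real checkpoint rather than the weights themselves. The sketch below shows one way a downloaded file could be checked against those values. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path reuses the `checkpoints` directory mentioned in the new README.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash in `pointer`."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Read in chunks so a large checkpoint is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a7ea19fa0ed99244e67b624c72b8580b7e9553043245905be58796a608eb9345
size 1341395338
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    path = "checkpoints/depth_anything_v2_vitl.pth"  # assumed download location
    if os.path.exists(path):
        ok = matches_pointer(path, pointer)
        print("checkpoint OK" if ok else "checkpoint mismatch")
```

Checking the size first is a cheap way to catch truncated downloads before hashing roughly 1.3 GB of data.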

v1/README.md ADDED

@@ -0,0 +1,98 @@

+---
+license: apache-2.0
+tags:
+  - vision
+pipeline_tag: depth-estimation
+widget:
+  - inference: false
+---
+# Depth Anything (large-sized model, Transformers version)
+Depth Anything model. It was introduced in the paper [Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data](https://arxiv.org/abs/2401.10891) by Lihe Yang et al. and first released in [this repository](https://github.com/LiheYoung/Depth-Anything).
+[Online demo](https://huggingface.co/spaces/LiheYoung/Depth-Anything) is also provided.
+Disclaimer: The team releasing Depth Anything did not write a model card for this model so this model card has been written by the Hugging Face team.
+## Model description
+Depth Anything leverages the [DPT](https://huggingface.co/docs/transformers/model_doc/dpt) architecture with a [DINOv2](https://huggingface.co/docs/transformers/model_doc/dinov2) backbone.
+The model is trained on ~62 million images, obtaining state-of-the-art results for both relative and absolute depth estimation.
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/depth_anything_overview.jpg"
+alt="drawing" width="600"/>
+<small> Depth Anything overview. Taken from the <a href="https://arxiv.org/abs/2401.10891">original paper</a>.</small>
+## Intended uses & limitations
+You can use the raw model for tasks like zero-shot depth estimation. See the [model hub](https://huggingface.co/models?search=depth-anything) to look for
+other versions on a task that interests you.
+### How to use
+Here is how to use this model to perform zero-shot depth estimation:
+```python
+from transformers import pipeline
+from PIL import Image
+import requests
+# load pipe
+pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-large-hf")
+# load image
+url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
+image = Image.open(requests.get(url, stream=True).raw)
+# inference
+depth = pipe(image)["depth"]
+```
+Alternatively, one can use the classes themselves:
+```python
+from transformers import AutoImageProcessor, AutoModelForDepthEstimation
+import torch
+import numpy as np
+from PIL import Image
+import requests
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
+model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")
+# prepare image for the model
+inputs = image_processor(images=image, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+    predicted_depth = outputs.predicted_depth
+# interpolate to original size
+prediction = torch.nn.functional.interpolate(
+    predicted_depth.unsqueeze(1),
+    size=image.size[::-1],
+    mode="bicubic",
+    align_corners=False,
+)
+```
+For more code examples, we refer to the [documentation](https://huggingface.co/transformers/main/model_doc/depth_anything.html#).
+### BibTeX entry and citation info
+```bibtex
+@misc{yang2024depth,
+      title={Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data},
+      author={Lihe Yang and Bingyi Kang and Zilong Huang and Xiaogang Xu and Jiashi Feng and Hengshuang Zhao},
+      year={2024},
+      eprint={2401.10891},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+```

config.json → v1/config.json RENAMED

File without changes

model.safetensors → v1/model.safetensors RENAMED

File without changes

preprocessor_config.json → v1/preprocessor_config.json RENAMED

File without changes