Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images
Abstract
The Cloud-Adapter approach enhances cloud segmentation accuracy using a parameter-efficient method that adapts a pretrained vision foundation model with a lightweight spatial perception module.
Cloud segmentation is a critical challenge in remote sensing image interpretation, as its accuracy directly impacts the effectiveness of subsequent data processing and analysis. Recently, vision foundation models (VFMs) have demonstrated powerful generalization capabilities across various visual tasks. In this paper, we present a parameter-efficient adaptive approach, termed Cloud-Adapter, designed to enhance the accuracy and robustness of cloud segmentation. Our method leverages a VFM pretrained on general domain data, which remains frozen, so the backbone itself requires no additional training. Cloud-Adapter incorporates a lightweight spatial perception module that first uses a convolutional neural network (ConvNet) to extract dense, multi-scale spatial representations. These multi-scale features are then aggregated and serve as contextual inputs to an adapting module, which modulates the frozen transformer layers within the VFM. Experimental results demonstrate that Cloud-Adapter, whose trainable parameters amount to only 0.6% of the frozen backbone's parameters, achieves substantial performance gains. Cloud-Adapter consistently attains state-of-the-art (SOTA) performance across a wide variety of cloud segmentation datasets from multiple satellite sources, sensor series, data processing levels, land cover scenarios, and annotation granularities. We have released the source code and pretrained models at https://github.com/XavierJiezou/Cloud-Adapter to support further research.
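To make the parameter-efficiency figure concrete, here is a minimal sketch of how a ratio like 0.6% can be computed: trainable (adapter) parameters divided by frozen (backbone) parameters. The `ParamGroup` class, the `trainable_ratio` function, the group names, and all parameter counts are hypothetical; the counts are made-up round numbers chosen so the ratio comes out to 0.6%, and they are not the paper's actual model sizes.

```python
from dataclasses import dataclass


@dataclass
class ParamGroup:
    """A named group of parameters and whether it is updated during training."""
    name: str
    num_params: int
    trainable: bool


def trainable_ratio(groups):
    """Return trainable parameters as a fraction of frozen parameters."""
    trainable = sum(g.num_params for g in groups if g.trainable)
    frozen = sum(g.num_params for g in groups if not g.trainable)
    if frozen == 0:
        raise ValueError("at least one frozen group is required")
    return trainable / frozen


# Hypothetical sizes for illustration only (NOT the paper's real counts).
groups = [
    ParamGroup("vfm_backbone", 100_000_000, trainable=False),
    ParamGroup("spatial_perception_convnet", 400_000, trainable=True),
    ParamGroup("adapting_module", 200_000, trainable=True),
]

ratio = trainable_ratio(groups)
print(f"trainable / frozen = {ratio:.2%}")
```

Under these assumed sizes, the two trainable modules together hold 600,000 parameters against a 100,000,000-parameter frozen backbone, which is the kind of ratio the abstract reports.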
Community
- Cloud-Adapter bridges the gap between domain-specific and generalized models, showcasing the utility of VFMs in remote sensing tasks.
- It achieves superior segmentation accuracy across diverse satellite datasets while maintaining computational efficiency.
- By releasing code and pretrained models, the authors foster open collaboration, paving the way for broader applications in environmental monitoring, disaster management, and beyond.
GitHub code: https://github.com/XavierJiezou/Cloud-Adapter
Hugging Face: https://huggingface.co/XavierJiezou/cloud-adapter-models
Page Demo: https://xavierjiezou.github.io/Cloud-Adapter/
 Kai Li