arxiv:2510.13626

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Published on Oct 15
Submitted by Siyin Wang (SII) on Oct 16

AI-generated summary

State-of-the-art Vision-Language-Action models achieve high benchmark scores but are brittle under various perturbations, particularly changes in camera viewpoint and robot initial state, and they often ignore language instructions.

Abstract

Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoint, robot initial state, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyze multiple state-of-the-art models and reveal consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models are extremely sensitive to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, and further experiments reveal that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
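The headline "95% to below 30%" is easiest to compare across models as a relative drop. The sketch below is a minimal illustration, assuming success rates are expressed as fractions in [0, 1]; `relative_drop` is a hypothetical helper, not part of LIBERO-Plus.

```python
def relative_drop(clean_success: float, perturbed_success: float) -> float:
    """Fraction of the clean-condition success rate lost under a perturbation.

    Both arguments are success rates in [0, 1] (hypothetical helper).
    """
    if not 0.0 < clean_success <= 1.0:
        raise ValueError("clean_success must be in (0, 1]")
    if not 0.0 <= perturbed_success <= 1.0:
        raise ValueError("perturbed_success must be in [0, 1]")
    return (clean_success - perturbed_success) / clean_success


# Boundary case from the abstract: 95% clean success vs. 30% under perturbation.
print(f"relative drop: {relative_drop(0.95, 0.30):.1%}")  # relative drop: 68.4%
```

Because the perturbed rate is reported as *below* 30%, the actual relative drop is more than 68%, i.e. models lose over two-thirds of their clean-condition success.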

Community

Paper author Paper submitter

🚀 Introducing LIBERO-Plus: A Comprehensive Benchmark for Vision-Language-Action Models

We are excited to unveil LIBERO-Plus, an advanced robustness evaluation tool for Vision-Language-Action (VLA) models. LIBERO-Plus allows researchers to understand how these models perform under various environmental perturbations, shedding light on their vulnerabilities in real-world settings.


๐Ÿ” Novel Findings: Uncovering Hidden Vulnerabilities

  • Models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations.

  • Models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely.

  • Models rely on superficial visual cues, such as positional bias, rather than on a genuine semantic understanding of task-relevant objects.

  • Compositional generalization is intrinsically non-decomposable.

  • Training data diversity significantly improves robustness.

...

For more detailed information, please check out our paper.
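The exact protocol behind the "models ignore language" experiments is not described on this page. One hypothetical way to probe the same effect is to run the same tasks with the original instruction and with a blank instruction, then compare success rates. `rollout`, `success_rate`, and `language_sensitivity` are illustrative names, not LIBERO-Plus APIs, and the two toy policies exist only to show the two extremes.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical rollout hook: runs one episode of `task_id` with the given
# instruction text and returns True if the task succeeded.
Rollout = Callable[[str, str], bool]


def success_rate(
    rollout: Rollout,
    task_ids: Sequence[str],
    instruction_for: Callable[[str], str],
    trials: int,
) -> float:
    """Mean success over `trials` episodes per task, using `instruction_for(task_id)`."""
    if not task_ids or trials <= 0:
        raise ValueError("need at least one task and one trial")
    wins = 0
    for task_id in task_ids:
        text = instruction_for(task_id)
        for _ in range(trials):
            wins += int(rollout(task_id, text))
    return wins / (len(task_ids) * trials)


def language_sensitivity(
    rollout: Rollout,
    task_ids: Sequence[str],
    instructions: Mapping[str, str],
    trials: int = 10,
) -> float:
    """Success with the real instruction minus success with a blank instruction.

    A value near zero means removing the instruction barely changes behavior,
    suggesting the policy is not conditioning on language.
    """
    with_text = success_rate(rollout, task_ids, lambda t: instructions[t], trials)
    blank = success_rate(rollout, task_ids, lambda t: "", trials)
    return with_text - blank


# Toy policies for illustration: one ignores the text, one depends on it.
def vision_only(task_id: str, text: str) -> bool:
    return task_id == "open_drawer"


def language_driven(task_id: str, text: str) -> bool:
    return "drawer" in text


tasks = ["open_drawer", "pick_bowl"]
instr = {"open_drawer": "open the top drawer", "pick_bowl": "pick up the bowl"}
print(language_sensitivity(vision_only, tasks, instr))      # 0.0
print(language_sensitivity(language_driven, tasks, instr))  # 0.5
```

A blank instruction is only one choice of probe; substituting another task's instruction would test whether the policy follows the text rather than merely reacting to its presence. Real policies are stochastic, so `trials` should be large enough for the gap to be meaningful.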


โš™๏ธ Easy to Use: Seamless Transition to LIBERO-Plus

LIBERO-Plus makes it easy to evaluate the robustness of existing models. In a few steps, you can switch from LIBERO to LIBERO-Plus and get automatic, fine-grained evaluation.


📊 Comprehensive, Automatic, and Fine-Grained Benchmark

LIBERO-Plus provides a benchmarking framework with 7 perturbation dimensions and 21 sub-dimensions. It also defines a fine-grained difficulty scale from L1 to L5, so users can systematically assess model performance across a range of challenges. Construction of both the training and test datasets is automated, which makes comprehensive assessments easy to run.
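Results from a benchmark organized this way are naturally read as a table indexed by perturbation dimension and difficulty level. The sketch below is hypothetical: `EpisodeResult`, `success_table`, `print_table`, and the dimension identifiers are our own names. The seven dimensions come from the abstract; the 21 sub-dimensions are not named on this page, so they are omitted. It also assumes each evaluated episode is tagged with exactly one dimension and one level, and that L1 is the mildest level.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# The seven perturbation dimensions listed in the abstract (identifier spellings are ours).
DIMENSIONS = (
    "object_layout",
    "camera_viewpoint",
    "robot_initial_state",
    "language_instruction",
    "lighting",
    "background_texture",
    "sensor_noise",
)
LEVELS = ("L1", "L2", "L3", "L4", "L5")  # difficulty scale; L1 assumed mildest


@dataclass(frozen=True)
class EpisodeResult:
    dimension: str  # one of DIMENSIONS
    level: str      # one of LEVELS
    success: bool


def success_table(results: Iterable[EpisodeResult]) -> Dict[Tuple[str, str], float]:
    """Map each (dimension, level) cell that has episodes to its success rate."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [successes, episodes]
    for r in results:
        if r.dimension not in DIMENSIONS or r.level not in LEVELS:
            raise ValueError(f"unknown cell: {r.dimension}/{r.level}")
        cell = counts[(r.dimension, r.level)]
        cell[0] += int(r.success)
        cell[1] += 1
    return {key: wins / n for key, (wins, n) in counts.items()}


def print_table(table: Dict[Tuple[str, str], float]) -> None:
    """Print one row per evaluated dimension, one column per level ('-' = no episodes)."""
    print(f"{'dimension':<22}" + "".join(f"{lv:>6}" for lv in LEVELS))
    for dim in DIMENSIONS:
        cells: list[Optional[float]] = [table.get((dim, lv)) for lv in LEVELS]
        if all(c is None for c in cells):
            continue  # skip dimensions with no episodes at all
        row = "".join(f"{'-':>6}" if c is None else f"{c:>6.0%}" for c in cells)
        print(f"{dim:<22}{row}")


# Made-up toy episodes, only to show the shape of the output (not paper results).
results = [
    EpisodeResult("camera_viewpoint", "L1", True),
    EpisodeResult("camera_viewpoint", "L1", True),
    EpisodeResult("camera_viewpoint", "L5", True),
    EpisodeResult("camera_viewpoint", "L5", False),
]
print_table(success_table(results))
```

Adding a sub-dimension field to `EpisodeResult` would give the finer 21-way breakdown once the sub-dimension names are taken from the paper.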

