Abstract
PICABench and PICAEval evaluate physical realism in image editing by assessing eight sub-dimensions and using VLM-as-a-judge with human annotations, highlighting the need for physics-based solutions.
Image editing has achieved remarkable progress recently. Modern editing models could already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are the key to the generation realism. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimension (spanning optics, mechanics, and state transitions) for most of the common editing operations (add, remove, attribute change, etc). We further propose the PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with large rooms to explore. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Factuality Matters: When Image Generation and Editing Meet Structured Visuals (2025)
- ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation (2025)
- EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing (2025)
- SpotEdit: Evaluating Visually-Guided Image Editing Methods (2025)
- Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset (2025)
- MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks (2025)
- Towards Scalable and Consistent 3D Editing (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/picabench-how-far-are-we-from-physically-realistic-image-editing
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper