WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
Abstract
A comprehensive benchmark study evaluates the detection of prompt injection attacks against web agents, finding that current detectors identify attacks with explicit textual instructions or visible image perturbations with moderate to high accuracy, but largely fail against attacks that omit explicit instructions or use imperceptible perturbations.
Multiple prompt injection attacks have been proposed against web agents. At the same time, various methods have been developed to detect general prompt injection attacks, but none have been systematically evaluated for web agents. In this work, we bridge this gap by presenting the first comprehensive benchmark study on detecting prompt injection attacks targeting web agents. We begin by introducing a fine-grained categorization of such attacks based on the threat model. We then construct datasets containing both malicious and benign samples: malicious text segments generated by different attacks, benign text segments from four categories, malicious images produced by attacks, and benign images from two categories. Next, we systematize both text-based and image-based detection methods. Finally, we evaluate their performance across multiple scenarios. Our key findings show that while some detectors can identify attacks that rely on explicit textual instructions or visible image perturbations with moderate to high accuracy, they largely fail against attacks that omit explicit instructions or employ imperceptible perturbations. Our datasets and code are released at: https://github.com/Norrrrrrr-lyn/WAInjectBench.
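The benchmark evaluates detectors on both malicious and benign samples, so the core measurement is how often a detector flags attack segments versus how often it falsely flags benign ones. The snippet below is a minimal sketch of that kind of evaluation loop; the file names, file layout, and the toy keyword detector are hypothetical placeholders and not the WAInjectBench API (see the linked repository for the actual datasets and code).

```python
# Minimal sketch: scoring a text-based prompt injection detector on
# malicious vs. benign text segments. Paths, file layout, and the toy
# detector are hypothetical placeholders, not the WAInjectBench API.
import json
from typing import Callable, List


def load_segments(path: str) -> List[str]:
    """Load one text segment per JSON line (hypothetical file layout)."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f]


def keyword_detector(segment: str) -> bool:
    """Toy detector: flags segments containing explicit instruction phrases.
    Real detectors (fine-tuned classifiers, LLM judges, etc.) would replace this."""
    triggers = ("ignore previous instructions", "you must now", "instead, do")
    return any(t in segment.lower() for t in triggers)


def evaluate(detector: Callable[[str], bool],
             malicious: List[str], benign: List[str]) -> dict:
    """Detection rate on malicious samples, false positive rate on benign samples."""
    tpr = sum(detector(s) for s in malicious) / max(len(malicious), 1)
    fpr = sum(detector(s) for s in benign) / max(len(benign), 1)
    return {"detection_rate": tpr, "false_positive_rate": fpr}


if __name__ == "__main__":
    malicious = load_segments("malicious_text_segments.jsonl")  # placeholder path
    benign = load_segments("benign_text_segments.jsonl")        # placeholder path
    print(evaluate(keyword_detector, malicious, benign))
```

A detector of this form illustrates the paper's key finding: it can catch attacks that embed explicit instructions in text, but has no signal for attacks that omit such instructions or operate through image perturbations.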
Community
This paper presents WAInjectBench, a benchmark for evaluating prompt injection detection methods for web agents. It covers diverse attack strategies in both text and image modalities, and provides datasets along with a systematic evaluation of existing detectors. Code and data are available here: https://github.com/Norrrrrrr-lyn/WAInjectBench
Similar papers recommended by the Semantic Scholar API (via Librarian Bot):
- PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance (2025)
- SecInfer: Preventing Prompt Injection via Inference-time Scaling (2025)
- AEGIS : Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema (2025)
- Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts (2025)
- Transferable Direct Prompt Injection via Activation-Guided MCMC Sampling (2025)
- A Whole New World: Creating a Parallel-Poisoned Web Only AI-Agents Can See (2025)
- Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization (2025)