How AI Pentesting Works¶
AI Pentesting combines agentic crawling with specialized security agents to discover vulnerabilities through injection testing and business logic analysis.
Crawling¶
AI Pentesting uses agentic crawling to explore applications:
- State-aware exploration
- Natural language instructions
- Error recovery
- Context understanding
Agents use LLM reasoning to navigate web applications and discover endpoints.
Scope & blocklists: what the agent controls, and what it doesn't¶
AI Pentesting agents honor the scan's crawling and API testing scope when deciding where to navigate and which endpoints to actively attack. An entry in the api_testing blocklist guarantees two things: the agent itself will not fire requests directly against a matching endpoint, and the security-check engine will not fuzz it.
It does not, today, act as a network-level proxy. If the page the agent is testing issues its own XHR/fetch calls to a blocklisted endpoint as part of its normal behavior, the browser will still execute those requests — exactly as it would for a real user. They will show up in the scan logs as captured traffic, but no active security checks will run against them.
To guarantee an endpoint is never contacted during a scan — including by the application itself — block it at your WAF/gateway for the duration of the scan, or use the crawling blocklist to prevent the pages that fire those calls from being visited. A network-level proxy enforcing the blocklist on outbound browser traffic is on the roadmap. See Scope Configuration — What the blocklist does and does not prevent today for the full explanation.
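As a hedged illustration of the distinction above, a scope configuration might look roughly like the following. The exact file schema and key names (path, url, and the nesting under scope) are assumptions here; only the api_testing and crawling blocklists themselves are documented — see Scope Configuration for the real format.

```yaml
# Illustrative sketch only — consult Scope Configuration for the actual schema.
scope:
  api_testing:
    blocklist:
      # The agent will never attack this endpoint, and the security-check
      # engine will never fuzz it. But if a visited page fires its own
      # XHR/fetch call to it, the browser still executes that request.
      - path: /api/billing/charge
  crawling:
    blocklist:
      # Preventing the page from being visited at all also prevents the
      # background calls that page would have triggered.
      - url: https://app.example.com/billing
```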
XSS Testing¶
Agents discover XSS vulnerabilities through:
- Context-aware injection
- Multi-step exploitation
- Response analysis
- Payload adaptation
- State tracking
Agents reason about DOM structure, JavaScript execution contexts, and input validation to craft XSS payloads.
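To make "context-aware injection" concrete, here is a minimal sketch of how reflection context might drive payload choice. This is illustrative only, not Escape's implementation: the heuristics, payload strings, and function names are all assumptions, and a real agent reasons over the full DOM and JavaScript execution context rather than substring checks.

```python
# Toy context-aware payload selection: where a probe marker is reflected
# determines which payload shape has a chance of executing.
PAYLOADS = {
    "html_body": "<img src=x onerror=alert(1)>",
    "attr_value": '" autofocus onfocus=alert(1) x="',
    "js_string": "';alert(1);//",
}

def detect_context(page, marker):
    """Crude guess at the reflection context of an injected marker."""
    i = page.find(marker)
    if i == -1:
        return "not_reflected"
    before = page[:i]
    # Inside a <script> block that has not been closed yet?
    if before.rfind("<script") > before.rfind("</script"):
        return "js_string"
    # Immediately after ="/=' suggests a quoted attribute value.
    if before.rstrip().endswith(('="', "='")):
        return "attr_value"
    return "html_body"

def pick_payload(page, marker):
    """Return a payload matching the detected context, or None."""
    return PAYLOADS.get(detect_context(page, marker))
```

A real agent would also adapt the payload after observing how the application filters or encodes each attempt, which is what the "payload adaptation" bullet above refers to.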
SQL Injection Testing¶
The SQLi Agent is enabled by default and focuses on SQL injection reconnaissance and targeted exploitation:
- Reconnaissance-first testing that prioritizes database-backed endpoints
- Context-aware payloading adapted to observed request and response patterns
- High-value surface prioritization on filters, search, reporting, and exports
- Optional natural-language guidance to focus on the most critical areas
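The "reconnaissance-first" and "high-value surface" bullets can be sketched as a scoring heuristic. This is a toy illustration under assumed names — the hint list, scoring weights, and functions below are invented, not the SQLi Agent's actual logic.

```python
# Toy heuristic: rank endpoints by how likely they are to be database-backed,
# so exploitation effort goes to filters, search, reporting, and exports first.
HIGH_VALUE_HINTS = ("filter", "search", "sort", "report", "export", "query")

def recon_score(path, params):
    """Higher score = more likely to reach a SQL backend (toy weighting)."""
    score = 0
    for hint in HIGH_VALUE_HINTS:
        if hint in path.lower():
            score += 2  # path-level hints weigh more than parameter names
        score += sum(1 for p in params if hint in p.lower())
    return score

def prioritize(endpoints):
    """Order (path, params) pairs by descending reconnaissance score."""
    return sorted(endpoints, key=lambda e: recon_score(*e), reverse=True)
```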
Business Logic Testing¶
Agents discover business logic vulnerabilities including:
- Authorization testing: BOLA, tenant isolation, and privilege escalation (see BOLA Agent)
- Workflow bypasses
- State manipulation
- Multi-step attacks
Agents use multi-user authentication to test authorization boundaries (see BOLA Agent for configuration).
Agent Workflow¶
- Discovery: Agents explore the application using agentic crawling
- Analysis: Agents reason about application structure and behavior
- Testing: Agents execute injection attacks and business logic tests
- Validation: Agents verify vulnerabilities and collect evidence
Agent reasoning is visible in scan logs, showing why agents took specific actions and how they adapted their strategies.
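The four stages above can be sketched as a loop. This is a deliberately simplified illustration — the callback names are placeholders, and a real agent interleaves these stages and backtracks rather than running them strictly in order.

```python
# Minimal sketch of the Discovery -> Analysis -> Testing -> Validation loop.
def run_agent(discover, analyze, attack, validate, log):
    findings = []
    for endpoint in discover():                  # Discovery
        plan = analyze(endpoint)                 # Analysis: reason about behavior
        for attempt in plan:
            result = attack(endpoint, attempt)   # Testing: execute the attempt
            log(endpoint, attempt, result)       # reasoning surfaces in scan logs
            evidence = validate(result)          # Validation: confirm + evidence
            if evidence:
                findings.append((endpoint, evidence))
    return findings
```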
AI Models¶
A common question from customers is "which LLM does AI Pentesting use?" — often with the assumption that everything is powered by a single model like ChatGPT. It isn't. AI Pentesting is a multi-model system: different stages of a scan, and different agents, are powered by different models chosen for the job at hand.
A Portfolio of Models, not a Single Model¶
Escape routes work across a portfolio of frontier commercial models and open-weight models hosted on its own infrastructure. No single provider or model powers the product end-to-end. We treat models as interchangeable components and select the best one for each sub-task based on continuous internal benchmarks.
Why Different Models for Different Tasks?¶
Security testing is not one problem — it's a pipeline of very different cognitive tasks, each with different requirements:
| Stage | What the model needs to be good at | Why it matters |
|---|---|---|
| Crawling & navigation | Visual + DOM reasoning, tool use, cost efficiency | Agents click, type, and navigate thousands of pages per scan — speed and cost dominate. See Agentic Crawling. |
| Exploit design & payload generation | Creativity, divergent thinking, broad world knowledge | Finding a new attack path or crafting a novel payload benefits from models that explore the solution space aggressively. |
| Exploit validation & confirmation | Strict, deterministic reasoning, low hallucination, adherence to evidence | Confirming an exploit worked must be precise — a false positive here becomes a false positive in your Issue queue. |
| Business logic & multi-step planning | Long-context reasoning, planning, state tracking | Multi-step attacks (BOLA chains, workflow bypasses) require reasoning over long transcripts of requests, responses, and prior agent actions. |
| Evidence summarization & remediation write-ups | Instruction-following, concision, technical writing | The reproduction steps, cURL commands, and remediation guidance surfaced in a finding are generated from raw evidence. |
A model that is excellent at one of these stages is often not the best choice for another. Using a single model everywhere would mean trading off creativity against rigor on every single task.
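One way to picture stage-based routing is as a capability-matching table. The sketch below is purely illustrative: the stage names, capability tags, and catalog entries are invented, and Escape's actual routing is private IP as noted below.

```python
# Hypothetical routing: each pipeline stage declares the capabilities it
# needs, and the cheapest model satisfying all of them is selected.
ROUTING = {
    "crawling":       ["tool_use", "low_cost"],
    "payload_design": ["creativity"],
    "validation":     ["low_hallucination", "determinism"],
    "planning":       ["long_context"],
    "write_up":       ["instruction_following"],
}

def select_model(stage, catalog):
    """Pick the cheapest catalog entry covering all of a stage's needs."""
    needs = set(ROUTING[stage])
    candidates = [m for m in catalog if needs <= set(m["strengths"])]
    return min(candidates, key=lambda m: m["cost"]) if candidates else None
```

This is why a single model everywhere is a bad trade: the cheapest model that wins "crawling" will generally not satisfy "validation", and vice versa.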
Model Selection per Agent¶
Each agent (see BOLA Agent, XSS Agent, SQLi Agent, Business Logic Agent, Regression Agent, Multi-Agent Pentest) composes the stages above with different weightings:
- The BOLA Agent leans heavily on long-context planning models to track identities, roles, and multi-step authorization flows.
- The XSS Agent pairs a creative payload-generation model with a strict validation model that only flags a finding when execution is unambiguously confirmed.
- The SQLi Agent uses reconnaissance-oriented reasoning to prioritize database-backed endpoints before escalating to confirmed exploitation.
- The Business Logic Agent depends on models with strong stateful reasoning to chain requests across a workflow.
- The Multi-Agent Pentest orchestrates multiple agents in parallel, each running its own model mix.
The exact routing — which model is used at which step, with which prompts, tools, and guardrails — is part of Escape's private IP and is what makes the product work. Customers benefit from this routing without having to build it themselves.
Always Evaluating the Newest Models¶
Frontier models move fast. Escape runs every new major release (GPT, Claude, Gemini, Llama, and others) against an internal security-evaluation harness composed of real-world vulnerability reproduction tasks, benchmark applications, and regression suites. A new model is promoted only when it beats the incumbent on our metrics — not on a public leaderboard. This means:
- The model mix today is not the model mix six months ago.
- Improvements in upstream model capabilities translate into better findings, reproductions, and remediations without changes on the customer's side.
- If a new model regresses on security reasoning, we don't ship it — even if it's cheaper or faster.
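The promotion rule in the last bullet amounts to a gate: a candidate must not regress on any security metric before it can replace the incumbent. The metric names below are invented for illustration; the real harness and its metrics are internal.

```python
# Toy promotion gate: promote only if the candidate is at least as good as
# the incumbent on every metric (never worse anywhere).
HIGHER_IS_BETTER = ("true_positive_rate", "reproduction_success")
LOWER_IS_BETTER = ("false_positive_rate",)

def should_promote(incumbent, candidate):
    for m in HIGHER_IS_BETTER:
        if candidate[m] < incumbent[m]:
            return False  # regression on a quality metric blocks promotion
    for m in LOWER_IS_BETTER:
        if candidate[m] > incumbent[m]:
            return False  # more false positives blocks promotion
    return True
```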
Data Handling¶
Regardless of which model is used, customer data handling is governed by Escape's Privacy & Security policy:
- No training on customer data. Customer traffic, payloads, findings, and evidence are never used to train third-party models.
- Zero-retention agreements are in place with commercial model providers where available, so prompts and responses are not retained by the provider.
- Open-weight models used in sensitive contexts run on Escape-controlled infrastructure, so data never leaves Escape's trust boundary.
- Organization administrators can disable all AI Pentesting activity at any time via the AI Pentesting Kill Switch.
What about reproducibility?¶
Because the model mix evolves over time, an AI Pentesting scan is intentionally not bit-for-bit reproducible — that's a feature, not a bug. It's what lets agents find things that a frozen, signature-based scanner cannot. Determinism where it matters (finding identity, reproduction steps, evidence) is provided by Escape's own engine, not by the underlying LLM.
Related Documentation¶
- Multi-Agent Pentest: Autonomous multi-agent penetration testing
- BOLA Agent: Authorization testing agent
- XSS Agent: XSS testing agent
- SQLi Agent: SQL injection testing agent
- Business Logic Agent: Business workflow testing agent
- API Testing Configuration: API testing configuration options
- Frontend DAST Configuration: WebApp testing configuration options
- Agentic Crawling: Technical details on crawling