How AI Pentesting Works¶
AI Pentesting combines agentic crawling with specialized security agents to discover vulnerabilities through injection testing and business logic analysis.
Crawling¶
AI Pentesting uses agentic crawling to explore applications:
- State-aware exploration
- Natural language instructions
- Error recovery
- Context understanding
Agents use LLM reasoning to navigate web applications and discover endpoints.
Scope & blocklists: what the agent controls, and what it doesn't¶
AI Pentesting agents honor the scan's crawling and API testing scope when deciding where to navigate and which endpoints to actively attack. An entry in the api_testing blocklist guarantees two things: the agent itself will not fire requests directly against a matching endpoint, and the security-check engine will not fuzz it.
It does not, today, act as a network-level proxy. If the page the agent is testing issues its own XHR/fetch calls to a blocklisted endpoint as part of its normal behavior, the browser will still execute those requests — exactly as it would for a real user. They will show up in the scan logs as captured traffic, but no active security checks will run against them.
To guarantee an endpoint is never contacted during a scan — including by the application itself — block it at your WAF/gateway for the duration of the scan, or use the crawling blocklist to prevent the pages that fire those calls from being visited. A network-level proxy enforcing the blocklist on outbound browser traffic is on the roadmap. See Scope Configuration — What the blocklist does and does not prevent today for the full explanation.
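As a hedged illustration of the distinction above, a scope configuration might look roughly like the following. The exact file schema and key names (path, url, and the nesting under scope) are assumptions here; only the api_testing and crawling blocklists themselves are documented — see Scope Configuration for the real format.

```yaml
# Illustrative sketch only — consult Scope Configuration for the actual schema.
scope:
  api_testing:
    blocklist:
      # The agent will never attack this endpoint, and the security-check
      # engine will never fuzz it. But if a visited page fires its own
      # XHR/fetch call to it, the browser still executes that request.
      - path: /api/billing/charge
  crawling:
    blocklist:
      # Preventing the page from being visited at all also prevents the
      # background calls that page would have triggered.
      - url: https://app.example.com/billing
```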
XSS Testing¶
Agents discover XSS vulnerabilities through:
- Context-aware injection
- Multi-step exploitation
- Response analysis
- Payload adaptation
- State tracking
Agents reason about DOM structure, JavaScript execution contexts, and input validation to craft XSS payloads.
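To make "context-aware injection" concrete, here is a minimal sketch of how reflection context might drive payload choice. This is illustrative only, not Escape's implementation: the heuristics, payload strings, and function names are all assumptions, and a real agent reasons over the full DOM and JavaScript execution context rather than substring checks.

```python
# Toy context-aware payload selection: where a probe marker is reflected
# determines which payload shape has a chance of executing.
PAYLOADS = {
    "html_body": "<img src=x onerror=alert(1)>",
    "attr_value": '" autofocus onfocus=alert(1) x="',
    "js_string": "';alert(1);//",
}

def detect_context(page, marker):
    """Crude guess at the reflection context of an injected marker."""
    i = page.find(marker)
    if i == -1:
        return "not_reflected"
    before = page[:i]
    # Inside a <script> block that has not been closed yet?
    if before.rfind("<script") > before.rfind("</script"):
        return "js_string"
    # Immediately after ="/=' suggests a quoted attribute value.
    if before.rstrip().endswith(('="', "='")):
        return "attr_value"
    return "html_body"

def pick_payload(page, marker):
    """Return a payload matching the detected context, or None."""
    return PAYLOADS.get(detect_context(page, marker))
```

A real agent would also adapt the payload after observing how the application filters or encodes each attempt, which is what the "payload adaptation" bullet above refers to.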
SQL Injection Testing¶
The SQLi Agent is enabled by default and focuses on SQL injection reconnaissance and targeted exploitation:
- Reconnaissance-first testing that prioritizes database-backed endpoints
- Context-aware payloading adapted to observed request and response patterns
- High-value surface prioritization on filters, search, reporting, and exports
- Optional natural-language guidance to focus on the most critical areas
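The "reconnaissance-first" and "high-value surface" bullets can be sketched as a scoring heuristic. This is a toy illustration under assumed names — the hint list, scoring weights, and functions below are invented, not the SQLi Agent's actual logic.

```python
# Toy heuristic: rank endpoints by how likely they are to be database-backed,
# so exploitation effort goes to filters, search, reporting, and exports first.
HIGH_VALUE_HINTS = ("filter", "search", "sort", "report", "export", "query")

def recon_score(path, params):
    """Higher score = more likely to reach a SQL backend (toy weighting)."""
    score = 0
    for hint in HIGH_VALUE_HINTS:
        if hint in path.lower():
            score += 2  # path-level hints weigh more than parameter names
        score += sum(1 for p in params if hint in p.lower())
    return score

def prioritize(endpoints):
    """Order (path, params) pairs by descending reconnaissance score."""
    return sorted(endpoints, key=lambda e: recon_score(*e), reverse=True)
```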
Business Logic Testing¶
Agents discover business logic vulnerabilities including:
- Authorization testing: BOLA, tenant isolation, and privilege escalation (see BOLA Agent)
- Workflow bypasses
- State manipulation
- Multi-step attacks
Agents use multi-user authentication to test authorization boundaries (see BOLA Agent for configuration).
Agent Workflow¶
- Discovery: Agents explore the application using agentic crawling
- Analysis: Agents reason about application structure and behavior
- Testing: Agents execute injection attacks and business logic tests
- Validation: Agents verify vulnerabilities and collect evidence
Agent reasoning is visible in scan logs, showing why agents took specific actions and how they adapted their strategies.
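The four stages above can be sketched as a loop. This is a deliberately simplified illustration — the callback names are placeholders, and a real agent interleaves these stages and backtracks rather than running them strictly in order.

```python
# Minimal sketch of the Discovery -> Analysis -> Testing -> Validation loop.
def run_agent(discover, analyze, attack, validate, log):
    findings = []
    for endpoint in discover():                  # Discovery
        plan = analyze(endpoint)                 # Analysis: reason about behavior
        for attempt in plan:
            result = attack(endpoint, attempt)   # Testing: execute the attempt
            log(endpoint, attempt, result)       # reasoning surfaces in scan logs
            evidence = validate(result)          # Validation: confirm + evidence
            if evidence:
                findings.append((endpoint, evidence))
    return findings
```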
AI Models¶
A common question from customers is "which LLM does AI Pentesting use?" — often with the assumption that everything is powered by a single model like ChatGPT. It isn't. AI Pentesting is a multi-model system: different stages of a scan, and different agents, are powered by different models chosen for the job at hand.
A Portfolio of Models, not a Single Model¶
Escape routes work across a portfolio of frontier commercial models and open-weight models hosted on its own infrastructure. No single provider or model powers the product end-to-end. We treat models as interchangeable components and select the best one for each sub-task based on continuous internal benchmarks.
Why Different Models for Different Tasks?¶
Security testing is not one problem — it's a pipeline of very different cognitive tasks, each with different requirements:
| Stage | What the model needs to be good at | Why it matters |
|---|---|---|
| Crawling & navigation | Visual + DOM reasoning, tool use, cost efficiency | Agents click, type, and navigate thousands of pages per scan — speed and cost dominate. See Agentic Crawling. |
| Exploit design & payload generation | Creativity, divergent thinking, broad world knowledge | Finding a new attack path or crafting a novel payload benefits from models that explore the solution space aggressively. |
| Exploit validation & confirmation | Strict, deterministic reasoning, low hallucination, adherence to evidence | Confirming an exploit worked must be precise — a false positive here becomes a false positive in your Issue queue. |
| Business logic & multi-step planning | Long-context reasoning, planning, state tracking | Multi-step attacks (BOLA chains, workflow bypasses) require reasoning over long transcripts of requests, responses, and prior agent actions. |
| Evidence summarization & remediation write-ups | Instruction-following, concision, technical writing | The reproduction steps, cURL commands, and remediation guidance surfaced in a finding are generated from raw evidence. |
A model that is excellent at one of these stages is often not the best choice for another. Using a single model everywhere would mean trading off creativity against rigor on every single task.
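One way to picture stage-based routing is as a capability-matching table. The sketch below is purely illustrative: the stage names, capability tags, and catalog entries are invented, and Escape's actual routing is private IP as noted below.

```python
# Hypothetical routing: each pipeline stage declares the capabilities it
# needs, and the cheapest model satisfying all of them is selected.
ROUTING = {
    "crawling":       ["tool_use", "low_cost"],
    "payload_design": ["creativity"],
    "validation":     ["low_hallucination", "determinism"],
    "planning":       ["long_context"],
    "write_up":       ["instruction_following"],
}

def select_model(stage, catalog):
    """Pick the cheapest catalog entry covering all of a stage's needs."""
    needs = set(ROUTING[stage])
    candidates = [m for m in catalog if needs <= set(m["strengths"])]
    return min(candidates, key=lambda m: m["cost"]) if candidates else None
```

This is why a single model everywhere is a bad trade: the cheapest model that wins "crawling" will generally not satisfy "validation", and vice versa.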
Model Selection per Agent¶
Each agent (see BOLA Agent, XSS Agent, SQLi Agent, Business Logic Agent, Regression Agent, Multi-Agent Pentest) composes the stages above with different weightings:
- The BOLA Agent leans heavily on long-context planning models to track identities, roles, and multi-step authorization flows.
- The XSS Agent pairs a creative payload-generation model with a strict validation model that only flags a finding when execution is unambiguously confirmed.
- The SQLi Agent uses reconnaissance-oriented reasoning to prioritize database-backed endpoints before escalating to confirmed exploitation.
- The Business Logic Agent depends on models with strong stateful reasoning to chain requests across a workflow.
- The Multi-Agent Pentest orchestrates multiple agents in parallel, each running its own model mix.
The exact routing — which model is used at which step, with which prompts, tools, and guardrails — is part of Escape's private IP and is what makes the product work. Customers benefit from this routing without having to build it themselves.
Always Evaluating the Newest Models¶
Frontier models move fast. Escape runs every new major release (GPT, Claude, Gemini, Llama, and others) against an internal security-evaluation harness composed of real-world vulnerability reproduction tasks, benchmark applications, and regression suites. A new model is promoted only when it beats the incumbent on our metrics — not on a public leaderboard. This means:
- The model mix today is not the model mix six months ago.
- Improvements in upstream model capabilities translate into better findings, reproductions, and remediations without changes on the customer's side.
- If a new model regresses on security reasoning, we don't ship it — even if it's cheaper or faster.
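The promotion rule in the last bullet amounts to a gate: a candidate must not regress on any security metric before it can replace the incumbent. The metric names below are invented for illustration; the real harness and its metrics are internal.

```python
# Toy promotion gate: promote only if the candidate is at least as good as
# the incumbent on every metric (never worse anywhere).
HIGHER_IS_BETTER = ("true_positive_rate", "reproduction_success")
LOWER_IS_BETTER = ("false_positive_rate",)

def should_promote(incumbent, candidate):
    for m in HIGHER_IS_BETTER:
        if candidate[m] < incumbent[m]:
            return False  # regression on a quality metric blocks promotion
    for m in LOWER_IS_BETTER:
        if candidate[m] > incumbent[m]:
            return False  # more false positives blocks promotion
    return True
```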
Data Handling¶
Regardless of which model is used, customer data handling is governed by Escape's Privacy & Security policy:
- No training on customer data. Customer traffic, payloads, findings, and evidence are never used to train third-party models.
- Zero-retention agreements are in place with commercial model providers where available, so prompts and responses are not retained by the provider.
- Open-weight models used in sensitive contexts run on Escape-controlled infrastructure, so data never leaves Escape's trust boundary.
- Organization administrators can disable all AI Pentesting activity at any time via the AI Pentesting Kill Switch.
What about reproducibility?¶
Because the model mix evolves over time, an AI Pentesting scan is intentionally not bit-for-bit reproducible — that's a feature, not a bug. It's what lets agents find things that a frozen, signature-based scanner cannot. Determinism where it matters (finding identity, reproduction steps, evidence) is provided by Escape's own engine, not by the underlying LLM.
Related Documentation¶
- Multi-Agent Pentest: Autonomous multi-agent penetration testing
- BOLA Agent: Authorization testing agent
- XSS Agent: XSS testing agent
- SQLi Agent: SQL injection testing agent
- Business Logic Agent: Business workflow testing agent
- API Testing Configuration: API testing configuration options
- Frontend DAST Configuration: WebApp testing configuration options
- Agentic Crawling: Technical details on crawling