Multi-Agent Pentest

The Multi-Agent Pentest is an autonomous, black-box penetration testing engine that deploys a team of coordinated AI agents inside a sandboxed environment. A core agent orchestrates specialised child agents that perform reconnaissance, targeted exploitation, validation, and reporting, mirroring the workflow of a human pentester.

Capabilities

  • Multi-agent orchestration: A core agent spawns and coordinates focused child agents (e.g. "SQLi Discovery", "XSS Validation", "Auth Testing") to divide the attack surface
  • Full sandbox environment: Agents run inside a sandbox with classic pentesting tools, a browser, and an HTTP proxy
  • Broad vulnerability coverage: XSS, SQL injection, IDOR/BOLA, SSRF, command injection, access control, and business logic flaws
  • Evidence-rich reporting: Findings include curl requests, responses, commands, and step-by-step reasoning
  • Real-time activity streaming: Agent thinking and tool calls are streamed as scan events so you can follow the pentest live

How It Is Used

The Multi-Agent Pentest runs automatically after the initial crawling phase. You cannot enable, disable, or tune the orchestrator directly from this page.

Use the AI Pentesting stepper to influence the orchestrator:

  • Scope mode: Use Standard when related discovered assets can be included in the pentest, or Strict when testing must stay limited to the listed URLs.
  • Scope restrictions: Keep crawling or active API testing away from destructive, sensitive, or out-of-scope paths.
  • Authentication: Add users and natural-language sign-in instructions when protected surfaces matter.
  • Fine-Tune (Optional) > Context: Add high-value endpoints or workflows, areas to avoid, vulnerability classes to focus on, authentication quirks, session quirks, and known technologies.
  • Fine-Tune (Optional) > Duration: Control the maximum scan duration and rate limit from the stepper.
  • Fine-Tune (Optional) > Artifacts: Attach supporting files, such as pentest reports, documentation, OpenAPI exports, or screenshots.

The orchestrator receives the crawler output, the configured users, the context, the scope, the duration budget, and the attached artifacts. It then coordinates child agents and prioritizes work within those boundaries.
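The difference between the two scope modes can be illustrated with a minimal sketch. This is not the engine's implementation: the function name `in_scope`, the host-level comparison, and the reading of "related assets" as subdomains of a listed host are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def in_scope(url: str, listed_urls: list[str], mode: str = "standard") -> bool:
    """Return True if `url` may be tested under the given scope mode.

    Hypothetical helper: the real engine's scope rules are not documented here.
    """
    if mode not in ("standard", "strict"):
        raise ValueError(f"unknown scope mode: {mode!r}")

    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False

    # Assumption: scope is compared at host level, not at full-URL level.
    listed_hosts = {(urlsplit(u).hostname or "").lower() for u in listed_urls}
    listed_hosts.discard("")

    if host in listed_hosts:
        return True
    if mode == "strict":
        # Strict: nothing beyond the listed targets is allowed.
        return False

    # Assumption: in Standard mode, a "related" asset is a subdomain of a listed host.
    return any(host.endswith("." + listed) for listed in listed_hosts)
```

Under these assumptions, `https://cdn.app.example.com/` would be accepted in Standard mode when `https://app.example.com` is listed, and rejected in Strict mode.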

Vulnerability Categories

Findings are automatically classified into one of the following categories:

| Category | Examples |
|---|---|
| XSS | Reflected, stored, and DOM-based cross-site scripting |
| SQL Injection | Error-based, union-based, blind, and time-based SQL injection |
| SSRF | Server-side request forgery, internal service access |
| Command Injection | OS command injection, remote code execution |
| Access Control | IDOR, privilege escalation, authentication bypass, broken authorization |
| Business Logic | Workflow manipulation, race conditions, state tampering |
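If you post-process findings yourself, the category list above can be expressed as a simple lookup. The sketch below is illustrative only: the engine classifies findings automatically, and the lowercase labels and the `categorize` helper are hypothetical names derived from the examples above, not an exported format.

```python
from typing import Optional

# Category names are taken from the list above; the example labels are
# hypothetical normalised forms of the listed examples.
CATEGORIES = {
    "XSS": ("reflected xss", "stored xss", "dom-based xss"),
    "SQL Injection": (
        "error-based sql injection",
        "union-based sql injection",
        "blind sql injection",
        "time-based sql injection",
    ),
    "SSRF": ("server-side request forgery", "internal service access"),
    "Command Injection": ("os command injection", "remote code execution"),
    "Access Control": (
        "idor",
        "privilege escalation",
        "authentication bypass",
        "broken authorization",
    ),
    "Business Logic": ("workflow manipulation", "race condition", "state tampering"),
}


def categorize(finding_type: str) -> Optional[str]:
    """Map a finding-type label to its category, or None if it is not listed."""
    label = finding_type.strip().lower()
    for category, examples in CATEGORIES.items():
        if label in examples:
            return category
    return None
```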

Requirements

  • Reachable target: The scan must be able to reach the target URL
  • Supported scans: Web applications, REST APIs, and GraphQL APIs
  • Authentication (optional): Configure when important surfaces are behind login

API scans (REST / GraphQL)

For API scans, the engine targets the API base URL directly, with no browser workflow. A schema artifact is pre-seeded in the sandbox workspace so the agents can enumerate the attack surface before probing it:

  • REST: the OpenAPI specification, when one is available for the scan, is written to /workspace/schema.openapi.json. Otherwise, the known endpoint list is written to /workspace/schema.endpoints.json.
  • GraphQL: metadata for the queries, mutations, and subscriptions is written to /workspace/schema.operations.json, and the full SDL is written to /workspace/schema.graphql when available.
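To show what enumerating the attack surface from the REST artifact could look like, here is a minimal sketch that lists the operations in an OpenAPI document. It relies only on the standard OpenAPI `paths` layout; the function names are hypothetical, and it does not parse schema.endpoints.json because that file's layout is not described here.

```python
import json
from pathlib import Path

# HTTP methods that OpenAPI allows as keys of a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs found under the spec's `paths` object."""
    operations = []
    for path, item in (spec.get("paths") or {}).items():
        for key in item or {}:
            # Skip non-method keys such as "parameters" or "summary".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


def load_rest_operations(workspace: Path = Path("/workspace")) -> list[tuple[str, str]]:
    """Read schema.openapi.json from the workspace if it exists (hypothetical helper)."""
    openapi_file = workspace / "schema.openapi.json"
    if not openapi_file.exists():
        # The endpoint-list fallback is left out: its format is not documented here.
        return []
    return list_operations(json.loads(openapi_file.read_text(encoding="utf-8")))
```

For a spec that defines `GET /users`, `POST /users`, and `DELETE /users/{id}`, `list_operations` returns those three pairs ordered by path, then by method.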

Limitations

  • Coverage depends on what the agents can discover and reach within the maximum scan duration
  • The default maximum scan duration is 6 hours; it can be adjusted in the stepper, up to a maximum of 24 hours
  • Agents stay on the configured target domain and do not navigate to external sites
  • Context improves focus but does not replace scan scope or authentication setup