JS Analysis Agent¶
Modern web apps ship most of their business logic to the browser. The JS Analysis Agent reads the JavaScript that your app actually serves to real users, unbundles it when a sourcemap is available, extracts the API contracts it implies, scans the code for leaks, and fires an authenticated LLM-driven pentest against the APIs it discovered. It's one of Escape's widest-surface agents: a single pipeline that turns a pile of minified bundles into a known attack surface and a set of concrete findings.
What It Produces¶
The pipeline emits several classes of output per scan:
- Static-analysis findings on first-party JS:
ISSUE_JS_HARDCODED_SECRET: bearer tokens, AWS access keys (AKIA,ASIA), GitHub PATs (ghp_), GitLab PATs (glpat-), Stripe keys (sk_live,pk_live,sk_test,pk_test), Slack tokens (xoxb-,xoxp-,xoxa-), JWTs, PEM-formatted private keys, and inlineAuthorization/X-Api-Key/X-Auth-Tokenvalues.ISSUE_JS_DATA_LEAK: sensitive values written tolocalStorageorsessionStorageunder telling keys (token,auth,session,password,secret,key,credential,jwt,apikey,api_key), andconsole.logcalls that dump credentials.ISSUE_JS_DANGEROUS_EVAL:eval()ornew Function()calls that pass user-controlled input.
- API inventory extracted from JS source:
- REST endpoints:
fetch(...), generichttp.get/post/put/patch/delete(...)shorthand, jQuery$.ajax, concatenated URL paths, templated URLs with${...}interpolation, and object-form clients. - GraphQL: endpoint URLs plus named queries, mutations, and subscriptions.
- WebSocket:
new WebSocket(...)and related patterns. - The extracted routes are consolidated via a rename and de-duplication pass, then emitted as an OpenAPI spec, an
ApiServiceRest(orApiServiceGraphQL/ApiServiceWebSocket) asset, and anAssetLink(DEFINES)so the new asset joins the graph and downstream DAST, ASM, and AI Pentesting scanners can target it.
- REST endpoints:
- Business-logic findings from an LLM pentest agent:
BusinessLogicVulnerabilityIssuefindings from an agent that replays valid-shaped authenticated requests against the discovered REST API, using the generated OpenAPI spec + JS source as prompt context.
- HAR export for downstream BLST checks:
- A recorded HAR of the agent's traffic lands on S3 and propagates through a
TempHarExportevent, so the BLST agentic check fleet can replay the same authenticated exchanges and mine them for deeper issues.
- A recorded HAR of the agent's traffic lands on S3 and propagates through a
- Technology fingerprints:
- Every third-party library served from
node_modules/,vendor/, orbower_components/lands in the Escape technology map. Feeds the Technologies inventory.
- Every third-party library served from
How It Works¶
The pipeline runs as a chain of tasks inside the scanner-next fleet:
- Bundle download and sourcemap detection: every JS asset the scanner sees during a WebApp or ASM scan is pulled. The dispatcher reads the trailing
//# sourceMappingURL=comment. When a sourcemap is served, the bundle routes to the unbundle task; otherwise the raw bundle is analyzed directly. - Unbundling: the
js-bundle-unbundletask uses thesource-maplibrary'sSourceMapConsumerto expand a minified bundle back into its original source files. Minified bundles with named sources read like the original codebase, and every downstream detector runs against the real file names. - Beautification fallback: when the bundle is minified and no sourcemap ships, a lightweight beautifier breaks statements across lines so the regex detectors catch patterns that would otherwise live on a single 300-KB line.
- First-party vs third-party classification: each JS source is mapped back to the originating host and the sourcemap's declared source path. Code under
node_modules/,vendor/, orbower_components/, or served from a host outside the scope domain, is classified as third party and becomes a technology fingerprint. First-party JS moves on to the next two stages. - Leak detection: the first-party sources are scanned for the secret, data-leak, and dangerous-eval patterns listed above. Every finding carries a 120-character window around the match so triage sees the context without opening the file.
- API spec extraction: a regex pre-pass seeds API hints from every recognized HTTP-client pattern. A budgeted LLM pass then renames synthetic placeholders, deduplicates routes that differ only by path parameter naming, groups endpoints by inferred base URL, and assembles an OpenAPI spec per discovered service. GraphQL operation names and WebSocket endpoints are consolidated the same way. The output is a
JsSourceRestBundleWithOpenAPIper REST service plusApiServiceGraphQLandApiServiceWebSocketevents for the other shapes. - Asset emission: the
js-source-rest-openapi-emitsplitter unpacks each REST bundle into the canonicalOpenAPI + ApiServiceRest + AssetLink(DEFINES)triplet so the rest of the graph (BLST, SQLi agent, asset flushers) sees a normal API asset. - LLM pentest against the derived API: the
js-source-rest-openapi-pentest-agenttask runs one LLM agent per discovered REST service. The agent's primary job is coverage: fire valid, JS-informed requests at as many distinct OpenAPI operations as possible so the resulting HAR hands downstream BLST agentic checks a rich replay surface. Direct business-logic findings are a small reserved secondary output. The agent runs with hard caps (50 HTTP requests, 30 reasoning iterations, 500k tokens per run) and uses theExchangeCoverageclassifier from BLST, not raw HTTP status codes, to decide whether a call was useful. - HAR export: recorded agent exchanges are serialized as HAR 1.2 and uploaded for downstream BLST. HAR export runs in a
finallyso partial coverage still flows even if the agent crashes mid-run.
Why This Matters¶
Three wins stack up from this pipeline:
- Shadow API discovery from the browser side: every API the front-end actually talks to ends up in the inventory, even routes that were never documented or that sit on a separate service. Pairs with Shadow API Discovery to close the gap between what ships and what's declared.
- Real exploitability from the JS surface: hardcoded Stripe keys or bearer tokens in a React bundle are already-public incidents. The leak detectors turn them into findings within the same scan that rendered the page.
- Higher downstream coverage without human input: the derived OpenAPI spec plus the LLM-generated authenticated traffic give BLST a much bigger replay corpus than a hand-filed schema, which is why this agent matters even when it doesn't raise its own findings.
Scope and Limits¶
- First-party leak detection only: third-party libraries under
node_modules/or served from a different host are fingerprinted, not scanned for secrets. Vendor disclosures are the source of truth there. - Regex plus bounded LLM for extraction: the extraction pipeline is fast and defensively capped (up to 200 hints per bundle for the LLM pass, 80k characters of context per bundle). Deep semantic analysis of custom HTTP clients is out of scope.
- Prompt-injection defenses: every OpenAPI spec and JS source the agent sees comes from the target application, which means it can contain hostile text. The agent wraps inlined data in a per-render random fence and tells the model explicitly that anything between the fences is data, never instructions.
- Pentest agent behind
experimental.js_analysis: the upstream JS analysis pipeline (bundle download, classification, leak detection, OpenAPI extraction, asset emission) runs in GA. The agentic pentest stays behind theexperimental.js_analysisscanner-config flag until the coverage telemetry settles. - Sourcemap courtesy: we never request a sourcemap that the site doesn't serve, and we never bypass access controls to reach one.
Configuration¶
Most of the pipeline is zero-configuration. The LLM pentest stage is guarded by experimental.js_analysis:
With the flag off, the upstream JS analysis stages still run and still produce findings, assets, and OpenAPI specs. Only the js-source-rest-openapi-pentest-agent task is skipped.
Related¶
- Business Logic Agent for general-purpose business-logic reasoning over any target.
- Whitebox Agent for source-aware AI pentesting on repository code.
- WebApp Technology Detection for how classified third-party JS feeds the wider technology map.
- Shadow API Discovery for the parent "find the undocumented surface" story that this agent contributes to.