Agentic Crawling in WebApp DAST

Crawling web applications is a complex task: it requires understanding actions and the causal chains between them, knowing which input formats are allowed, recovering from errors, and more.

We have built dedicated capabilities into the WebApp DAST engine to boost your application crawling coverage by using LLM agents and giving them control over each page we detect.

Natural Language Crawling Instructions

Every web application has its quirks and specific business logic. You can directly influence the crawler's efficiency and success by guiding it with simple natural language instructions.

You can enable the agentic crawling feature via the following configuration:

frontend_dast:
  agentic_crawling:
    enabled: true
    instructions: >
      Do not change the user password. If you are logged out, log back in by
      using user@example.com and the password is helloworld.

      Make sure to search, create, delete objects on each page to fully test
      each feature. If you are on the Escape Private Locations page, try
      creating a new private location named "hello world", and delete it.

This configuration guides the LLM agent to reauthenticate the scan if it gets logged out, and to perform specific actions on a specific page you want to target.

Giving the agent clear instructions and additional helpful information yields better API coverage.

For example:

  • a special employee ID you might need in a form.
  • a specific user flow on a specific page (combine this with hotstart to guide it even further!)
  • allowing a deletion action that the agent would usually avoid (its default instructions are to avoid destructive actions)
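
As a minimal sketch of how such guidance could be written, the snippet below reuses the frontend_dast.agentic_crawling keys from the configuration above. The employee ID EMP-0042, the "Projects" page, and the project name are hypothetical placeholders, not values from a real application.

```yaml
# Hypothetical example: the employee ID, page name and project name are placeholders.
frontend_dast:
  agentic_crawling:
    enabled: true
    instructions: >
      When a form asks for an employee ID, use EMP-0042.

      On the Projects page, create a project named "crawl test", then
      delete it. Deleting that project is allowed.
```

Stating explicitly that a deletion is allowed matters here, because the agent's default instructions are to avoid destructive actions.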

Reviewing results

You can review the agentic crawling logs by searching your scan logs, in the "Logs" tab.

There you can view the agent's reasoning, its actions, and the screenshots taken during the scan.

Search for Agentic Page Crawler or, even better, use the "Stage" filter: add Agentic Actions to review tool calls, clicks, and other interactions with the page, and Agentic Reasoning to review the agent's reasoning while it crawls the pages.

Figure 1: Agentic Page Crawler Logs

Figure 2: Agentic Page Crawler Reasoning

Figure 3: Agentic Page Crawler Reasoning

You should then see that the agent succeeded!

From a natural language instruction

Figure 4: Agentic Page Crawler Configuration

To proof of the actions performed, with screenshots

Figure 5: Agentic Page Crawler successfully executed the task

And a final output from the crawler that summarizes what was done

Figure 6: Agentic Page Crawler final summary output