Site Crawler
How GrowDeck discovers and maps your website.
Last updated: May 2025
Overview
The GrowDeck crawler uses Playwright, which drives a full browser engine, to visit your website the same way a real user would. This means JavaScript-rendered content, SPAs, and dynamically loaded pages are all crawled correctly.
What gets extracted
For each page, the crawler extracts:
- URL, title, meta description
- H1-H6 heading structure
- Body text (cleaned, no nav/footer)
- Internal links (used for link graph)
- Canonical URL
- Page type classification (homepage, blog, product, landing, etc.)
- Word count and content depth
- Schema markup presence
- Open Graph metadata
- Page load performance signals
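As a rough illustration of a few of the fields above (title, meta description, canonical URL, headings, and internal links), here is a minimal sketch that uses only the Python standard library. It is not GrowDeck's extraction code: `PageRecord` and `extract_page` are hypothetical names, and the sketch assumes the rendered HTML is already available as a string.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class PageRecord:
    """Hypothetical record holding a subset of the extracted fields."""
    url: str
    title: str = ""
    meta_description: str = ""
    canonical_url: str = ""
    headings: list = field(default_factory=list)        # (level, text) pairs
    internal_links: list = field(default_factory=list)  # absolute, same-host URLs


class _PageParser(HTMLParser):
    def __init__(self, record):
        super().__init__()
        self.record = record
        self._host = urlparse(record.url).netloc
        self._capture = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title" or tag in HEADING_TAGS:
            self._capture, self._buffer = tag, []
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.record.meta_description = attrs.get("content") or ""
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.record.canonical_url = urljoin(self.record.url, attrs.get("href") or "")
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links and drop fragments before comparing hosts.
            target = urldefrag(urljoin(self.record.url, attrs["href"]))[0]
            links = self.record.internal_links
            if urlparse(target).netloc == self._host and target not in links:
                links.append(target)

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if tag == "title":
                self.record.title = text
            else:
                self.record.headings.append((int(tag[1]), text))
            self._capture = None


def extract_page(url, html):
    """Parse rendered HTML and return a PageRecord for the given URL."""
    record = PageRecord(url=url)
    parser = _PageParser(record)
    parser.feed(html)
    parser.close()
    return record
```

The same-host list in `internal_links` is the kind of input a link graph could be built from. Page type classification, body text cleaning, and performance signals are left out of this sketch.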
Crawler settings
- Depth limit: Maximum pages to crawl per run (default: 500)
- Respect robots.txt: Always on; the crawler checks robots.txt before crawling
- Concurrency: 3 parallel browser contexts (configurable)
- Delay: 500ms between requests to avoid rate limiting
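To make the interaction of these settings concrete, the sketch below filters URLs through robots.txt, caps the number of pages, and limits parallelism with a semaphore. It is an illustration under assumptions, not the crawler's actual scheduler: `CrawlSettings`, `allowed_by_robots`, `crawl`, and the `GrowDeckBot` user agent are hypothetical names, and applying the delay per worker slot is one possible reading of "between requests".

```python
import asyncio
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class CrawlSettings:
    max_pages: int = 500        # "Depth limit" in the settings list
    concurrency: int = 3        # parallel browser contexts
    delay_seconds: float = 0.5  # 500ms pause after each request


def allowed_by_robots(robots_txt, url, user_agent="GrowDeckBot"):
    """Return True if robots.txt permits this URL (user agent name is assumed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


async def crawl(urls, robots_txt, visit, settings=CrawlSettings()):
    """Visit allowed URLs with bounded concurrency; `visit` is an async callable."""
    semaphore = asyncio.Semaphore(settings.concurrency)
    allowed = [u for u in urls if allowed_by_robots(robots_txt, u)]
    allowed = allowed[: settings.max_pages]

    async def worker(url):
        async with semaphore:
            result = await visit(url)
            # Hold the slot for the delay so each slot paces its own requests.
            await asyncio.sleep(settings.delay_seconds)
            return url, result

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in allowed))
```

A caller would pass an async `visit` function that loads one page, for example `asyncio.run(crawl(urls, robots_txt, visit))`.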
How it handles SPAs
Playwright waits for networkidle before extracting content. Dynamic routes in Next.js, React, and Vue are handled correctly.
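A minimal sketch of this wait, using Playwright's Python sync API, might look like the following. The function names, the 30-second timeout, and the returned dictionary shape are assumptions made for illustration; only `wait_until="networkidle"` reflects the behaviour described above.

```python
def extract_rendered(page, url, timeout_ms=30_000):
    """Navigate, wait for the network to go idle, then read the rendered page.

    `page` is expected to behave like a Playwright sync-API Page.
    """
    page.goto(url, wait_until="networkidle", timeout=timeout_ms)
    return {"url": page.url, "title": page.title(), "html": page.content()}


def crawl_one(url):
    """Launch Chromium, render one URL in a fresh context, and clean up."""
    # Imported lazily so extract_rendered() can be exercised with a stub page.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            context = browser.new_context()  # fresh context: no cookies or cache
            return extract_rendered(context.new_page(), url)
        finally:
            browser.close()
```

Because client-side routers render content after the initial HTML response, reading `page.content()` only after the network goes idle is what lets the rendered markup, rather than an empty application shell, reach the extractor.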
Crawler technology: Sandflare microVMs
GrowDeck runs every crawl in an isolated Sandflare microVM for security and performance. Each page is visited in a fresh browser-agent sandbox with Chromium and Playwright pre-installed.
This isolation ensures:
- Security: Malicious JavaScript or tracking scripts can't escape the sandbox
- Clean state: Every crawl starts fresh with no cookies or cached data
- Scalability: Sandboxes launch in under 1 second with automatic cleanup
Learn more in the Sandflare docs, or see the crawler implementation in our open-source repository.
Triggering a crawl
Crawls can be triggered:
- Manually from the site dashboard → Crawl tab
- On a schedule (daily/weekly) [PRO]: configure in site settings
- Via API:
POST /api/v1/sites/:siteId/jobs with { "type": "CRAWL" }
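For the API route, a request could be assembled roughly as shown below with Python's standard library. Only the path and the `{ "type": "CRAWL" }` body come from this page. The base URL, the bearer-token header, and the JSON response are assumptions, so check the API reference for the actual authentication scheme and response format.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_crawl_request(base_url, site_id, api_token):
    """Build (but do not send) the POST request that queues a crawl job."""
    url = f"{base_url.rstrip('/')}/api/v1/sites/{quote(site_id, safe='')}/jobs"
    body = json.dumps({"type": "CRAWL"}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_token}",  # assumed auth scheme
    }
    return Request(url, data=body, method="POST", headers=headers)


def trigger_crawl(base_url, site_id, api_token):
    """Send the request and decode the response, assumed here to be JSON."""
    request = build_crawl_request(base_url, site_id, api_token)
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before anything goes over the network.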
Crawl status
- PENDING: queued, waiting for a worker
- PROCESSING: actively crawling
- COMPLETED: finished, pages extracted
- FAILED: error occurred (check job logs)
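Client code that waits for a crawl to finish can treat COMPLETED and FAILED as terminal states and keep polling otherwise. The sketch below assumes a caller-supplied `get_status` function that returns the current status string. How that status is fetched is not covered on this page, and the polling interval and limit are arbitrary example values.

```python
import time
from enum import Enum


class CrawlStatus(str, Enum):
    PENDING = "PENDING"        # queued, waiting for a worker
    PROCESSING = "PROCESSING"  # actively crawling
    COMPLETED = "COMPLETED"    # finished, pages extracted
    FAILED = "FAILED"          # error occurred (check job logs)


TERMINAL_STATUSES = {CrawlStatus.COMPLETED, CrawlStatus.FAILED}


def wait_for_crawl(get_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll get_status() until the job is COMPLETED or FAILED.

    Raises TimeoutError if no terminal status is seen within max_polls polls.
    """
    for _ in range(max_polls):
        status = CrawlStatus(get_status())  # ValueError on an unknown status
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("crawl did not reach a terminal status")
```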
After a crawl
Crawled pages populate the Pages tab. The keyword engine automatically runs after each completed crawl to refresh opportunity scoring.
