>_

WEBSITE_SCRAPER

v1.0.0

[SYS] Comprehensive website content extraction system: Sitemap parsing + Text processing

$ cat description.txt

Advanced website content extraction workflow that automatically discovers and processes sitemaps to extract text content from every page on a website. It handles both plain and indexed sitemap formats, cleans and normalizes HTML content, and stores the extracted text in a structured database for knowledge retrieval and AI applications.

CORE_FEATURES:

Intelligent Discovery

> Automatic sitemap detection
> Robots.txt parsing
> Multi-level sitemap support

Content Extraction

> Clean text extraction
> Navigation removal
> Format normalization

Data Processing

> URL structure mapping
> Batch processing
> Supabase integration

Content Storage

> Structured database storage
> URL and content mapping
> RAG-ready format
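Supabase exposes inserts over its PostgREST-backed REST endpoint (`/rest/v1/<table>` with `apikey` and `Authorization` headers). A hedged sketch of shaping and posting rows, assuming a hypothetical `pages` table with `url` and `content` columns (the actual workflow uses n8n's Supabase node instead):

```python
import json
import urllib.request

def build_row(url: str, content: str) -> dict:
    """Map one extracted page to a row for the assumed `pages` schema."""
    return {"url": url, "content": content}

def store_rows(supabase_url: str, api_key: str, rows: list[dict]) -> urllib.request.Request:
    """Build a bulk-insert request against Supabase's REST endpoint.
    The caller sends it with urllib.request.urlopen(req)."""
    return urllib.request.Request(
        f"{supabase_url}/rest/v1/pages",
        data=json.dumps(rows).encode(),
        headers={
            "apikey": api_key,
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Prefer": "return=minimal",  # skip echoing inserted rows back
        },
        method="POST",
    )
```

Storing one row per URL keeps the url-to-content mapping explicit, which is the shape RAG retrievers typically expect.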

$ system_requirements
    
MODELS: none required
STORAGE: supabase
SERVICES: none required
OUTPUT: supabase table entries
PRICING: supabase - free tier        
EST. PER RUN COST: free
>_

PROCESS_FLOW:

[INPUT]      -> Website URL
[DISCOVERY]  -> Robots.txt + Sitemap Detection
[EXTRACTION] -> URL Collection from Sitemaps
[PROCESSING] -> Page Content Retrieval
  ├── [CLEANING]   -> HTML Tag Removal
  └── [FORMATTING] -> Text Normalization
[STORAGE]    -> Supabase Database Integration
[OUTPUT]     -> Structured Content Database
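The stages of this flow can be wired together as a simple pipeline. A sketch with every stage injected as a callable (all stage names here are hypothetical placeholders, not n8n node names), which also keeps the skeleton testable without network access:

```python
def run_pipeline(start_url, discover, fetch, clean, store, batch_size=10):
    """Discovery -> extraction -> cleaning -> storage, in batches.

    discover(start_url) -> list of page URLs   ([DISCOVERY]/[EXTRACTION])
    fetch(url)          -> raw HTML            ([PROCESSING])
    clean(html)         -> plain text          ([CLEANING]/[FORMATTING])
    store(rows)         -> persists a batch    ([STORAGE])
    Returns the number of rows stored.
    """
    urls = discover(start_url)
    stored = 0
    for i in range(0, len(urls), batch_size):
        rows = [{"url": u, "content": clean(fetch(u))} for u in urls[i:i + batch_size]]
        store(rows)
        stored += len(rows)
    return stored
```

Batching the store step keeps database writes grouped, which matters once a sitemap yields thousands of URLs.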

AUTOMATION_BENEFITS:

> Create a private knowledge base from any website
> Automate content extraction for AI training datasets
> Build RAG systems with domain-specific content
> Monitor website content changes over time
> Generate searchable content archives without manual processing
PRICE: €79

* Compatible with all n8n installations v1.0.0+