WEBSITE_SCRAPER
v1.0.0
[SYS] Comprehensive website content extraction system: Sitemap parsing + Text processing
CORE_FEATURES:
Intelligent Discovery
> Automatic sitemap detection
> Robots.txt parsing
> Multi-level sitemap support
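The discovery steps above can be sketched roughly as follows, using only the Python standard library. This is an illustrative sketch, not the system's actual implementation: the function names (`sitemaps_from_robots`, `parse_sitemap`, `collect_page_urls`) and the `fetch` callable are hypothetical, and `fetch` is assumed to return the body of a URL as text.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemaps_from_robots(robots_txt):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls

def parse_sitemap(xml_text):
    """Return (kind, locs) where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs

def collect_page_urls(start_urls, fetch, max_depth=3):
    """Follow nested sitemap indexes and return every page URL found.

    `fetch` is a hypothetical callable: fetch(url) -> sitemap XML as str.
    """
    pages, seen = [], set()
    stack = [(url, 0) for url in start_urls]
    while stack:
        url, depth = stack.pop()
        if url in seen or depth > max_depth:
            continue
        seen.add(url)
        kind, locs = parse_sitemap(fetch(url))
        if kind == "index":
            # A sitemap index points at further sitemaps, one level down.
            stack.extend((loc, depth + 1) for loc in locs)
        else:
            pages.extend(locs)
    return pages
```

In practice `fetch` could wrap `urllib.request.urlopen`; keeping it as a parameter makes the traversal easy to test without network access.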
Content Extraction
> Clean text extraction
> Navigation removal
> Format normalization
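One possible way to approach the extraction steps above, again with the standard library only. The set of tags treated as navigation or boilerplate (`SKIP_TAGS`) and the name `extract_text` are assumptions made for this sketch; the real system may use different rules.

```python
import re
from html.parser import HTMLParser

# Assumption: these elements hold navigation or non-content markup.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "noscript"}

class _TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside any skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)

def extract_text(html):
    """Return visible text with skipped elements dropped and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()
```

For example, `extract_text("<nav>Home</nav><p>Hello   world</p>")` keeps only the paragraph text and collapses the repeated spaces.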
Data Processing
> URL structure mapping
> Batch processing
> Supabase integration
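A minimal sketch of URL structure mapping and batching, assuming that "structure" means the domain, the top-level path section, and the path depth. The field names and the helpers `url_structure` and `batched` are hypothetical.

```python
from itertools import islice
from urllib.parse import urlparse

def url_structure(url):
    """Describe where a URL sits in the site hierarchy."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return {
        "url": url,
        "domain": parsed.netloc,
        "section": segments[0] if segments else "",
        "depth": len(segments),
    }

def batched(items, size):
    """Yield lists of at most `size` items so pages can be processed in batches."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

Processing in fixed-size batches keeps each database write small and makes it easier to retry a failed batch without redoing the whole run.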
Content Storage
> Structured database storage
> URL and content mapping
> RAG-ready format
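The storage step might look something like the sketch below. The table name `scraped_pages`, the column names, and the content hash are assumptions for illustration, not the system's actual schema. `store_rows` relies on the third-party `supabase` Python client and assumes the table has a unique constraint on `url`.

```python
import hashlib
from datetime import datetime, timezone

def build_row(url, content):
    """Map one page to a row; the hash lets later runs detect changed content."""
    return {
        "url": url,
        "content": content,
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }

def store_rows(rows, supabase_url, supabase_key, table="scraped_pages"):
    """Upsert rows into Supabase (hypothetical table; needs `pip install supabase`)."""
    from supabase import create_client  # imported lazily: third-party package
    client = create_client(supabase_url, supabase_key)
    # Assumes a unique constraint on `url` so re-runs update existing rows.
    return client.table(table).upsert(rows, on_conflict="url").execute()
```

Each row keeps the URL next to its cleaned text, so a later RAG pipeline can chunk and embed `content` while still citing the source page.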
$ system_requirements
MODELS: none required
STORAGE: supabase
SERVICES: none required
OUTPUT: supabase table entries
PRICING: supabase - free tier
EST. PER RUN COST: free
AUTOMATION_BENEFITS:
> Create a private knowledge base from any website
> Automate content extraction for AI training datasets
> Build RAG systems with domain-specific content
> Monitor website content changes over time
> Generate searchable content archives without manual processing