# Testing VPSBenchmarks Scraper with Browser Rendering API

## Local Testing

### 1. Start the development server

```bash
npm run dev
```

### 2. Trigger the scheduled handler manually

In a separate terminal, run:

```bash
curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"
```

### 3. Check the logs

The scraper will:

- Use Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
- Extract benchmark data from the rendered page
- Insert/update records in the D1 database
- Log the number of benchmarks found and inserted

Expected output:

```
[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors
```

## Production Deployment

### 1. Deploy to Cloudflare Workers

```bash
npm run deploy
```

### 2. Verify the cron trigger

The scraper will run automatically daily at 9:00 AM UTC.

### 3. Check D1 database for new records

```bash
npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"
```

## Browser Rendering API Usage

### Free Tier Limits

- 10 minutes per day
- Sufficient for daily scraping (each run should take < 1 minute)

### API Endpoints Used

1. **POST /content** - Fetch fully rendered HTML
   - Waits for JavaScript to execute
   - Returns complete DOM after rendering
   - Options: `waitUntil`, `rejectResourceTypes`, `timeout`

2. **POST /scrape** (fallback) - Extract specific elements
   - Target specific CSS selectors
   - Returns structured element data
   - Useful if full HTML extraction fails

### Error Handling

The scraper includes multiple fallback strategies:

1. Try `/content` endpoint first (full HTML)
2. If no data found, try `/scrape` endpoint (targeted extraction)
3. Multiple parsing patterns for different HTML structures
4. Graceful degradation if API fails

### Troubleshooting

**No benchmarks found:**

- Check if vpsbenchmarks.com changed their HTML structure
- Examine the rendered HTML output in logs
- Adjust CSS selectors in `scrapeBenchmarksWithScrapeAPI()`
- Update parsing patterns in `extractBenchmarksFromHTML()`

**Browser Rendering API errors:**

- Check daily quota usage
- Verify BROWSER binding is configured in wrangler.toml
- Check network connectivity to Browser Rendering API
- Review timeout settings (default: 30 seconds)

**Database insertion errors:**

- Verify vps_benchmarks table schema
- Check unique constraint on (provider_name, plan_name, country_code)
- Ensure all required fields are not null

## Next Steps

After successful testing:

1. **Analyze the rendered HTML** to identify exact CSS selectors
2. **Update parsing logic** based on actual site structure
3. **Test with real data** to ensure accuracy
4. **Monitor logs** after deployment for any issues
5. **Validate data quality** in the database
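
### Example: checking the completion log line

When monitoring logs after deployment, the final `[Scraper] Completed in ...` line listed under the expected output is the quickest signal of a run's health. The sketch below shows one way to turn that line into numbers. It assumes the log format is exactly the one shown in this document; `ScrapeSummary` and `parseScrapeSummary` are hypothetical names and are not part of the scraper itself.

```typescript
// Counts reported by the scraper's final log line (hypothetical type for this sketch).
interface ScrapeSummary {
  durationMs: number;
  inserted: number;
  skipped: number;
  errors: number;
}

// Matches a line such as:
// "[Scraper] Completed in 842ms: 12 inserted, 3 skipped, 0 errors"
// Assumption: the format is the one shown under "Expected output".
const SUMMARY_PATTERN =
  /\[Scraper\] Completed in (\d+)ms: (\d+) inserted, (\d+) skipped, (\d+) errors/;

// Returns the parsed counts, or null when the line is not a completion line.
function parseScrapeSummary(line: string): ScrapeSummary | null {
  const match = SUMMARY_PATTERN.exec(line);
  if (match === null) {
    return null;
  }
  return {
    durationMs: Number(match[1]),
    inserted: Number(match[2]),
    skipped: Number(match[3]),
    errors: Number(match[4]),
  };
}
```

A run that reports zero inserted and zero skipped records, or a non-zero error count, is a reasonable trigger for the troubleshooting steps above.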
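
### Example: trying several parsing patterns in order

The "multiple parsing patterns" strategy from the error handling list can be sketched as a list of parsers tried in order until one yields rows. This is only an illustration, not the actual body of `extractBenchmarksFromHTML()`: both HTML layouts below are invented for the example, and `BenchmarkRow`, `parseTableRows`, `parseDataAttributes`, and `extractWithFallback` are hypothetical names. The real patterns should come from analyzing the rendered HTML.

```typescript
// Hypothetical record shape; the real vps_benchmarks columns may differ.
interface BenchmarkRow {
  provider: string;
  plan: string;
}

// A parser takes rendered HTML and returns whatever rows it recognises.
type HtmlParser = (html: string) => BenchmarkRow[];

// Assumed layout 1: <tr><td>Provider</td><td>Plan</td>...</tr>
const parseTableRows: HtmlParser = (html) => {
  const pattern = /<tr[^>]*>\s*<td[^>]*>([^<]+)<\/td>\s*<td[^>]*>([^<]+)<\/td>/g;
  const rows: BenchmarkRow[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    rows.push({ provider: m[1].trim(), plan: m[2].trim() });
  }
  return rows;
};

// Assumed layout 2: <div data-provider="..." data-plan="...">
const parseDataAttributes: HtmlParser = (html) => {
  const pattern = /data-provider="([^"]+)"[^>]*data-plan="([^"]+)"/g;
  const rows: BenchmarkRow[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    rows.push({ provider: m[1].trim(), plan: m[2].trim() });
  }
  return rows;
};

// Runs each parser in order and returns the first non-empty result.
// An empty array means no pattern matched, so the caller can degrade gracefully.
function extractWithFallback(html: string, parsers: HtmlParser[]): BenchmarkRow[] {
  for (const parser of parsers) {
    const rows = parser(html);
    if (rows.length > 0) {
      return rows;
    }
  }
  return [];
}
```

Keeping each layout in its own parser means a site redesign only requires adding or replacing one entry in the list, rather than editing one large pattern.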