Testing VPSBenchmarks Scraper with Browser Rendering API
Local Testing
1. Start the development server
npm run dev
2. Trigger the scheduled handler manually
In a separate terminal, run:
curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"
3. Check the logs
The scraper will:
- Use Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
- Extract benchmark data from the rendered page
- Insert/update records in the D1 database
- Log the number of benchmarks found and inserted
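The flow above can be sketched as a plain function with the I/O steps injected, so the control flow can be exercised outside a Workers runtime. This is a sketch only: `runScheduledScrape` and its helper parameters are hypothetical names, not the actual implementation.

```typescript
// Sketch of the scrape flow; all names here are hypothetical.
interface Benchmark { provider: string; plan: string; }
interface ScrapeResult { inserted: number; skipped: number; errors: number; }

async function runScheduledScrape(
  fetchRenderedHtml: () => Promise<string>,   // Browser Rendering /content call
  extract: (html: string) => Benchmark[],     // HTML -> benchmark rows
  upsert: (b: Benchmark) => Promise<boolean>, // true = inserted/updated in D1
): Promise<ScrapeResult> {
  const start = Date.now();
  console.log("[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API");
  const html = await fetchRenderedHtml();
  console.log(`[Scraper] Rendered HTML length: ${html.length}`);
  const benchmarks = extract(html);
  console.log(`[Scraper] Extracted ${benchmarks.length} benchmarks from HTML`);
  const result: ScrapeResult = { inserted: 0, skipped: 0, errors: 0 };
  for (const b of benchmarks) {
    try {
      if (await upsert(b)) {
        console.log(`[DB] Inserted/Updated: ${b.provider} ${b.plan}`);
        result.inserted++;
      } else {
        result.skipped++;
      }
    } catch {
      result.errors++; // a bad row should not abort the whole run
    }
  }
  console.log(
    `[Scraper] Completed in ${Date.now() - start}ms: ` +
    `${result.inserted} inserted, ${result.skipped} skipped, ${result.errors} errors`,
  );
  return result;
}
```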
Expected output:
[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors
Production Deployment
1. Deploy to Cloudflare Workers
npm run deploy
2. Verify the cron trigger
The scraper will run automatically daily at 9:00 AM UTC.
3. Check D1 database for new records
npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"
Browser Rendering API Usage
Free Tier Limits
- 10 minutes per day
- Sufficient for daily scraping (each run should take < 1 minute)
API Endpoints Used
- POST /content - Fetch fully rendered HTML
  - Waits for JavaScript to execute
  - Returns complete DOM after rendering
  - Options: waitUntil, rejectResourceTypes, timeout
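As an illustration, a /content request body using those options might be assembled like this. The REST endpoint URL, the auth placeholders, and the exact body schema are assumptions here; verify them against the Cloudflare Browser Rendering documentation.

```typescript
// Hypothetical shape of a /content request body; field names follow the
// options listed above but should be checked against the official schema.
interface ContentRequest {
  url: string;
  gotoOptions?: { waitUntil?: string; timeout?: number };
  rejectResourceTypes?: string[];
}

function buildContentRequest(url: string): ContentRequest {
  return {
    url,
    gotoOptions: { waitUntil: "networkidle0", timeout: 30_000 },
    rejectResourceTypes: ["image", "font", "media"], // skip heavy assets to save quota
  };
}

// Example usage (requires real account ID and API token):
// const res = await fetch(
//   `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/browser-rendering/content`,
//   {
//     method: "POST",
//     headers: { Authorization: `Bearer ${API_TOKEN}`, "Content-Type": "application/json" },
//     body: JSON.stringify(buildContentRequest("https://www.vpsbenchmarks.com")),
//   },
// );
```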
- POST /scrape (fallback) - Extract specific elements
  - Targets specific CSS selectors
  - Returns structured element data
  - Useful if full HTML extraction fails
Error Handling
The scraper includes multiple fallback strategies:
- Try /content endpoint first (full HTML)
- If no data found, try /scrape endpoint (targeted extraction)
- Multiple parsing patterns for different HTML structures
- Graceful degradation if API fails
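A minimal sketch of that fallback chain, with each strategy injected as a function (names are hypothetical):

```typescript
// Hypothetical fallback chain: try /content first, then /scrape, then
// degrade gracefully to an empty result instead of throwing.
interface Benchmark { provider: string; plan: string; }

async function fetchBenchmarksWithFallback(
  getRenderedHtml: () => Promise<string>,      // /content endpoint
  extract: (html: string) => Benchmark[],      // parsing patterns
  scrapeSelectors: () => Promise<Benchmark[]>, // /scrape endpoint
): Promise<Benchmark[]> {
  try {
    const fromHtml = extract(await getRenderedHtml());
    if (fromHtml.length > 0) return fromHtml;  // full-HTML path succeeded
  } catch {
    // fall through to the targeted /scrape attempt
  }
  try {
    return await scrapeSelectors();
  } catch {
    return []; // graceful degradation if the API fails entirely
  }
}
```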
- Graceful degradation if API fails
Troubleshooting
No benchmarks found:
- Check if vpsbenchmarks.com changed their HTML structure
- Examine the rendered HTML output in logs
- Adjust CSS selectors in scrapeBenchmarksWithScrapeAPI()
- Update parsing patterns in extractBenchmarksFromHTML()
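For example, a first-pass parsing pattern might look like the sketch below. The row regex is purely illustrative and assumes provider and plan sit in the first two cells of a table row; it must be adapted to the site's real markup, which is why inspecting the logged HTML comes first.

```typescript
// Hypothetical parsing sketch; the real selectors/patterns depend on
// vpsbenchmarks.com's actual markup.
interface Benchmark { provider: string; plan: string; }

function extractBenchmarksFromHTML(html: string): Benchmark[] {
  const results: Benchmark[] = [];
  // Assumed pattern: one <tr> per plan, provider then plan name in the first two cells.
  const rowRe = /<tr[^>]*>\s*<td[^>]*>([^<]+)<\/td>\s*<td[^>]*>([^<]+)<\/td>/g;
  for (const m of html.matchAll(rowRe)) {
    results.push({ provider: m[1].trim(), plan: m[2].trim() });
  }
  return results;
}
```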
Browser Rendering API errors:
- Check daily quota usage
- Verify BROWSER binding is configured in wrangler.toml
- Check network connectivity to Browser Rendering API
- Review timeout settings (default: 30 seconds)
Database insertion errors:
- Verify vps_benchmarks table schema
- Check unique constraint on (provider_name, plan_name, country_code)
- Ensure all required fields are not null
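Assuming D1's SQLite upsert syntax, an insert that honors that unique constraint might look like the sketch below; only the conflict key is taken from the schema notes above, the other column names are illustrative.

```typescript
// Hypothetical upsert for vps_benchmarks; the conflict key matches the
// unique constraint noted above, remaining columns are assumptions.
const UPSERT_SQL = `
INSERT INTO vps_benchmarks (provider_name, plan_name, country_code, score)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT (provider_name, plan_name, country_code)
DO UPDATE SET score = excluded.score`;

// In the Worker this would run through the D1 binding, e.g.:
// await env.DB.prepare(UPSERT_SQL).bind(provider, plan, country, score).run();
```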
Next Steps
After successful testing:
- Analyze the rendered HTML to identify exact CSS selectors
- Update parsing logic based on actual site structure
- Test with real data to ensure accuracy
- Monitor logs after deployment for any issues
- Validate data quality in the database