cloud-orchestrator/test-scraper.md
kappa 4cb9da06dc feat: add bandwidth estimation and DAU display
- Automatic monthly bandwidth estimation based on concurrent users
- Display estimated DAU (daily active users) (concurrent users × 10-14)
- Bandwidth-based automatic Linode/Vultr selection logic
- Include bandwidth cost in the cost analysis
- Show Seoul/Tokyo/Osaka/Singapore by default when no region is selected
- Display servers separately per region (GROUP BY instance + region)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 09:40:36 +09:00


Testing VPSBenchmarks Scraper with Browser Rendering API

Local Testing

1. Start the development server

npm run dev

2. Trigger the scheduled handler manually

In a separate terminal, run:

curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"

3. Check the logs

The scraper will:

  • Use Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
  • Extract benchmark data from the rendered page
  • Insert/update records in the D1 database
  • Log the number of benchmarks found and inserted
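The four steps above can be sketched as a single function with injected dependencies, so the control flow can be exercised without any Workers bindings. All names here (fetchRenderedHtml, extractBenchmarks, upsertBenchmark) are illustrative placeholders, not the scraper's actual internals:

```typescript
// Minimal sketch of the scrape flow. Dependencies are injected so the
// logic can be tested with fakes instead of the Browser Rendering API
// and D1. Names are hypothetical.
interface Benchmark { provider: string; plan: string; country: string }

interface ScrapeDeps {
  fetchRenderedHtml: (url: string) => Promise<string>;  // Browser Rendering
  extractBenchmarks: (html: string) => Benchmark[];     // HTML parsing
  upsertBenchmark: (b: Benchmark) => Promise<boolean>;  // D1 insert/update
}

async function runScrape(
  deps: ScrapeDeps
): Promise<{ found: number; inserted: number; errors: number }> {
  const html = await deps.fetchRenderedHtml("https://www.vpsbenchmarks.com");
  const benchmarks = deps.extractBenchmarks(html);
  let inserted = 0;
  let errors = 0;
  for (const b of benchmarks) {
    try {
      if (await deps.upsertBenchmark(b)) inserted++;
    } catch {
      errors++; // one bad row should not abort the whole run
    }
  }
  return { found: benchmarks.length, inserted, errors };
}
```

The counts returned here correspond to the "X inserted, X skipped, X errors" summary in the expected log output below.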

Expected output:

[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors

Production Deployment

1. Deploy to Cloudflare Workers

npm run deploy

2. Verify the cron trigger

The cron trigger (0 9 * * *) runs the scraper automatically every day at 9:00 AM UTC.

3. Check D1 database for new records

npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"

Browser Rendering API Usage

Free Tier Limits

  • 10 minutes per day
  • Sufficient for daily scraping (each run should take < 1 minute)

API Endpoints Used

  1. POST /content - Fetch fully rendered HTML

    • Waits for JavaScript to execute
    • Returns complete DOM after rendering
    • Options: waitUntil, rejectResourceTypes, timeout
  2. POST /scrape (fallback) - Extract specific elements

    • Target specific CSS selectors
    • Returns structured element data
    • Useful if full HTML extraction fails
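The request bodies for the two endpoints can be built as below. The option names (waitUntil, rejectResourceTypes, elements/selector) follow Cloudflare's Browser Rendering REST API; the specific values chosen here are assumptions about what this scraper needs, not confirmed configuration:

```typescript
// Payload builders for the two Browser Rendering endpoints. The shapes
// follow Cloudflare's documented /content and /scrape request bodies;
// the concrete option values are illustrative.
function contentRequest(url: string) {
  return {
    url,
    waitUntil: "networkidle0",               // wait for JS to finish rendering
    rejectResourceTypes: ["image", "font"],  // skip assets we never parse
  };
}

function scrapeRequest(url: string, selectors: string[]) {
  return {
    url,
    elements: selectors.map((selector) => ({ selector })),
  };
}
```

A caller would POST JSON.stringify(contentRequest(url)) to the account-scoped browser-rendering/content URL with a Bearer token, and fall back to scrapeRequest with the table's CSS selectors if that yields no data.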

Error Handling

The scraper includes multiple fallback strategies:

  1. Try /content endpoint first (full HTML)
  2. If no data found, try /scrape endpoint (targeted extraction)
  3. Multiple parsing patterns for different HTML structures
  4. Graceful degradation if API fails
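The strategy chain above can be expressed as one small helper: try each extractor in order, accept the first one that returns data, and degrade to an empty result instead of throwing. This is a sketch of the pattern, not the scraper's actual code:

```typescript
// Run extraction strategies in priority order (/content first, /scrape
// second, etc.). The first strategy that returns data wins; failures
// and empty results fall through to the next strategy.
type Extractor<T> = () => Promise<T[]>;

async function withFallbacks<T>(strategies: Extractor<T>[]): Promise<T[]> {
  for (const strategy of strategies) {
    try {
      const result = await strategy();
      if (result.length > 0) return result;
    } catch {
      // swallow the error and try the next strategy
    }
  }
  return []; // graceful degradation: every strategy failed or was empty
}
```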

Troubleshooting

No benchmarks found:

  • Check if vpsbenchmarks.com changed their HTML structure
  • Examine the rendered HTML output in logs
  • Adjust CSS selectors in scrapeBenchmarksWithScrapeAPI()
  • Update parsing patterns in extractBenchmarksFromHTML()
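When adjusting the parsing patterns, it helps to keep each pattern small and testable against a snippet of the rendered HTML. Below is one illustrative pattern (a regex over table cells); the real markup on vpsbenchmarks.com must be confirmed from the logged HTML before relying on any selector or pattern like this:

```typescript
// Illustrative parsing pattern: pull provider/plan pairs out of the
// first two cells of each table row. Purely an example shape for the
// patterns in extractBenchmarksFromHTML(); the site's actual markup
// may differ.
function extractRows(html: string): { provider: string; plan: string }[] {
  const rowRe =
    /<tr[^>]*>\s*<td[^>]*>([^<]+)<\/td>\s*<td[^>]*>([^<]+)<\/td>/g;
  const out: { provider: string; plan: string }[] = [];
  for (const m of html.matchAll(rowRe)) {
    out.push({ provider: m[1].trim(), plan: m[2].trim() });
  }
  return out;
}
```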

Browser Rendering API errors:

  • Check daily quota usage
  • Verify BROWSER binding is configured in wrangler.toml
  • Check network connectivity to Browser Rendering API
  • Review timeout settings (default: 30 seconds)

Database insertion errors:

  • Verify vps_benchmarks table schema
  • Check unique constraint on (provider_name, plan_name, country_code)
  • Ensure all required fields are not null
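An insert that respects the unique constraint above would use SQLite's upsert form. This is a sketch assuming the three constraint columns plus the created_at column seen in the queries earlier; any other columns in the real schema would need to be added:

```typescript
// Upsert keyed on the (provider_name, plan_name, country_code) unique
// constraint. Column list beyond those three plus created_at is an
// assumption about the schema.
const UPSERT_SQL = `
INSERT INTO vps_benchmarks (provider_name, plan_name, country_code, created_at)
VALUES (?1, ?2, ?3, CURRENT_TIMESTAMP)
ON CONFLICT (provider_name, plan_name, country_code)
DO UPDATE SET created_at = CURRENT_TIMESTAMP
`.trim();
```

In the Worker this would run as env.DB.prepare(UPSERT_SQL).bind(provider, plan, country).run(); a violated NOT NULL constraint on any other required column would still surface as an insertion error.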

Next Steps

After successful testing:

  1. Analyze the rendered HTML to identify exact CSS selectors
  2. Update parsing logic based on actual site structure
  3. Test with real data to ensure accuracy
  4. Monitor logs after deployment for any issues
  5. Validate data quality in the database