cloud-orchestrator/test-scraper.md
kappa 4cb9da06dc feat: add bandwidth estimation and DAU display
- Automatic monthly bandwidth estimation based on concurrent users
- Display estimated DAU (daily active users) (concurrent users × 10-14)
- Bandwidth-based automatic Linode/Vultr selection logic
- Include bandwidth cost in the cost analysis
- Show Seoul/Tokyo/Osaka/Singapore by default when no region is selected
- Display servers separately per region (GROUP BY instance + region)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-25 09:40:36 +09:00


Testing VPSBenchmarks Scraper with Browser Rendering API

Local Testing

1. Start the development server

npm run dev

2. Trigger the scheduled handler manually

In a separate terminal, run:

curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"

3. Check the logs

The scraper will:

  • Use Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
  • Extract benchmark data from the rendered page
  • Insert/update records in the D1 database
  • Log the number of benchmarks found and inserted
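The four steps above can be sketched as a single function with injected dependencies, so the control flow can be exercised without any Workers bindings. All names here (fetchRenderedHtml, extractBenchmarks, upsertBenchmark) are illustrative placeholders, not the scraper's actual internals:

```typescript
// Minimal sketch of the scrape flow. Dependencies are injected so the
// logic can be tested with fakes instead of the Browser Rendering API
// and D1. Names are hypothetical.
interface Benchmark { provider: string; plan: string; country: string }

interface ScrapeDeps {
  fetchRenderedHtml: (url: string) => Promise<string>;  // Browser Rendering
  extractBenchmarks: (html: string) => Benchmark[];     // HTML parsing
  upsertBenchmark: (b: Benchmark) => Promise<boolean>;  // D1 insert/update
}

async function runScrape(
  deps: ScrapeDeps
): Promise<{ found: number; inserted: number; errors: number }> {
  const html = await deps.fetchRenderedHtml("https://www.vpsbenchmarks.com");
  const benchmarks = deps.extractBenchmarks(html);
  let inserted = 0;
  let errors = 0;
  for (const b of benchmarks) {
    try {
      if (await deps.upsertBenchmark(b)) inserted++;
    } catch {
      errors++; // one bad row should not abort the whole run
    }
  }
  return { found: benchmarks.length, inserted, errors };
}
```

The counts returned here correspond to the "X inserted, X skipped, X errors" summary in the expected log output below.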

Expected output:

[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors

Production Deployment

1. Deploy to Cloudflare Workers

npm run deploy

2. Verify the cron trigger

The cron trigger (0 9 * * *) runs the scraper automatically every day at 9:00 AM UTC.

3. Check D1 database for new records

npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"

Browser Rendering API Usage

Free Tier Limits

  • 10 minutes per day
  • Sufficient for daily scraping (each run should take < 1 minute)

API Endpoints Used

  1. POST /content - Fetch fully rendered HTML

    • Waits for JavaScript to execute
    • Returns complete DOM after rendering
    • Options: waitUntil, rejectResourceTypes, timeout
  2. POST /scrape (fallback) - Extract specific elements

    • Target specific CSS selectors
    • Returns structured element data
    • Useful if full HTML extraction fails
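The request bodies for the two endpoints can be built as below. The option names (waitUntil, rejectResourceTypes, elements/selector) follow Cloudflare's Browser Rendering REST API; the specific values chosen here are assumptions about what this scraper needs, not confirmed configuration:

```typescript
// Payload builders for the two Browser Rendering endpoints. The shapes
// follow Cloudflare's documented /content and /scrape request bodies;
// the concrete option values are illustrative.
function contentRequest(url: string) {
  return {
    url,
    waitUntil: "networkidle0",               // wait for JS to finish rendering
    rejectResourceTypes: ["image", "font"],  // skip assets we never parse
  };
}

function scrapeRequest(url: string, selectors: string[]) {
  return {
    url,
    elements: selectors.map((selector) => ({ selector })),
  };
}
```

A caller would POST JSON.stringify(contentRequest(url)) to the account-scoped browser-rendering/content URL with a Bearer token, and fall back to scrapeRequest with the table's CSS selectors if that yields no data.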

Error Handling

The scraper includes multiple fallback strategies:

  1. Try /content endpoint first (full HTML)
  2. If no data found, try /scrape endpoint (targeted extraction)
  3. Multiple parsing patterns for different HTML structures
  4. Graceful degradation if API fails
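The strategy chain above can be expressed as one small helper: try each extractor in order, accept the first one that returns data, and degrade to an empty result instead of throwing. This is a sketch of the pattern, not the scraper's actual code:

```typescript
// Run extraction strategies in priority order (/content first, /scrape
// second, etc.). The first strategy that returns data wins; failures
// and empty results fall through to the next strategy.
type Extractor<T> = () => Promise<T[]>;

async function withFallbacks<T>(strategies: Extractor<T>[]): Promise<T[]> {
  for (const strategy of strategies) {
    try {
      const result = await strategy();
      if (result.length > 0) return result;
    } catch {
      // swallow the error and try the next strategy
    }
  }
  return []; // graceful degradation: every strategy failed or was empty
}
```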

Troubleshooting

No benchmarks found:

  • Check if vpsbenchmarks.com changed their HTML structure
  • Examine the rendered HTML output in logs
  • Adjust CSS selectors in scrapeBenchmarksWithScrapeAPI()
  • Update parsing patterns in extractBenchmarksFromHTML()
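When adjusting the parsing patterns, it helps to keep each pattern small and testable against a snippet of the rendered HTML. Below is one illustrative pattern (a regex over table cells); the real markup on vpsbenchmarks.com must be confirmed from the logged HTML before relying on any selector or pattern like this:

```typescript
// Illustrative parsing pattern: pull provider/plan pairs out of the
// first two cells of each table row. Purely an example shape for the
// patterns in extractBenchmarksFromHTML(); the site's actual markup
// may differ.
function extractRows(html: string): { provider: string; plan: string }[] {
  const rowRe =
    /<tr[^>]*>\s*<td[^>]*>([^<]+)<\/td>\s*<td[^>]*>([^<]+)<\/td>/g;
  const out: { provider: string; plan: string }[] = [];
  for (const m of html.matchAll(rowRe)) {
    out.push({ provider: m[1].trim(), plan: m[2].trim() });
  }
  return out;
}
```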

Browser Rendering API errors:

  • Check daily quota usage
  • Verify BROWSER binding is configured in wrangler.toml
  • Check network connectivity to Browser Rendering API
  • Review timeout settings (default: 30 seconds)

Database insertion errors:

  • Verify vps_benchmarks table schema
  • Check unique constraint on (provider_name, plan_name, country_code)
  • Ensure all required fields are not null
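An insert that respects the unique constraint above would use SQLite's upsert form. This is a sketch assuming the three constraint columns plus the created_at column seen in the queries earlier; any other columns in the real schema would need to be added:

```typescript
// Upsert keyed on the (provider_name, plan_name, country_code) unique
// constraint. Column list beyond those three plus created_at is an
// assumption about the schema.
const UPSERT_SQL = `
INSERT INTO vps_benchmarks (provider_name, plan_name, country_code, created_at)
VALUES (?1, ?2, ?3, CURRENT_TIMESTAMP)
ON CONFLICT (provider_name, plan_name, country_code)
DO UPDATE SET created_at = CURRENT_TIMESTAMP
`.trim();
```

In the Worker this would run as env.DB.prepare(UPSERT_SQL).bind(provider, plan, country).run(); a violated NOT NULL constraint on any other required column would still surface as an insertion error.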

Next Steps

After successful testing:

  1. Analyze the rendered HTML to identify exact CSS selectors
  2. Update parsing logic based on actual site structure
  3. Test with real data to ensure accuracy
  4. Monitor logs after deployment for any issues
  5. Validate data quality in the database