feat: add bandwidth estimation and DAU display

- Automatic monthly bandwidth estimation based on concurrent users
- Show estimated DAU (daily active users) (concurrent users × 10-14)
- Bandwidth-based automatic Linode/Vultr selection logic
- Include bandwidth costs in the cost analysis
- Show Seoul/Tokyo/Osaka/Singapore by default when no region is selected
- Show servers separately by region (GROUP BY instance + region)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
test-scraper.md (new file, 103 lines):
# Testing VPSBenchmarks Scraper with Browser Rendering API

## Local Testing

### 1. Start the development server

```bash
npm run dev
```
### 2. Trigger the scheduled handler manually

In a separate terminal, run:

```bash
curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"
```
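The `cron` query parameter must match a schedule defined for the Worker. Assuming the project uses TOML-style Wrangler config, the corresponding entry would look like this (an illustrative excerpt, not the project's actual file):

```toml
# wrangler.toml (excerpt): the schedule that ?cron=0+9+*+*+* refers to
[triggers]
crons = ["0 9 * * *"]
```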
### 3. Check the logs

The scraper will:

- Use the Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
- Extract benchmark data from the rendered page
- Insert/update records in the D1 database
- Log the number of benchmarks found and inserted
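The flow above can be sketched as a Workers scheduled handler. This is a simplified sketch only: `scrapeBenchmarks` and `insertBenchmarks` are illustrative stubs standing in for the project's real functions.

```javascript
// Simplified sketch of the scheduled handler flow. scrapeBenchmarks and
// insertBenchmarks are illustrative stubs, not the project's real code.
async function scrapeBenchmarks(env) {
  // Real code: call the Browser Rendering API and parse the rendered HTML.
  return [{ provider: "ExampleHost", plan: "vps-1" }];
}

async function insertBenchmarks(db, rows) {
  // Real code: upsert each row into the vps_benchmarks D1 table.
  return { inserted: rows.length, skipped: 0, errors: 0 };
}

const worker = {
  async scheduled(event, env, ctx) {
    const started = Date.now();
    console.log("[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API");
    const rows = await scrapeBenchmarks(env);
    const { inserted, skipped, errors } = await insertBenchmarks(env.DB, rows);
    console.log(
      `[Scraper] Completed in ${Date.now() - started}ms: ` +
      `${inserted} inserted, ${skipped} skipped, ${errors} errors`
    );
    return { inserted, skipped, errors };
  },
};
```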
Expected output:

```
[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors
```
## Production Deployment

### 1. Deploy to Cloudflare Workers

```bash
npm run deploy
```
### 2. Verify the cron trigger

Once deployed, the scraper runs automatically every day at 9:00 AM UTC.
### 3. Check D1 database for new records

```bash
npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"
```
## Browser Rendering API Usage

### Free Tier Limits

- 10 minutes of browser time per day
- Sufficient for daily scraping (each run should take < 1 minute)
### API Endpoints Used

1. **POST /content** - Fetch fully rendered HTML
   - Waits for JavaScript to execute
   - Returns the complete DOM after rendering
   - Options: `waitUntil`, `rejectResourceTypes`, `timeout`

2. **POST /scrape** (fallback) - Extract specific elements
   - Targets specific CSS selectors
   - Returns structured element data
   - Useful if full HTML extraction fails
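As a sketch, a `/content` request against the REST flavor of the API could be assembled like this. `ACCOUNT_ID` and `API_TOKEN` are placeholders, and the field names (`gotoOptions`, `rejectResourceTypes`) follow Cloudflare's documented request shape at the time of writing, so verify them against the current docs before relying on them.

```javascript
// Sketch: assemble (but do not send) a Browser Rendering /content request.
// Endpoint path and body fields are assumptions based on Cloudflare's docs.
function buildContentRequest(accountId, apiToken, targetUrl) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/content`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: targetUrl,
      gotoOptions: { waitUntil: "networkidle0", timeout: 30000 },
      rejectResourceTypes: ["image", "stylesheet", "font"],
    }),
  };
}
```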
### Error Handling

The scraper includes multiple fallback strategies:

1. Try the `/content` endpoint first (full HTML)
2. If no data is found, try the `/scrape` endpoint (targeted extraction)
3. Multiple parsing patterns for different HTML structures
4. Graceful degradation if the API fails
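The strategy above can be sketched as a small fallback chain. All arguments here are injected stand-ins for the project's real functions, used only to make the control flow explicit.

```javascript
// Sketch of the fallback strategy. fetchContent, fetchScrape, and parsers
// are illustrative stand-ins, not the project's actual functions.
async function scrapeWithFallback(fetchContent, fetchScrape, parsers) {
  try {
    const html = await fetchContent();      // 1. /content: full rendered HTML
    for (const parse of parsers) {          // 3. try each parsing pattern
      const rows = parse(html);
      if (rows.length > 0) return rows;
    }
    return await fetchScrape();             // 2. /scrape: targeted extraction
  } catch (err) {
    console.error("[Scraper] Browser Rendering API failed:", err.message);
    return [];                              // 4. graceful degradation
  }
}
```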
### Troubleshooting

**No benchmarks found:**

- Check whether vpsbenchmarks.com changed its HTML structure
- Examine the rendered HTML output in the logs
- Adjust the CSS selectors in `scrapeBenchmarksWithScrapeAPI()`
- Update the parsing patterns in `extractBenchmarksFromHTML()`
**Browser Rendering API errors:**

- Check daily quota usage
- Verify the BROWSER binding is configured in wrangler.toml
- Check network connectivity to the Browser Rendering API
- Review the timeout settings (default: 30 seconds)
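For reference, checking the binding means confirming an entry like this exists in wrangler.toml (an illustrative excerpt; the binding name must match what the Worker code uses):

```toml
# wrangler.toml (excerpt): Browser Rendering binding
[browser]
binding = "BROWSER"
```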
**Database insertion errors:**

- Verify the vps_benchmarks table schema
- Check the unique constraint on (provider_name, plan_name, country_code)
- Ensure all required fields are not null
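A schema sketch consistent with the constraints above (only the columns named in this document are shown; the real table likely has additional columns for the benchmark metrics):

```sql
-- Illustrative sketch: the unique key the insert/update logic relies on.
CREATE TABLE IF NOT EXISTS vps_benchmarks (
  provider_name TEXT NOT NULL,
  plan_name     TEXT NOT NULL,
  country_code  TEXT NOT NULL,
  created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
  UNIQUE (provider_name, plan_name, country_code)
);
```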
## Next Steps

After successful testing:

1. **Analyze the rendered HTML** to identify exact CSS selectors
2. **Update the parsing logic** based on the actual site structure
3. **Test with real data** to ensure accuracy
4. **Monitor the logs** after deployment for any issues
5. **Validate data quality** in the database