# Testing VPSBenchmarks Scraper with Browser Rendering API

## Local Testing

### 1. Start the development server

```bash
npm run dev
```

### 2. Trigger the scheduled handler manually

In a separate terminal, run:

```bash
curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"
```
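For orientation, the Worker entry point that this curl command invokes can be sketched as follows. `runScrape` and the `Env` binding names are placeholders for this project's actual code, not its real identifiers:

```typescript
// Minimal sketch of the scheduled handler triggered by `curl /__scheduled`.
// `runScrape` and the binding names are placeholders; match them to the real code.
export interface Env {
  DB: unknown;      // D1 binding (assumed name)
  BROWSER: unknown; // Browser Rendering binding referenced in wrangler.toml
}

const worker = {
  async scheduled(
    event: { cron: string },
    env: Env,
    ctx: { waitUntil(p: Promise<unknown>): void },
  ): Promise<void> {
    // "0 9 * * *" is the daily 09:00 UTC trigger exercised by the curl command
    console.log(`[Scraper] Triggered by cron: ${event.cron}`);
    // keep the Worker alive until the scrape finishes
    ctx.waitUntil(runScrape(env));
  },
};

async function runScrape(env: Env): Promise<void> {
  // placeholder: fetch rendered HTML, extract benchmarks, upsert into D1
  console.log("[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API");
}

export default worker;
```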
### 3. Check the logs

The scraper will:

- Use Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
- Extract benchmark data from the rendered page
- Insert/update records in the D1 database
- Log the number of benchmarks found and inserted

Expected output:

```
[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors
```
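The `[DB] Inserted/Updated` line comes from the upsert step. A minimal sketch of what that statement might look like, assuming the `vps_benchmarks` schema and the unique key on `(provider_name, plan_name, country_code)` described in this document; verify the column names against the real migration:

```typescript
// Hedged sketch of a D1 upsert keyed on (provider_name, plan_name, country_code).
// Column names are assumptions drawn from this document, not the real migration.
interface BenchmarkRow {
  provider_name: string;
  plan_name: string;
  country_code: string;
  score: number;
}

function buildUpsertSQL(): string {
  return `INSERT INTO vps_benchmarks (provider_name, plan_name, country_code, score)
VALUES (?1, ?2, ?3, ?4)
ON CONFLICT(provider_name, plan_name, country_code)
DO UPDATE SET score = excluded.score`;
}

// In the Worker (env.DB is the D1 binding):
// await env.DB.prepare(buildUpsertSQL())
//   .bind(row.provider_name, row.plan_name, row.country_code, row.score)
//   .run();
// console.log(`[DB] Inserted/Updated: ${row.provider_name} ${row.plan_name}`);
```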
## Production Deployment

### 1. Deploy to Cloudflare Workers

```bash
npm run deploy
```

### 2. Verify the cron trigger

The scraper runs automatically every day at 9:00 AM UTC (cron `0 9 * * *`).

### 3. Check D1 database for new records

```bash
npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"
```
## Browser Rendering API Usage

### Free Tier Limits

- 10 minutes of browser time per day
- Sufficient for daily scraping (each run should take < 1 minute)
### API Endpoints Used

1. **POST /content** - Fetch fully rendered HTML
   - Waits for JavaScript to execute
   - Returns the complete DOM after rendering
   - Options: `waitUntil`, `rejectResourceTypes`, `timeout`

2. **POST /scrape** (fallback) - Extract specific elements
   - Targets specific CSS selectors
   - Returns structured element data
   - Useful if full HTML extraction fails
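As a rough sketch, a `/content` request could be assembled like this. The account-scoped REST URL, the default option values, and the resource-type list are assumptions based on Cloudflare's API conventions, not taken from this project's code; confirm them against the Browser Rendering documentation:

```typescript
// Hedged sketch of building a /content request. URL shape and defaults are assumptions.
interface ContentOptions {
  url: string;
  waitUntil?: "load" | "domcontentloaded" | "networkidle0";
  rejectResourceTypes?: string[];
  timeout?: number;
}

interface ContentRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildContentRequest(accountId: string, apiToken: string, opts: ContentOptions): ContentRequest {
  return {
    endpoint: `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/content`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        url: opts.url,
        // wait for client-side rendering to finish before capturing the DOM
        waitUntil: opts.waitUntil ?? "networkidle0",
        // skip heavy resources to stay within the 10-minute daily budget
        rejectResourceTypes: opts.rejectResourceTypes ?? ["image", "font"],
        timeout: opts.timeout ?? 30_000, // 30-second default, as noted under Troubleshooting
      }),
    },
  };
}
```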
### Error Handling

The scraper includes multiple fallback strategies:

1. Try the `/content` endpoint first (full HTML)
2. If no data is found, fall back to the `/scrape` endpoint (targeted extraction)
3. Apply multiple parsing patterns to handle different HTML structures
4. Degrade gracefully if the API fails
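The four steps above can be sketched as a single cascade. `fetchContent`, `fetchScrape`, and `parseHTML` are hypothetical stand-ins for this project's real functions:

```typescript
// Sketch of the fallback order described above; all three callbacks are
// hypothetical stand-ins for the project's real functions.
type ScrapedRow = { provider: string; plan: string };

async function scrapeWithFallback(
  fetchContent: () => Promise<string>,
  fetchScrape: () => Promise<ScrapedRow[]>,
  parseHTML: (html: string) => ScrapedRow[],
): Promise<ScrapedRow[]> {
  try {
    // 1. /content first: fetch the full rendered HTML...
    const html = await fetchContent();
    // 3. ...and run it through the parsing patterns
    const parsed = parseHTML(html);
    if (parsed.length > 0) return parsed;
    // 2. nothing found: fall back to targeted /scrape extraction
    return await fetchScrape();
  } catch (err) {
    // 4. graceful degradation: log and return no rows rather than failing the cron run
    console.error(`[Scraper] Scrape failed, returning no rows: ${err}`);
    return [];
  }
}
```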
### Troubleshooting

**No benchmarks found:**

- Check whether vpsbenchmarks.com changed its HTML structure
- Examine the rendered HTML output in the logs
- Adjust the CSS selectors in `scrapeBenchmarksWithScrapeAPI()`
- Update the parsing patterns in `extractBenchmarksFromHTML()`
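When adjusting the parsing patterns, it helps to keep them testable in isolation. The sketch below shows the general shape only; the `<tr>/<td>` row structure and field order are assumptions and will differ from the site's real markup:

```typescript
// Illustrative parsing sketch; the real extractBenchmarksFromHTML() may use
// entirely different patterns. The row layout here is an assumption.
interface Benchmark {
  provider: string;
  plan: string;
  score: number;
}

function parseBenchmarkRows(html: string): Benchmark[] {
  const rows: Benchmark[] = [];
  // assumed structure: <tr><td>Provider</td><td>Plan</td><td>Score</td></tr>
  const rowPattern = /<tr>\s*<td>([^<]+)<\/td>\s*<td>([^<]+)<\/td>\s*<td>([\d.]+)<\/td>\s*<\/tr>/g;
  for (const m of html.matchAll(rowPattern)) {
    rows.push({ provider: m[1].trim(), plan: m[2].trim(), score: Number(m[3]) });
  }
  return rows;
}
```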
**Browser Rendering API errors:**

- Check daily quota usage
- Verify that the `BROWSER` binding is configured in `wrangler.toml`
- Check network connectivity to the Browser Rendering API
- Review timeout settings (default: 30 seconds)

**Database insertion errors:**

- Verify the `vps_benchmarks` table schema
- Check the unique constraint on `(provider_name, plan_name, country_code)`
- Ensure all required fields are non-null
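For reference, the `BROWSER` binding check above corresponds to a `wrangler.toml` entry along these lines (an assumed fragment; the binding name must match what the Worker code expects):

```toml
# Browser Rendering binding (assumed fragment; binding name must match the Worker code)
browser = { binding = "BROWSER" }
```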
## Next Steps

After successful testing:

1. **Analyze the rendered HTML** to identify exact CSS selectors
2. **Update the parsing logic** based on the actual site structure
3. **Test with real data** to ensure accuracy
4. **Monitor logs** after deployment for any issues
5. **Validate data quality** in the database