feat: add bandwidth estimation and DAU display

- Automatic monthly bandwidth estimation based on concurrent users
- Show estimated DAU (daily active users) (concurrent users × 10-14)
- Bandwidth-based automatic Linode/Vultr selection logic
- Include bandwidth costs in the cost analysis
- Show Seoul/Tokyo/Osaka/Singapore by default when no region is selected
- Show servers separately by region (GROUP BY instance + region)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
test-scraper.md (new file, 103 lines):
# Testing VPSBenchmarks Scraper with Browser Rendering API

## Local Testing

### 1. Start the development server

```bash
npm run dev
```
### 2. Trigger the scheduled handler manually

In a separate terminal, run:

```bash
curl "http://localhost:8793/__scheduled?cron=0+9+*+*+*"
```
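The `cron` query parameter must match a schedule defined for the Worker. Assuming the project uses TOML-style Wrangler config, the corresponding entry would look like this (an illustrative excerpt, not the project's actual file):

```toml
# wrangler.toml (excerpt): the schedule that ?cron=0+9+*+*+* refers to
[triggers]
crons = ["0 9 * * *"]
```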
### 3. Check the logs

The scraper will:

- Use the Browser Rendering API to fetch rendered HTML from vpsbenchmarks.com
- Extract benchmark data from the rendered page
- Insert/update records in the D1 database
- Log the number of benchmarks found and inserted
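The flow above can be sketched as a Workers scheduled handler. This is a simplified sketch only: `scrapeBenchmarks` and `insertBenchmarks` are illustrative stubs standing in for the project's real functions.

```javascript
// Simplified sketch of the scheduled handler flow. scrapeBenchmarks and
// insertBenchmarks are illustrative stubs, not the project's real code.
async function scrapeBenchmarks(env) {
  // Real code: call the Browser Rendering API and parse the rendered HTML.
  return [{ provider: "ExampleHost", plan: "vps-1" }];
}

async function insertBenchmarks(db, rows) {
  // Real code: upsert each row into the vps_benchmarks D1 table.
  return { inserted: rows.length, skipped: 0, errors: 0 };
}

const worker = {
  async scheduled(event, env, ctx) {
    const started = Date.now();
    console.log("[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API");
    const rows = await scrapeBenchmarks(env);
    const { inserted, skipped, errors } = await insertBenchmarks(env.DB, rows);
    console.log(
      `[Scraper] Completed in ${Date.now() - started}ms: ` +
      `${inserted} inserted, ${skipped} skipped, ${errors} errors`
    );
    return { inserted, skipped, errors };
  },
};
```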
Expected output:

```
[Scraper] Starting VPSBenchmarks.com scrape with Browser Rendering API
[Scraper] Fetching rendered HTML from vpsbenchmarks.com
[Scraper] Rendered HTML length: XXXXX
[Scraper] Extracted X benchmarks from HTML
[Scraper] Found X benchmark entries
[DB] Inserted/Updated: Provider PlanName
[Scraper] Completed in XXXms: X inserted, X skipped, X errors
```
## Production Deployment

### 1. Deploy to Cloudflare Workers

```bash
npm run deploy
```
### 2. Verify the cron trigger

Once deployed, the scraper runs automatically every day at 9:00 AM UTC.
### 3. Check D1 database for new records

```bash
npx wrangler d1 execute cloud-instances-db --command="SELECT COUNT(*) as total FROM vps_benchmarks"
npx wrangler d1 execute cloud-instances-db --command="SELECT * FROM vps_benchmarks ORDER BY created_at DESC LIMIT 10"
```
## Browser Rendering API Usage

### Free Tier Limits

- 10 minutes of browser time per day
- Sufficient for daily scraping (each run should take < 1 minute)
### API Endpoints Used

1. **POST /content** - Fetch fully rendered HTML
   - Waits for JavaScript to execute
   - Returns the complete DOM after rendering
   - Options: `waitUntil`, `rejectResourceTypes`, `timeout`

2. **POST /scrape** (fallback) - Extract specific elements
   - Targets specific CSS selectors
   - Returns structured element data
   - Useful if full HTML extraction fails
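As a sketch, a `/content` request against the REST flavor of the API could be assembled like this. `ACCOUNT_ID` and `API_TOKEN` are placeholders, and the field names (`gotoOptions`, `rejectResourceTypes`) follow Cloudflare's documented request shape at the time of writing, so verify them against the current docs before relying on them.

```javascript
// Sketch: assemble (but do not send) a Browser Rendering /content request.
// Endpoint path and body fields are assumptions based on Cloudflare's docs.
function buildContentRequest(accountId, apiToken, targetUrl) {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/content`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      url: targetUrl,
      gotoOptions: { waitUntil: "networkidle0", timeout: 30000 },
      rejectResourceTypes: ["image", "stylesheet", "font"],
    }),
  };
}
```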
### Error Handling

The scraper includes multiple fallback strategies:

1. Try the `/content` endpoint first (full HTML)
2. If no data is found, try the `/scrape` endpoint (targeted extraction)
3. Multiple parsing patterns for different HTML structures
4. Graceful degradation if the API fails
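The strategy above can be sketched as a small fallback chain. All arguments here are injected stand-ins for the project's real functions, used only to make the control flow explicit.

```javascript
// Sketch of the fallback strategy. fetchContent, fetchScrape, and parsers
// are illustrative stand-ins, not the project's actual functions.
async function scrapeWithFallback(fetchContent, fetchScrape, parsers) {
  try {
    const html = await fetchContent();      // 1. /content: full rendered HTML
    for (const parse of parsers) {          // 3. try each parsing pattern
      const rows = parse(html);
      if (rows.length > 0) return rows;
    }
    return await fetchScrape();             // 2. /scrape: targeted extraction
  } catch (err) {
    console.error("[Scraper] Browser Rendering API failed:", err.message);
    return [];                              // 4. graceful degradation
  }
}
```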
### Troubleshooting

**No benchmarks found:**

- Check whether vpsbenchmarks.com changed its HTML structure
- Examine the rendered HTML output in the logs
- Adjust the CSS selectors in `scrapeBenchmarksWithScrapeAPI()`
- Update the parsing patterns in `extractBenchmarksFromHTML()`
**Browser Rendering API errors:**

- Check daily quota usage
- Verify the BROWSER binding is configured in wrangler.toml
- Check network connectivity to the Browser Rendering API
- Review the timeout settings (default: 30 seconds)
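For reference, checking the binding means confirming an entry like this exists in wrangler.toml (an illustrative excerpt; the binding name must match what the Worker code uses):

```toml
# wrangler.toml (excerpt): Browser Rendering binding
[browser]
binding = "BROWSER"
```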
**Database insertion errors:**

- Verify the vps_benchmarks table schema
- Check the unique constraint on (provider_name, plan_name, country_code)
- Ensure all required fields are not null
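A schema sketch consistent with the constraints above (only the columns named in this document are shown; the real table likely has additional columns for the benchmark metrics):

```sql
-- Illustrative sketch: the unique key the insert/update logic relies on.
CREATE TABLE IF NOT EXISTS vps_benchmarks (
  provider_name TEXT NOT NULL,
  plan_name     TEXT NOT NULL,
  country_code  TEXT NOT NULL,
  created_at    TEXT DEFAULT CURRENT_TIMESTAMP,
  UNIQUE (provider_name, plan_name, country_code)
);
```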
## Next Steps

After successful testing:

1. **Analyze the rendered HTML** to identify exact CSS selectors
2. **Update the parsing logic** based on the actual site structure
3. **Test with real data** to ensure accuracy
4. **Monitor the logs** after deployment for any issues
5. **Validate data quality** in the database