feat(phase-5-3): strengthen monitoring
Add logger.ts, metrics.ts, and /api/metrics Version: e3bcb4ae
170
docs/metrics-api-example.md
Normal file
@@ -0,0 +1,170 @@
# Metrics API Example Response

## Endpoint: GET /api/metrics

**Authentication:** Bearer Token (WEBHOOK_SECRET)

**Example Request:**
```bash
curl -X GET https://telegram-summary-bot.kappa-d8e.workers.dev/api/metrics \
  -H "Authorization: Bearer your-webhook-secret"
```
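
For callers that prefer code over curl, the same request can be sketched in TypeScript. The helper names (`buildMetricsRequest`, `fetchMetrics`, `FetchLike`) are hypothetical and not part of this commit; only the path and the `Authorization` header come from the example above.

```typescript
// Hypothetical helper names; only the URL path and header come from the curl example.
interface MetricsRequest {
  url: string;
  init: { method: "GET"; headers: Record<string, string> };
}

// Minimal fetch-compatible signature, declared locally so the sketch needs no DOM typings.
type FetchLike = (
  url: string,
  init: MetricsRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the same request as the curl example: GET /api/metrics with a Bearer token.
function buildMetricsRequest(baseUrl: string, secret: string): MetricsRequest {
  const url = baseUrl.replace(/\/+$/, "") + "/api/metrics";
  return {
    url,
    init: { method: "GET", headers: { Authorization: "Bearer " + secret } },
  };
}

// Sends the request and rejects on any non-2xx status.
function fetchMetrics(fetchImpl: FetchLike, baseUrl: string, secret: string): Promise<unknown> {
  const { url, init } = buildMetricsRequest(baseUrl, secret);
  return fetchImpl(url, init).then((res) => {
    if (!res.ok) {
      throw new Error("GET /api/metrics failed with status " + res.status);
    }
    return res.json();
  });
}
```

In a runtime that provides a global `fetch`, this would be called as `fetchMetrics(fetch, baseUrl, secret)`; passing the function in also makes it easy to substitute a stub in tests.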

## Response Format

### Successful Response (200 OK)

```json
{
  "timestamp": "2026-01-19T07:35:42.123Z",
  "circuitBreakers": {
    "openai": {
      "state": "CLOSED",
      "failures": 0,
      "lastFailureTime": null,
      "stats": {
        "totalRequests": 1250,
        "totalFailures": 5,
        "totalSuccesses": 1245
      },
      "config": {
        "failureThreshold": 3,
        "resetTimeoutMs": 30000,
        "monitoringWindowMs": 60000
      }
    }
  },
  "metrics": {
    "api_calls": {
      "openai": {
        "count": 1250,
        "avg_duration": 0
      }
    },
    "errors": {
      "retry_exhausted": 0,
      "circuit_breaker_open": 0
    },
    "cache": {
      "hit_rate": 0
    }
  }
}
```
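
A client may want to type this payload. The interfaces below are inferred from the example response only, so they are an assumption about the shape rather than the types defined in `metrics.ts`; the guard `isMetricsResponse` is a hypothetical, deliberately shallow runtime check.

```typescript
type CircuitState = "CLOSED" | "HALF_OPEN" | "OPEN";

// Shapes inferred from the example response, not copied from the project source.
interface CircuitBreakerSnapshot {
  state: CircuitState;
  failures: number;
  lastFailureTime: string | null;
  stats: { totalRequests: number; totalFailures: number; totalSuccesses: number };
  config: { failureThreshold: number; resetTimeoutMs: number; monitoringWindowMs: number };
}

interface MetricsResponse {
  timestamp: string;
  circuitBreakers: Record<string, CircuitBreakerSnapshot>;
  metrics: {
    api_calls: Record<string, { count: number; avg_duration: number }>;
    errors: Record<string, number>;
    cache: { hit_rate: number };
  };
}

const KNOWN_STATES: readonly string[] = ["CLOSED", "HALF_OPEN", "OPEN"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Shallow check: verifies the top-level keys and each breaker's state, nothing deeper.
function isMetricsResponse(value: unknown): value is MetricsResponse {
  if (!isRecord(value) || typeof value["timestamp"] !== "string") return false;
  const breakers = value["circuitBreakers"];
  if (!isRecord(breakers) || !isRecord(value["metrics"])) return false;
  return Object.keys(breakers).every((name) => {
    const breaker = breakers[name];
    if (!isRecord(breaker)) return false;
    const state = breaker["state"];
    return typeof state === "string" && KNOWN_STATES.indexOf(state) !== -1;
  });
}
```

The guard is intentionally loose so that new keys in the `metrics` section do not cause a client to reject the response.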

### Circuit States

| State | Description | Behavior |
|-------|-------------|----------|
| `CLOSED` | Normal operation | All requests pass through |
| `HALF_OPEN` | Testing recovery | Single test request allowed |
| `OPEN` | Service unavailable | All requests blocked immediately |
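
The transitions between these states can be illustrated with a small state machine. This is a minimal sketch of the generic circuit breaker pattern, not the project's implementation: the class name `SketchCircuitBreaker` is hypothetical, it assumes the reset timer starts when the breaker opens, and it ignores `monitoringWindowMs`.

```typescript
type BreakerState = "CLOSED" | "HALF_OPEN" | "OPEN";

interface BreakerConfig {
  failureThreshold: number;
  resetTimeoutMs: number;
}

// Hypothetical class illustrating the table above; timing rules are assumptions.
class SketchCircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private openedAtMs = 0;
  private readonly config: BreakerConfig;

  constructor(config: BreakerConfig) {
    this.config = config;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns true when a request may pass through at time nowMs.
  allowRequest(nowMs: number): boolean {
    if (this.state === "OPEN" && nowMs - this.openedAtMs >= this.config.resetTimeoutMs) {
      this.state = "HALF_OPEN"; // reset timeout elapsed: admit a single test request
      return true;
    }
    return this.state === "CLOSED"; // OPEN and HALF_OPEN block further requests
  }

  // A success (including a successful test request) closes the circuit.
  recordSuccess(): void {
    this.state = "CLOSED";
    this.failures = 0;
  }

  // A failed test request, or reaching the threshold, opens the circuit.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    const testFailed = this.state === "HALF_OPEN";
    if (testFailed || this.failures >= this.config.failureThreshold) {
      this.state = "OPEN";
      this.openedAtMs = nowMs;
    }
  }
}
```

Time is passed in explicitly as `nowMs` so the behavior is easy to test without waiting for real timeouts.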

### When Circuit is OPEN

```json
{
  "timestamp": "2026-01-19T07:40:15.456Z",
  "circuitBreakers": {
    "openai": {
      "state": "OPEN",
      "failures": 3,
      "lastFailureTime": "2026-01-19T07:40:10.123Z",
      "stats": {
        "totalRequests": 1253,
        "totalFailures": 8,
        "totalSuccesses": 1245
      },
      "config": {
        "failureThreshold": 3,
        "resetTimeoutMs": 30000,
        "monitoringWindowMs": 60000
      }
    }
  },
  "metrics": {
    "api_calls": {
      "openai": {
        "count": 1253,
        "avg_duration": 0
      }
    },
    "errors": {
      "retry_exhausted": 0,
      "circuit_breaker_open": 1
    },
    "cache": {
      "hit_rate": 0
    }
  }
}
```
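
From an OPEN response, a client can estimate how long until the breaker may admit a test request. The sketch below assumes the reset timer starts at `lastFailureTime`, which this document does not state; `msUntilRetry` is a hypothetical helper name.

```typescript
// Only the fields this sketch reads from one circuit breaker entry.
interface OpenBreakerView {
  state: string;
  lastFailureTime: string | null;
  config: { resetTimeoutMs: number };
}

// Milliseconds until a test request may be admitted; 0 if not OPEN or already due.
// Assumption: the reset timeout is measured from lastFailureTime.
function msUntilRetry(breaker: OpenBreakerView, responseTimestamp: string): number {
  if (breaker.state !== "OPEN" || breaker.lastFailureTime === null) return 0;
  const elapsedMs = Date.parse(responseTimestamp) - Date.parse(breaker.lastFailureTime);
  return Math.max(0, breaker.config.resetTimeoutMs - elapsedMs);
}

// Values taken from the example response above.
const remainingMs = msUntilRetry(
  {
    state: "OPEN",
    lastFailureTime: "2026-01-19T07:40:10.123Z",
    config: { resetTimeoutMs: 30000 },
  },
  "2026-01-19T07:40:15.456Z",
);
// remainingMs is 24667: 30000 ms minus the 5333 ms elapsed since the last failure
```

Using the response's own `timestamp` rather than the client clock avoids errors from clock skew between the client and the Worker.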

### Error Responses

**401 Unauthorized:**
```json
{
  "error": "Unauthorized"
}
```

**500 Internal Server Error:**
```json
{
  "error": "Error message here"
}
```
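
A client can map these status codes onto a small result type. This is a sketch under the assumption that only 200, 401, and 500 are returned, as documented above; `MetricsOutcome` and `interpretMetricsResponse` are hypothetical names.

```typescript
type MetricsOutcome =
  | { kind: "ok"; body: unknown }
  | { kind: "unauthorized" }
  | { kind: "server_error"; message: string }
  | { kind: "unexpected"; status: number };

// Reads the "error" string from an error body, falling back to a generic message.
function readErrorMessage(body: unknown): string {
  if (typeof body === "object" && body !== null) {
    const error = (body as Record<string, unknown>)["error"];
    if (typeof error === "string") return error;
  }
  return "unknown error";
}

// Classifies a response by HTTP status; any other status is reported as unexpected.
function interpretMetricsResponse(status: number, body: unknown): MetricsOutcome {
  switch (status) {
    case 200:
      return { kind: "ok", body };
    case 401:
      return { kind: "unauthorized" };
    case 500:
      return { kind: "server_error", message: readErrorMessage(body) };
    default:
      return { kind: "unexpected", status };
  }
}
```

A 401 usually means the token does not match `WEBHOOK_SECRET`, so retrying without changing the token is unlikely to help, whereas a 500 may be worth retrying later.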

## Use Cases

### Monitoring Dashboard
- Poll this endpoint periodically to track Circuit Breaker health
- Alert when state changes to OPEN
- Track failure rate trends
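
Alerting on a change to OPEN amounts to comparing two consecutive polls. A minimal sketch, with hypothetical helper names `statesOf` and `newlyOpened`:

```typescript
// Only the fields this sketch reads from the /api/metrics response.
interface BreakerStatesSource {
  circuitBreakers: Record<string, { state: string }>;
}

type StateByService = Record<string, string>;

// Flattens a response into { serviceName: state }.
function statesOf(response: BreakerStatesSource): StateByService {
  const states: StateByService = {};
  for (const name of Object.keys(response.circuitBreakers)) {
    const breaker = response.circuitBreakers[name];
    if (breaker) states[name] = breaker.state;
  }
  return states;
}

// Returns the services whose breaker was not OPEN in the previous poll but is OPEN now.
function newlyOpened(previous: StateByService, current: StateByService): string[] {
  return Object.keys(current).filter(
    (name) => current[name] === "OPEN" && previous[name] !== "OPEN",
  );
}
```

A poller would keep the result of `statesOf` from the last response and raise an alert for each name returned by `newlyOpened`, so a breaker that stays OPEN does not alert on every poll.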

### Debugging
- Check Circuit Breaker state during production issues
- Verify if service degradation is due to circuit being open
- Review recent failure counts

### Performance Analysis
- Monitor total requests and success rate
- Identify patterns in failure occurrences
- Validate circuit breaker thresholds
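
The success rate is not returned directly, but it can be derived from the `stats` counters. A small sketch, with hypothetical function names:

```typescript
interface BreakerStats {
  totalRequests: number;
  totalFailures: number;
  totalSuccesses: number;
}

// Fraction of requests that succeeded, or null when no requests were recorded.
function successRate(stats: BreakerStats): number | null {
  if (stats.totalRequests === 0) return null;
  return stats.totalSuccesses / stats.totalRequests;
}

// Fraction of requests that failed, or null when no requests were recorded.
function failureRate(stats: BreakerStats): number | null {
  if (stats.totalRequests === 0) return null;
  return stats.totalFailures / stats.totalRequests;
}
```

For the successful example response (1245 successes out of 1250 requests) this gives a success rate of 0.996. Returning `null` for zero requests keeps a freshly restarted Worker from being reported as 0% successful.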

## Future Enhancements

The `metrics` section is designed to be extensible:

```typescript
// Planned additions (the interface name is illustrative):
interface PlannedMetrics {
  api_calls: {
    openai: { count: number; avg_duration: number };
    namecheap: { count: number; avg_duration: number };
    brave: { count: number; avg_duration: number };
  };
  errors: {
    retry_exhausted: number;
    circuit_breaker_open: number;
    timeout: number;
  };
  cache: {
    hit_rate: number;
    total_hits: number;
    total_misses: number;
  };
  database: {
    query_count: number;
    avg_duration: number;
  };
}
```

## Notes

- Metrics reset on Worker restart (no persistence)
- Circuit Breaker state is per-instance (not shared across Workers)
- `lastFailureTime` is only populated when failures exist
- All timestamps are in ISO 8601 format (UTC)
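
Because counters reset on restart, a dashboard that computes per-interval deltas should tolerate a total that goes down. The sketch below treats any decrease as a reset, which is an assumption; since state is per-instance, a decrease could also mean that a different Worker instance answered the poll. `requestsSince` is a hypothetical helper name.

```typescript
// Requests observed since the previous poll.
// If the counter went down, assume a reset and count everything since the reset.
function requestsSince(previousTotal: number, currentTotal: number): number {
  return currentTotal >= previousTotal
    ? currentTotal - previousTotal
    : currentTotal;
}

// Values from the two example responses: 1250 requests, then 1253.
const newRequests = requestsSince(1250, 1253);
// newRequests is 3
```

The same approach applies to `totalFailures` and `totalSuccesses`.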