k3s: document kr2 kubelet memory reserve as intentional OOM mitigation

Ties the existing kr2 /etc/rancher/k3s/config.yaml kubelet-arg settings
(system-reserved=memory=8Gi, eviction-hard=memory.available<2Gi) to the
2026-04-19 OOM freeze incident so they are not flagged as an unexplained
asymmetry in future audits. Closes item 6 of the 2026-04-20 K3s improvements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kaffa
2026-04-21 07:44:09 +09:00
parent f8c4274124
commit 782fff8fe9

@@ -36,6 +36,21 @@ CronJob `kube-system/descheduler`, every 30 minutes, helm `descheduler/descheduler`
- Excluded from eviction: kube-system, longhorn-system
- Background: 2026-04-19 kr2 (30GB) OOM freeze. 33 K3s pods + 9 Incus workloads = 42 workloads overloaded the node; the kernel froze and a physical reboot was required
### kr2 kubelet memory reserve (2026-04-19 OOM mitigation)
Set explicitly in `/etc/rancher/k3s/config.yaml` on kr2 only:
```yaml
kubelet-arg:
- "system-reserved=memory=8Gi"
- "eviction-hard=memory.available<2Gi"
```
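- capacity 32Gi → allocatable 21.6Gi (about 10Gi withheld: 8Gi system-reserved + 2Gi eviction-hard threshold)
- Background: prevents a recurrence of the 2026-04-19 OOM freeze above. kr2 hosts the K3s control plane and Incus (default 2 + inbest 7) at the same time, so part of its 30GiB of RAM must be reserved for non-K3s workloads
- Other nodes (kr1 62GiB / hp1, hp2 188GiB) have no reserve; this measure applies to kr2 only
- Do not change: removing it risks reproducing the OOM. If kr2's RAM is expanded in the future, reducing the reserve size can be reconsidered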
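
The capacity-to-allocatable figures for kr2 can be sanity-checked with the standard Kubernetes Node Allocatable formula (capacity minus kube-reserved, system-reserved, and the hard eviction threshold). The sketch below is illustrative only: the function names are hypothetical, and it assumes kube-reserved is unset (0), since it does not appear in the kr2 config.

```python
# Sketch of the Node Allocatable arithmetic for kr2 (memory only).
# Assumption: kube-reserved is unset (0); it does not appear in the kr2 config.

GIB = 1024 ** 3


def allocatable_bytes(capacity, system_reserved, eviction_hard, kube_reserved=0):
    """Node Allocatable = capacity - kube-reserved - system-reserved - eviction-hard."""
    return capacity - kube_reserved - system_reserved - eviction_hard


def implied_capacity_gib(allocatable_gib, system_reserved_gib, eviction_hard_gib):
    """Work backwards from a reported allocatable value to the capacity it implies."""
    return allocatable_gib + system_reserved_gib + eviction_hard_gib


# Nominal figures from this section: 32Gi capacity, 8Gi reserved, 2Gi eviction threshold.
nominal = allocatable_bytes(32 * GIB, 8 * GIB, 2 * GIB)
print(f"nominal allocatable: {nominal / GIB:.1f}Gi")

# The documented 21.6Gi allocatable implies a reported capacity slightly under 32Gi.
print(f"implied capacity: {implied_capacity_gib(21.6, 8, 2):.1f}Gi")
```

With the nominal 32Gi this gives 22.0Gi, while the documented 21.6Gi corresponds to a capacity of about 31.6Gi. The small gap is presumably because the node reports slightly less memory than its nominal size; that is an inference, not something recorded in this document.
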
### Longhorn auto-recovery settings (2026-04-19)
| Setting | Value | Effect |