From 782fff8fe964cd664b53b8b8f96a6d7db2c5d43b Mon Sep 17 00:00:00 2001
From: kaffa
Date: Tue, 21 Apr 2026 07:44:09 +0900
Subject: [PATCH] k3s: document kr2 kubelet memory reserve as intentional OOM mitigation

Ties the existing /etc/rancher/k3s/config.yaml kubelet-arg
(system-reserved=8Gi, eviction-hard<2Gi) to the 2026-04-19 OOM freeze
incident so it won't be flagged as mystery asymmetry in future audits.
Closes item 6 of 2026-04-20 K3s improvements.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 infra/compute/infra-hosts.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/infra/compute/infra-hosts.md b/infra/compute/infra-hosts.md
index 4618d3c..5aec35c 100644
--- a/infra/compute/infra-hosts.md
+++ b/infra/compute/infra-hosts.md
@@ -36,6 +36,21 @@ CronJob `kube-system/descheduler`, every 30 minutes, helm `descheduler/descheduler`
 - Excluded from eviction: kube-system, longhorn-system
 - Background: 2026-04-19 OOM freeze on kr2 (30GB): 33 K3s pods + 9 Incus workloads = 42 workloads overloaded the node; the kernel froze and a physical reboot was required
 
+### kr2 kubelet memory reserve (2026-04-19 OOM mitigation)
+
+Only kr2 sets this explicitly in `/etc/rancher/k3s/config.yaml`:
+
+```yaml
+kubelet-arg:
+  - "system-reserved=memory=8Gi"
+  - "eviction-hard=memory.available<2Gi"
+```
+
+- capacity 32Gi → allocatable 21.6Gi (about 10Gi withheld: 8Gi system-reserved + 2Gi eviction-hard threshold)
+- Background: prevents a recurrence of the 2026-04-19 OOM freeze above. kr2 hosts both the K3s control plane and Incus (default 2 + inbest 7), so part of its 30GiB of RAM must be reserved for non-K3s workloads
+- Other nodes (kr1 62GiB / hp1·hp2 188GiB) have no reserve; this applies to kr2 only
+- Do not change: removing it risks reproducing the OOM. If kr2's RAM is expanded later, shrinking the reserve can be reconsidered
+
 ### Longhorn auto-recovery settings (2026-04-19)
 
 | Setting | Value | Effect |