From 620abeae79d7d038e4aae8ec44aa8a625dd1a5be Mon Sep 17 00:00:00 2001
From: kaffa
Date: Sun, 19 Apr 2026 14:36:03 +0900
Subject: [PATCH] infra-hosts: record Descheduler installation (response to kr2 OOM freeze)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 infra/compute/infra-hosts.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/infra/compute/infra-hosts.md b/infra/compute/infra-hosts.md
index 128ae6c..8f74411 100644
--- a/infra/compute/infra-hosts.md
+++ b/infra/compute/infra-hosts.md
@@ -27,6 +27,15 @@ tags: [infra, network, kr-zone, openwrt]
 
 The four Seoul-zone hosts (kr1, kr2, hp1, hp2) form a K3s v1.34.5+k3s1 cluster. **kr1/kr2 are control-plane nodes; hp1/hp2 are workers (k3s-agent)**.
 
+### Descheduler (installed 2026-04-19)
+
+CronJob `kube-system/descheduler`, runs every 30 minutes, Helm chart `descheduler/descheduler` v0.35.1.
+- **LowNodeUtilization**: evicts pods from nodes above 70% memory/CPU utilization so they are rescheduled onto nodes below 30%
+- **RemoveDuplicates**: spreads pods of the same Deployment when they pile up on one node
+- **RemovePodsHavingTooManyRestarts**: cleans up pods with more than 10 restarts
+- Excluded from eviction: kube-system, longhorn-system
+- Background: on 2026-04-19, kr2 (30 GB) froze from OOM. It was overloaded with 42 workloads (33 K3s pods + 9 Incus instances); the kernel froze and the host needed a physical reboot
+
 | Node | LAN IP | OS |
 |------|--------|----|
 | incus-hp1 | 192.168.9.227 | Debian 13 (trixie) |