obsidian/infra/postgresql-ha.md at ac8101869508b56696a4381f5c687ced27abb434


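Heimdall ac81018695 infra: restore the APISIX ingress controller and clarify the PostgreSQL HA structure

- apisix.md: ingress controller restored on 2026-04-08; GatewayProxy + ApisixRoute CRD compatibility verified; outdated helm values issue corrected; ApisixRoute example added
- apisix.md: states that an external unified etcd cluster is used (not a StatefulSet inside K3s); global_rules does not include chaitin-waf (it is applied per route)
- postgresql-ha.md: pgcat now goes through a single HAProxy backend (2026-04-08 incident record); warns against hardcoding Patroni node IPs
- gitea.md: container registry section added; notes that the gitea-registry secret must be copied manually into each namespace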

2026-04-08 08:34:07 +00:00


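Overview

A 3-node PostgreSQL HA cluster. Patroni manages automatic failover and uses etcd as its DCS (Distributed Configuration Store).

The cluster serves as the kine datastore for K3s. Migration from the Supabase Free tier to local hosting was completed on 2026-04-05.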

PostgreSQL cluster

Node	Host	IP	Role
postgres-1	incus-hp2	10.100.2.5	Replica or Leader
postgres-2	incus-kr1	10.100.3.185	Replica or Leader
postgres-3	incus-kr2	10.100.1.83	Replica or Leader

PostgreSQL 17.9, Patroni 4.1.0
Patroni config: /etc/patroni.yml
Patroni service: /etc/systemd/system/patroni.service (ExecStart: /opt/patroni/bin/patroni /etc/patroni.yml)
Cluster name: nocodb-cluster
Replication: async streaming
Patroni REST API: port 8008 on each node
Container memory limit: 2GiB (limits.memory)
shared_buffers: 512MB, effective_cache_size: 1536MB, work_mem: 8MB, maintenance_work_mem: 128MB, wal_buffers: 16MB
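
As a quick sanity check, the memory settings above can be related to the 2GiB container limit. The sketch below only does the arithmetic; the helper name `share_of_limit` is made up for illustration. It shows that shared_buffers sits at 25% of the limit (the starting point the PostgreSQL documentation suggests) and effective_cache_size at 75%. Note that work_mem applies per sort or hash operation, so its total use depends on concurrency and is not a fixed share.

```python
# Sketch: express each memory setting as a fraction of the container limit.
# PostgreSQL's "MB" unit is 1024-based, so MiB arithmetic applies.
MIB = 1024 * 1024

CONTAINER_LIMIT_BYTES = 2 * 1024 * MIB  # limits.memory: 2GiB

SETTINGS_MIB = {
    "shared_buffers": 512,
    "effective_cache_size": 1536,
    "work_mem": 8,              # per sort/hash operation, not a global total
    "maintenance_work_mem": 128,
    "wal_buffers": 16,
}


def share_of_limit(value_mib, limit_bytes=CONTAINER_LIMIT_BYTES):
    """Return a setting's size as a fraction of the container memory limit."""
    return value_mib * MIB / limit_bytes


for name, value in SETTINGS_MIB.items():
    print(f"{name}: {value} MiB = {share_of_limit(value):.1%} of the limit")
```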

Databases

DB	Purpose
kine	K3s datastore (kine)
nocodb	NocoDB
n8n	n8n workflows
outline	Outline wiki

Patroni commands

# Check cluster status
incus exec postgres-1 -- /opt/patroni/bin/patronictl -c /etc/patroni.yml list

# Manual switchover
incus exec postgres-1 -- /opt/patroni/bin/patronictl -c /etc/patroni.yml switchover

# Reinitialize a replica
incus exec postgres-1 -- /opt/patroni/bin/patronictl -c /etc/patroni.yml reinit nocodb-cluster postgres-3 --force
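
Besides patronictl, the current Leader can be found by asking each node's Patroni REST API (port 8008): `GET /primary` returns 200 only on the node holding the leader lock. The following is a minimal sketch using only the Python standard library. The function names `http_status` and `find_leader` are hypothetical, and it assumes the REST API is reachable over plain HTTP without authentication from wherever the script runs.

```python
import urllib.error
import urllib.request

# Node names and IPs come from the cluster table; 8008 is the Patroni REST API port.
NODES = {
    "postgres-1": "10.100.2.5",
    "postgres-2": "10.100.3.185",
    "postgres-3": "10.100.1.83",
}


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 when a replica is asked for /primary
    except OSError:
        return None  # connection refused, timeout, no route, ...


def find_leader(nodes=NODES, probe=http_status):
    """Return the names of all nodes whose /primary endpoint answers 200."""
    return [
        name
        for name, ip in nodes.items()
        if probe(f"http://{ip}:8008/primary") == 200
    ]
```

Calling `find_leader()` from a host that can reach the 10.100.x.x network should return a single-element list in a healthy cluster. An empty list means no node currently holds the leader lock or none was reachable. The `probe` parameter exists only so the logic can be exercised without network access.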

etcd cluster (Patroni DCS)

Node	Location	IP	Method
etcd-nas	Synology NAS (Seoul)	192.168.9.100	Docker (`quay.io/coreos/etcd:v3.5.21`)
etcd-hp2	Incus container hp2 (Seoul)	10.100.2.214	Alpine + `apk add etcd` (v3.5.16)
etcd-jp1	Incus container jp1 (Tokyo)	10.253.101.233	Alpine + `apk add etcd` (v3.5.16)

NAS: data in /volume1/docker/etcd/data, --restart=always
hp2/jp1: openrc service (/etc/init.d/etcd), command_background=true
Patroni etcd namespace: /patroni
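
For context on what this 3-member layout can tolerate: etcd needs a majority of members (a quorum) to accept writes. The sketch below works through that arithmetic for the members listed above. The site labels are taken from the table, and the helper names are made up for illustration.

```python
# Sketch: quorum arithmetic for the etcd members listed in the table above.
MEMBERS = {
    "etcd-nas": "Seoul",
    "etcd-hp2": "Seoul",
    "etcd-jp1": "Tokyo",
}


def quorum(member_count):
    """Smallest majority of a cluster with `member_count` members."""
    return member_count // 2 + 1


def tolerated_failures(member_count):
    """How many members can fail while a majority remains."""
    return member_count - quorum(member_count)


def survives_site_loss(members, lost_site):
    """True if the members outside `lost_site` still form a quorum."""
    remaining = sum(1 for site in members.values() if site != lost_site)
    return remaining >= quorum(len(members))


print("quorum:", quorum(len(MEMBERS)))
print("tolerated member failures:", tolerated_failures(len(MEMBERS)))
for site in sorted(set(MEMBERS.values())):
    print(f"quorum kept if {site} is lost:", survives_site_loss(MEMBERS, site))
```

Under these assumptions the cluster keeps quorum if any single member, or the Tokyo site, is lost, but not if both Seoul members become unavailable at the same time.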

etcd check commands

# List cluster members
incus exec postgres-1 -- etcdctl --endpoints=http://192.168.9.100:2379 member list -w table

# Endpoint status
incus exec postgres-1 -- etcdctl --endpoints=http://192.168.9.100:2379,http://10.100.2.214:2379,http://10.253.101.233:2379 endpoint status -w table

Data stored in etcd

prefix	Purpose
`/patroni`	Patroni DCS (Leader election, configuration)
`/apisix/osaka`	APISIX Osaka routing configuration
`/apisix/tokyo`	APISIX sandbox-tokyo routing configuration (unused since the 2026-04-08 NixOS migration; data retained)
`/apisix/seoul`	APISIX Seoul K3s routing configuration

K3s kine connection

K3s → HAProxy(OpenWrt 192.168.9.1:5432) → Patroni Leader PostgreSQL

K3s config

# /etc/rancher/k3s/config.yaml (kr1, kr2)
datastore-endpoint: "postgres://kine:kine@192.168.9.1:5432/kine"

HAProxy (OpenWrt)

The PostgreSQL backend is configured in /etc/haproxy.cfg. HAProxy detects the Leader automatically through the Patroni REST API (/primary endpoint).

frontend ft_postgres
    bind :5432
    default_backend bk_postgres_primary

backend bk_postgres_primary
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgres-1 10.100.2.5:5432 check port 8008
    server postgres-2 10.100.3.185:5432 check port 8008
    server postgres-3 10.100.1.83:5432 check port 8008

On a Patroni failover, HAProxy detects the new Leader automatically (health checks run every 3 s)
Leader changes are handled without modifying the K3s config
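
How quickly HAProxy reacts follows from the `inter 3s fall 3 rise 2` settings in the backend above. The sketch below is a simplified timing model, not a measurement. It assumes checks fire exactly every `inter` seconds and answer instantly, and it ignores the time Patroni itself needs to promote a new Leader. The function name `transition_window` is made up for illustration.

```python
# Sketch: rough bounds on how long HAProxy takes to change a server's state.
INTER_S = 3.0  # inter 3s: seconds between health checks
FALL = 3       # fall 3: consecutive failed checks before a server is marked DOWN
RISE = 2       # rise 2: consecutive passed checks before a server is marked UP


def transition_window(interval_s, consecutive_checks):
    """Return (best, worst) seconds until enough consecutive checks are seen.

    The role change can happen at any point between two checks, so the first
    relevant check arrives after somewhere between 0 and `interval_s` seconds;
    each further check follows one interval later.
    """
    best = (consecutive_checks - 1) * interval_s
    worst = consecutive_checks * interval_s
    return best, worst


new_leader_up = transition_window(INTER_S, RISE)
old_leader_down = transition_window(INTER_S, FALL)
print("new Leader marked UP after (best, worst) s:", new_leader_up)
print("old Leader marked DOWN after (best, worst) s:", old_leader_down)
```

Under this model the new Leader starts receiving traffic roughly 3 to 6 seconds after its /primary endpoint begins returning 200, and the old Leader is removed within about 6 to 9 seconds, at which point `on-marked-down shutdown-sessions` closes its remaining connections.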

Application access path

Applications inside K3s, such as NocoDB and n8n, connect to PostgreSQL through pgcat (connection pooling).

nocodb/n8n → pgcat (db.svc.cluster.local:6432) → HAProxy 192.168.9.1:5432 → Patroni Leader

In the db/pgcat-config ConfigMap, shards.0.servers of every pool must point only to the single HAProxy backend (changed 2026-04-08):

[pools.nocodb.shards.0]
database = "nocodb"
servers = [["192.168.9.1", 5432, "primary"]]

[pools.n8n.shards.0]
database = "n8n"
servers = [["192.168.9.1", 5432, "primary"]]

⚠️ Do not hardcode Patroni node IPs (10.100.2.5, 10.100.3.185, 10.100.1.83) in pgcat. After a Patroni failover, pgcat keeps pointing at the old primary, and nocodb/n8n fail with read-only errors.

Use pgcat for pooling only and delegate leader detection to the OpenWrt HAProxy. Set query_parser_enabled = false (disables read/write splitting).

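2026-04-08 incident record

A Patroni failover occurred → pgcat still referenced the hardcoded old primary IP (10.100.2.5) → the NocoDB migration failed with "cannot execute UPDATE in a read-only transaction", leaving nocodb in CrashLoopBackOff for about 4 hours. n8n had no migration to run, so the problem did not surface there, but the same latent issue existed. Switching to the single-backend structure above resolved it permanently.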

APISIX etcd consolidation (2026-04-06)

The independent per-site etcd instances were migrated to the unified etcd cluster.

Site	Previous etcd	After migration	prefix
osaka	Docker waf-etcd (local)	Unified cluster (192.168.9.100)	`/apisix/osaka`
sandbox-tokyo	Docker apisix-etcd (local)	Unified cluster (10.253.101.233)	`/apisix/tokyo` (sandbox-tokyo APISIX itself has not been running since the 2026-04-08 NixOS migration)
Seoul K3s	3-node StatefulSet (inside K3s)	Unified cluster (192.168.9.100)	`/apisix/seoul`

Data migration: etcdctl make-mirror --prefix /apisix/ --dest-prefix /apisix-{site}/
The apisix-etcd StatefulSet + PVC inside K3s have been deleted
