obsidian/infra/postgresql-ha.md at 33ce94a75afc46bd8e7cf2bf84434a2897750844

heimdall 33ce94a75a Full switch to pgpool + pgcat retired: postgresql-ha.md fully updated

2026-04-16 12:24:39 +09:00

Overview

Three-node PostgreSQL HA cluster. Patroni manages automatic failover and uses etcd as its DCS (Distributed Configuration Store).

Also serves as the kine datastore for K3s.

PostgreSQL cluster

Node	Host	IP	Role
postgres-1	incus-hp2	10.100.2.5	Replica or Leader
postgres-2	incus-kr1	10.100.3.185	Replica or Leader
postgres-3	incus-kr2	10.100.1.83	Replica or Leader

PostgreSQL 17.9, Patroni 4.1.0
Patroni config: /etc/patroni.yml
Patroni service: /etc/systemd/system/patroni.service (ExecStart: /opt/patroni/bin/patroni /etc/patroni.yml)
Cluster name: nocodb-cluster
Replication: async streaming
Patroni REST API: port 8008 on each node
Container memory limit: 2GiB (limits.memory)
shared_buffers: 512MB, effective_cache_size: 1536MB, work_mem: 8MB, maintenance_work_mem: 128MB, wal_buffers: 16MB
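As a quick sanity check, the Python sketch below expresses each memory setting as a share of the 2 GiB container limit. PostgreSQL's MB unit is 1024-based, so 512MB is exactly a quarter of 2GiB. Only the values come from the list above; the variable and function names are hypothetical. shared_buffers lands at 25% and effective_cache_size at 75%. effective_cache_size is only a planner estimate and allocates nothing, whereas work_mem can be consumed several times within a single query.

```python
# Hypothetical helper: express the PostgreSQL memory settings above as a
# fraction of the container's limits.memory (2GiB).
MIB = 1024 * 1024                # PostgreSQL "MB" means MiB
CONTAINER_LIMIT = 2048 * MIB     # limits.memory = 2GiB

SETTINGS_MB = {
    "shared_buffers": 512,
    "effective_cache_size": 1536,  # planner hint only, nothing is allocated
    "work_mem": 8,                 # per sort/hash node, can be used many times per query
    "maintenance_work_mem": 128,
    "wal_buffers": 16,
}


def fraction_of_limit(name: str) -> float:
    """Return setting `name` as a fraction of the container memory limit."""
    return SETTINGS_MB[name] * MIB / CONTAINER_LIMIT


if __name__ == "__main__":
    for name in SETTINGS_MB:
        print(f"{name:22s} {fraction_of_limit(name):6.1%}")
```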

Databases

DB	Purpose
kine	K3s datastore (kine)
nocodb	NocoDB
n8n	n8n workflows
outline	Outline wiki

Patroni commands

# Check cluster status
incus exec postgres-1 -- /opt/patroni/bin/patronictl -c /etc/patroni.yml list

# Manual switchover
incus exec postgres-1 -- /opt/patroni/bin/patronictl -c /etc/patroni.yml switchover

# Replica reinitialize
incus exec postgres-1 -- /opt/patroni/bin/patronictl -c /etc/patroni.yml reinit nocodb-cluster postgres-3 --force

etcd cluster (Patroni DCS)

Node	Location	IP	Deployment
etcd-nas	Synology NAS (Seoul)	192.168.9.100	Docker (`quay.io/coreos/etcd:v3.5.21`)
etcd-mbp	kaffa-macbookpro (Seoul, Tailscale)	100.115.154.78	Docker (`quay.io/coreos/etcd:v3.5.17`) via colima; peer/client ports exposed on Tailscale through socat
etcd-jp1	Incus container jp1 (Tokyo)	10.253.101.233	Alpine + `apk add etcd` (v3.5.16)

NAS: data in /volume1/docker/etcd/data, --restart=always
jp1: OpenRC service (/etc/init.d/etcd), command_background=true
mbp: Docker container etcd (named volume etcd-data). Client and peer ports are published only on host 127.0.0.1, and socat forwards ports 2379/2380 on Tailscale IP 100.115.154.78 to them (~/Library/LaunchAgents/com.kaffa.etcd-socat{,-peer}.plist)
Patroni etcd namespace: /patroni
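The three etcd members form a Raft cluster, so Patroni's DCS stays writable only while a majority of them is reachable. The minimal Python sketch below captures that arithmetic. The member names come from the table above; the functions are illustrative and not part of any tool. With 3 members the quorum is 2, so any single member (NAS, MacBook, or jp1) can be lost. Losing two leaves etcd without quorum, and Patroni can then no longer renew its leader lock.

```python
from itertools import combinations

# Members from the etcd table above.
MEMBERS = ["etcd-nas", "etcd-mbp", "etcd-jp1"]


def quorum(n: int) -> int:
    """Smallest majority of n voting Raft members."""
    return n // 2 + 1


def survives(up: set[str], members: list[str] = MEMBERS) -> bool:
    """True if the reachable members still form a majority (writes, leader election)."""
    return len(up & set(members)) >= quorum(len(members))


if __name__ == "__main__":
    q = quorum(len(MEMBERS))
    print("quorum:", q, "tolerated failures:", len(MEMBERS) - q)
    for down in combinations(MEMBERS, 2):
        up = set(MEMBERS) - set(down)
        print("down:", down, "->", "OK" if survives(up) else "NO QUORUM")
```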

etcd check commands

# List cluster members
incus exec postgres-1 -- etcdctl --endpoints=http://192.168.9.100:2379 member list -w table

# Endpoint status
incus exec postgres-1 -- etcdctl --endpoints=http://192.168.9.100:2379,http://100.115.154.78:2379,http://10.253.101.233:2379 endpoint status -w table

Data stored in etcd

prefix	Purpose
`/patroni`	Patroni DCS (leader election, configuration)
`/apisix/osaka`	APISIX Osaka routing configuration
`/apisix/tokyo`	APISIX sandbox-tokyo routing configuration (unused, data retained)
`/apisix/seoul`	APISIX Seoul K3s routing configuration

K3s kine connection

K3s → HAProxy(OpenWrt 192.168.9.1:5432) → Patroni Leader PostgreSQL

K3s config

# /etc/rancher/k3s/config.yaml (kr1, kr2)
datastore-endpoint: "postgres://kine:kine@192.168.9.1:5432/kine"

HAProxy (OpenWrt)

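The PostgreSQL backend is defined in /etc/haproxy.cfg. HAProxy finds the Leader automatically by health-checking the Patroni REST API /primary endpoint, which returns HTTP 200 only on the Leader.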

frontend ft_postgres
    bind :5432
    default_backend bk_postgres_primary

backend bk_postgres_primary
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgres-1 10.100.2.5:5432 check port 8008
    server postgres-2 10.100.3.185:5432 check port 8008
    server postgres-3 10.100.1.83:5432 check port 8008
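The HAProxy check above can be reproduced by hand when debugging routing. The Python sketch below uses only the standard library and assumes the Patroni REST API is reachable on port 8008 from wherever it runs. `http_status` and `find_primary` are hypothetical helper names. Patroni answers `GET /primary` with 200 only on the member holding the leader lock, so normally exactly one name comes back.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Patroni members from the cluster table; the REST API listens on 8008.
MEMBERS = {
    "postgres-1": "10.100.2.5",
    "postgres-2": "10.100.3.185",
    "postgres-3": "10.100.1.83",
}


def http_status(url: str, timeout: float = 2.0) -> Optional[int]:
    """Status code of GET url, or None if the node did not answer at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx, e.g. 503 from a replica
        return err.code
    except OSError:  # URLError, refused connection, timeout
        return None


def find_primary(members: dict[str, str],
                 fetch_status: Callable[[str], Optional[int]] = http_status) -> list[str]:
    """Members whose /primary check returns 200, i.e. what HAProxy would mark UP."""
    return [name for name, ip in members.items()
            if fetch_status(f"http://{ip}:8008/primary") == 200]
```

Running `print(find_primary(MEMBERS))` from a host on the 10.100.x.x network shows the current Leader. An empty list during a switchover corresponds to the window in which bk_postgres_primary has no UP server. `fetch_status` is injectable so the logic can be exercised without a live cluster.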

On Patroni failover, HAProxy detects the new Leader automatically (~3 s)
Leader changes are handled without touching the K3s config
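The time the hand-off takes follows from `inter 3s fall 3 rise 2`. The rough Python estimate below assumes `fastinter` and `downinter` are unset, so HAProxy uses `inter` for every check, and it ignores check latency. The new Leader needs `rise` = 2 consecutive 200 responses, about 3 to 6 s, which puts the ~3 s figure above at the lower bound. The old Leader is marked DOWN after `fall` = 3 failures, 6 to 9 s, and `on-marked-down shutdown-sessions` then closes any sessions still attached to it. `flip_window` is a hypothetical name.

```python
# Rough bounds on HAProxy state changes for the default-server line above.
# Assumption: fastinter/downinter are unset, so every check runs every `inter`.
INTER_S = 3  # inter 3s
FALL = 3     # fall 3
RISE = 2     # rise 2


def flip_window(consecutive: int, inter: float = INTER_S) -> tuple[float, float]:
    """(best, worst) seconds from a change on the node until HAProxy flips the server.

    The first check after the change fires somewhere in (0, inter]; after it,
    `consecutive - 1` further checks spaced `inter` apart are still needed.
    """
    return (consecutive - 1) * inter, consecutive * inter


print("new Leader marked UP within", flip_window(RISE), "s")
print("old Leader marked DOWN within", flip_window(FALL), "s")
```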

Application connection path: pgpool-II

All in-cluster K3s applications (NocoDB, n8n, Outline) reach the three Patroni nodes directly through pgpool-II, without going through HAProxy.

NocoDB/n8n/Outline → pgpool.db.svc.cluster.local:9999 → 3 Patroni nodes (direct)

The previous pgcat + HAProxy setup was retired on 2026-04-16. ../history/2026-04-16-pgpool-full-migration

pgpool setup

Image: pgpool/pgpool:4.4.3 (official image, newest tag on Docker Hub)
K8s manifests: pgpool/ directory in the kaffa/helm-charts repo (ArgoCD pgpool Application, selfHeal+prune)
replicas=2, podAntiAffinity (topologyKey kubernetes.io/hostname)
PodDisruptionBudget minAvailable: 1
Service: ClusterIP pgpool.db.svc.cluster.local:9999

pgpool.conf essentials

backend_clustering_mode = 'streaming_replication'

backend_hostname0 = '10.100.2.5'
backend_hostname1 = '10.100.3.185'
backend_hostname2 = '10.100.1.83'
backend_weight0 = 1
backend_weight1 = 1
backend_weight2 = 1
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_flag2 = 'ALLOW_TO_FAILOVER'

sr_check_user = 'sr_check'   # role with REPLICATION + pg_monitor; must be created in Postgres
sr_check_period = 10
health_check_period = 10

failover_on_backend_error = off   # Patroni performs the promotion
failover_command = ''
follow_primary_command = ''
use_watchdog = off

load_balance_mode = off   # every query goes to the primary (read-your-writes consistency)
num_init_children = 32
max_pool = 4
connection_cache = on
pool_passwd = ''   # unused, see the authentication section below
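pgpool-II's sizing rule is `max_pool * num_init_children <= max_connections - superuser_reserved_connections`, doubled when query cancellation must be supported. With replicas=2, the rule applies once per pod. The Python sketch below works out that budget. The helper names are hypothetical, and `max_connections` is not recorded in this note, so it is left as a parameter. `superuser_reserved_connections` defaults to 3. `other_clients` stands in for direct users of the same primary, such as kine via HAProxy, with a made-up count. Each pod also serves at most `num_init_children` client sessions at a time.

```python
NUM_INIT_CHILDREN = 32  # num_init_children
MAX_POOL = 4            # max_pool
PGPOOL_REPLICAS = 2     # Deployment replicas


def max_client_connections(replicas: int = PGPOOL_REPLICAS) -> int:
    """Concurrent client sessions all pgpool pods can serve at once."""
    return replicas * NUM_INIT_CHILDREN


def pgpool_backend_connections(replicas: int = PGPOOL_REPLICAS,
                               query_cancel: bool = False) -> int:
    """Worst-case connections the pods can hold open to a single Postgres backend."""
    per_pod = NUM_INIT_CHILDREN * MAX_POOL * (2 if query_cancel else 1)
    return replicas * per_pod


def fits(max_connections: int, superuser_reserved: int = 3,
         other_clients: int = 0, query_cancel: bool = False) -> bool:
    """True if pgpool plus other direct clients fit under max_connections."""
    needed = pgpool_backend_connections(query_cancel=query_cancel) + other_clients
    return needed <= max_connections - superuser_reserved
```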

Authentication

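Postgres uses password_encryption = scram-sha-256 cluster-wide.

allow_clear_text_frontend_auth = on + pool_hba method password → the client sends its password in plaintext → pgpool uses it directly for the backend scram-sha-256 challenge-response
pool_passwd = '' (disabled). This avoids a problem where pgpool automatically md5-hashes plaintext pool_passwd entries, which the scram-sha-256 backends then reject
The traffic stays on an internal K8s Service, so the clear-text password is exposed only inside the cluster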
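pgpool needs the clear-text password because of how SCRAM-SHA-256 works (RFC 5802 / RFC 7677). The client proves it knows the password by deriving ClientKey from it with PBKDF2, and the server only checks that result against the StoredKey it keeps. The simplified Python sketch below models that exchange with no SASLprep and no nonce handling. The function names, salt, and auth message are illustrative assumptions; the verifier layout mirrors what Postgres stores in pg_authid.rolpassword.

```python
import base64
import hashlib
import hmac


def _hmac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _b64(raw: bytes) -> str:
    return base64.b64encode(raw).decode("ascii")


def salted_password(password: str, salt: bytes, iterations: int) -> bytes:
    # RFC 5802 Hi() is PBKDF2 with HMAC-SHA-256. SASLprep is skipped, so
    # this sketch assumes an ASCII password.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def pg_verifier(password: str, salt: bytes, iterations: int = 4096) -> str:
    """SCRAM-SHA-256$<iter>:<salt>$<StoredKey>:<ServerKey>, as stored by Postgres."""
    sp = salted_password(password, salt, iterations)
    stored_key = hashlib.sha256(_hmac(sp, b"Client Key")).digest()
    server_key = _hmac(sp, b"Server Key")
    return f"SCRAM-SHA-256${iterations}:{_b64(salt)}${_b64(stored_key)}:{_b64(server_key)}"


def client_proof(password: str, salt: bytes, iterations: int, auth_message: bytes) -> bytes:
    """Client side: ClientKey can only be derived from the password itself."""
    client_key = _hmac(salted_password(password, salt, iterations), b"Client Key")
    client_signature = _hmac(hashlib.sha256(client_key).digest(), auth_message)
    return _xor(client_key, client_signature)


def server_accepts(stored_key: bytes, auth_message: bytes, proof: bytes) -> bool:
    """Server side: recover ClientKey from the proof and compare H(ClientKey) with StoredKey."""
    recovered = _xor(proof, _hmac(stored_key, auth_message))
    return hmac.compare_digest(hashlib.sha256(recovered).digest(), stored_key)
```

Passing an md5-style entry (`"md5" + md5(password + username)`) to `client_proof` in place of the password yields a proof that `server_accepts` rejects. This is the pool_passwd failure mode described above.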

Patroni TCP keepalive (kept)

tcp_keepalives_idle = 60
tcp_keepalives_interval = 10
tcp_keepalives_count = 3

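pgpool manages its own connections, so zombie client connections are not a problem. The Patroni parameters are kept anyway so that dead sockets are detected in the reverse direction (Postgres → pgpool).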
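With these values, Postgres gives up on a silent peer after idle + interval × count = 60 + 10 × 3 = 90 s. The Python sketch below computes that figure and applies the same settings to a client socket. The helper names are hypothetical. The `TCP_KEEP*` constants are looked up defensively because not every platform exposes all of them.

```python
import socket

IDLE_S, INTERVAL_S, COUNT = 60, 10, 3  # tcp_keepalives_idle / _interval / _count


def dead_peer_after(idle: int = IDLE_S, interval: int = INTERVAL_S, count: int = COUNT) -> int:
    """Seconds of silence after which an unanswered connection is dropped."""
    # `idle` seconds before the first probe, then `count` probes `interval` apart
    return idle + interval * count


def enable_keepalive(sock: socket.socket) -> None:
    """Apply the same keepalive settings to a socket (hypothetical helper)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", IDLE_S),
                        ("TCP_KEEPINTVL", INTERVAL_S),
                        ("TCP_KEEPCNT", COUNT)):
        option = getattr(socket, name, None)
        if option is not None:  # not every platform exposes every option
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```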

Patroni switchover integration test (2026-04-16)

With all 3 services going through pgpool, ran patronictl switchover --force:

Writes recovered: immediately (t+3ms)
n8n: 3 transient errors · NocoDB: 0 · Outline: 0
All services kept returning HTTP 200

Previous pgcat setup in the same scenario: n8n 1038 errors / NocoDB 13 errors / pod restart required

Migration history: ../history/2026-04-16-pgpool-n8n-poc · ../history/2026-04-16-pgpool-full-migration

APISIX etcd usage

Site	etcd	prefix	Notes
osaka	Consolidated cluster (192.168.9.100, ...)	`/apisix/osaka`	Docker APISIX (waf-apisix), apisix#오사카-apisix-osaka
sandbox-tokyo	(not running)	`/apisix/tokyo`	APISIX itself was retired with the 2026-04-08 move to NixOS; only the etcd data is retained
Seoul K3s	apisix-etcd StatefulSet inside K3s (apisix.apisix.svc:2379)	`/apisix`	Moved back from the external consolidated cluster into K3s on 2026-04-08

APISIX etcd consolidation/split history: ../history/2026-04-06-apisix-etcd-consolidation. The external consolidated etcd now serves only the Patroni DCS and osaka APISIX.
