diff --git a/history/2026-04-23-netbis-npm-vl-collection.md b/history/2026-04-23-netbis-npm-vl-collection.md
new file mode 100644
index 0000000..3b71237
--- /dev/null
+++ b/history/2026-04-23-netbis-npm-vl-collection.md
@@ -0,0 +1,154 @@
+---
+date: 2026-04-23
+topic: Log collection pipeline from 6 Netbis NPM hosts → VictoriaLogs (zlambda Vector relay)
+areas: [services/netbis.md, infra/platform/victorialogs.md, infra/security/crowdsec-safeline.md]
+tags: [netbis, vector, victorialogs, npm, zlambda, observability, nixos]
+---
+
+## Background / Decision
+
+Collect the nginx access/error logs of the six Netbis origin hosts (NPM, Linode Tokyo, public) into the in-house VictoriaLogs (`vl.inouter.com`). Purpose: future integration with CrowdSec parsing/anomaly detection, plus request-pattern monitoring.
+
+**Key constraint**: the public DNS for `vl.inouter.com` resolves only to a LAN IP (192.168.9.53), so the public NPM hosts cannot reach it directly. As the fix, zlambda (Tailscale 100.78.51.18 / public 139.162.71.52) is deployed as a **Vector HTTP relay**.
+
+Options considered:
+- (A) Install Tailscale on the six hosts: rejected by policy (installation not allowed)
+- (B) zlambda Vector relay: **chosen** (add a module to the existing NixOS flake)
+- (C) Expose a public VL endpoint: rejected over concerns about a larger attack surface
+
+## Final architecture
+
+```
+┌────────── public internet ──────────┐                                ┌── tailnet ──┐
+NPM-1..6 (Linode Tokyo)               │                                │             │
+  Vector 0.55 (host, file source)     │                                │             │
+  http sink POST:9999 (basic auth)    ├─► zlambda Vector-relay 0.45 ─► vl.inouter.com
+                                      │   HTTP server bearer=basic  │  (K3s Traefik → vlogs svc)
+                                      │   ES bulk sink              │
+                                      └─────────────────────────────┘
+```
+
+- **NPM Vector**: host-level. `/etc/vector/vector.yaml` (mode 600, bearer token in plaintext), `vector.service`
+- **Relay**: zlambda `vector-relay` container (NixOS oci-container, docker network `vector-net`)
+- **VL ingest**: `https://vl.inouter.com/insert/elasticsearch` (reaches the LAN over Tailscale, Traefik TLS)
+
+## Implementation
+
+### 1. zlambda vector.nix module
+
+`~/nixos-infra/vector.nix` is newly created. Key points:
+
+- `virtualisation.oci-containers.containers.vector-relay`: `docker.io/timberio/vector:0.45.0-debian`, `--network=vector-net`, `ports=[9999:9999]`
+- `systemd.services.vector-relay-render-config` oneshot: renders the `/var/lib/vector-relay/vector.yaml` template (env interpolation `${VECTOR_BEARER_TOKEN}`)
+- `systemd.services.vector-relay-env` oneshot: moves the agenix-decrypted secret (`/run/agenix/vector-bearer-token`) to `/run/vector-relay/env`, then injects it into the container via `environmentFiles`
+- `systemd.services.init-vector-net` oneshot: creates the docker network
+- `age.secrets.vector-bearer-token.file = ./secrets/vector-bearer-token.age` (relative path, compatible with pure eval)
+
+Vector container config:
+```yaml
+sources.http_npm: { type: http_server, address: 0.0.0.0:9999, encoding: ndjson,
+  auth: { strategy: basic, username: npm-relay, password: "${VECTOR_BEARER_TOKEN}" } }
+transforms.tag_relay: { type: remap, source: | .relay = "zlambda" }
+sinks.vlogs: { type: elasticsearch, endpoints: [https://vl.inouter.com/insert/elasticsearch],
+  mode: bulk, healthcheck.enabled: false, query._stream_fields: [host, service, log_type] }
+```
+
+### 2. agenix bearer token
+
+```bash
+# Generate a 64-character URL-safe base64 random string, then encrypt it with age for 2 recipients
+age -r "" -r "" \
+  -o secrets/vector-bearer-token.age < bearer.secret
+```
+
+Add `"vector-bearer-token.age".publicKeys = allUsers ++ allHosts;` to `secrets/secrets.nix`.
+
+### 3. flake/config wiring
+
+- Add `./vector.nix` to `imports` in `configuration.nix`
+- Add the **heimdall `ops-agents@kaffa` ed25519 public key** to `users.users.root.openssh.authorizedKeys.keys` in `configuration.nix` (persists the temporary runtime registration in the flake)
+- `git add vector.nix secrets/vector-bearer-token.age secrets/secrets.nix configuration.nix`
+- `nixos-rebuild switch --flake .#zlambda`
+
+### 4. Linode firewall (zlambda id 691875)
+
+```
+allow-npm-relay-9999:
+  protocol TCP, ports 9999, action ACCEPT
+  addresses.ipv4: [172.104.100.11/32, 139.162.114.197/32, 139.162.73.17/32,
+                   139.162.73.240/32, 172.104.70.137/32, 172.105.226.218/32]
+```
+
+Existing rules (SSH, CF HTTP/HTTPS, Tailscale UDP 41641, ICMP) are preserved. inbound_policy=DROP is kept.
+
+### 5. Vector installation on the six NPM hosts
+
+Install `/usr/local/bin/vector` 0.55 via `setup.vector.dev` (the official sh script). Create a systemd unit, then enable and start it.
+
+`/etc/vector/vector.yaml` on each host (mode 600):
+- **sources.npm_access / npm_error**: file tail
+  - NPM-1..5: `/root/data/logs/proxy-host-*_access.log`, etc.
+  - NPM-6: `/home/kaffa/npm/data/log/...` (a different install)
+- **transforms.parse_npm_access / parse_npm_error**: remap (VRL)
+  - Regex-parse the NPM proxy log_format → structured ip/method/path/status/bytes/domain/upstream/UA/referer fields
+  - On failure, retry with the NPM standard log_format; if that also fails, set `log_format="raw"`
+  - Common fields: `.service="npm"`, `.host="