
Introduction
Hello! This is Devlos.
This post summarizes week 5 of the K8S Deploy study hosted by the CloudNet@ community: "Kubespray High Availability (HA) Hands-on and Cluster Operations".
In week 1 we built a cluster by hand with Kubernetes The Hard Way, week 2 covered Ansible basics, week 3 covered building and upgrading a cluster with kubeadm, and week 4 covered automated, production-grade cluster deployment with Kubespray. In week 5 we build a highly available (HA) cluster with Kubespray, understand the front L4 load balancer (HAProxy), the client-side LB (nginx) on the worker nodes, and how the API endpoint behaves, and then practice adding/removing nodes, upgrading, and monitoring.
Starting from the lab environment (admin-lb + 3 control plane nodes + 2 workers), we cover adding and removing nodes with scale.yml, zero-downtime upgrades with upgrade-cluster.yml, cluster reset, and Prometheus/Grafana configuration.
Lab Environment Overview

The lab environment places an L4 load balancer (HAProxy) called admin-lb (192.168.10.10) in front of the cluster, with three control plane nodes (k8s-node1~3) behind it in an HA configuration. Each control plane node runs kube-apiserver, and the kubelet and kube-proxy on the same node connect to the API server locally: the kubelet manages the node and its pods, and kube-proxy watches/lists Services and Endpoints. kube-controller-manager and kube-scheduler are omitted from the diagram.
With all three API servers active, HAProxy on admin-lb sits in front as an L4 LB: clients (kubectl, etc.) connect to a single entry point on 6443/TCP and traffic is load-balanced across the three kube-apiservers. External and management traffic is made highly available this way.
The worker nodes (k8s-node4, k8s-node5, 192.168.10.14~15) use a client-side LB. In this lab an nginx static pod on each node plays that role: the node's kubelet and kube-proxy connect to this local LB (nginx), and nginx spreads the connections across the three backend kube-apiservers.
In other words, a worker only ever talks to its local LB, and that LB talks to the three control plane nodes. User traffic is made highly available and load-balanced through the front L4 LB (HAProxy), and worker traffic through the per-node client-side LB (nginx).
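Once the cluster is built later in this post, you can see this worker-side wiring directly. A minimal check, assuming Kubespray's default localhost load balancer behavior; the exact file paths are assumptions and may differ by Kubespray version:
# kubelet on a worker points at the local proxy, not at a control plane IP
ssh k8s-node4 grep 'server:' /etc/kubernetes/kubelet.conf
# the local proxy is an nginx static pod whose config lists the three apiservers as upstreams
ssh k8s-node4 ls /etc/kubernetes/manifests/
ssh k8s-node4 grep -A4 'upstream' /etc/nginx/nginx.conf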
| NAME | Description | CPU | RAM | NIC1 | NIC2 | Init Script |
|---|---|---|---|---|---|---|
| admin-lb | Runs Kubespray, API LB | 2 | 1GB | 10.0.2.15 | 192.168.10.10 | admin-lb.sh |
| k8s-node1 | K8S ControlPlane | 4 | 2GB | 10.0.2.15 | 192.168.10.11 | init-cfg.sh |
| k8s-node2 | K8S ControlPlane | 4 | 2GB | 10.0.2.15 | 192.168.10.12 | init-cfg.sh |
| k8s-node3 | K8S ControlPlane | 4 | 2GB | 10.0.2.15 | 192.168.10.13 | init-cfg.sh |
| k8s-node4 | K8S Worker | 4 | 2GB | 10.0.2.15 | 192.168.10.14 | init-cfg.sh |
| k8s-node5 | K8S Worker | 4 | 2GB | 10.0.2.15 | 192.168.10.15 | init-cfg.sh |
The lab code is as follows.
##################################
# Vagrantfile
# Creates the K8s HA lab VMs on VirtualBox: 5 nodes (k8s-node1~5) + 1 admin-lb.
##################################
# Base Image https://portal.cloud.hashicorp.com/vagrant/discover/bento/rockylinux-10.0
BOX_IMAGE = "bento/rockylinux-10.0" # base box image (Rocky Linux 10)
BOX_VERSION = "202510.26.0" # box version
N = 5 # number of K8s nodes to create (k8s-node1 ~ k8s-node5)
Vagrant.configure("2") do |config|
# ---------- K8s node VMs (3 control plane + 2 workers) ----------
(1..N).each do |i|
config.vm.define "k8s-node#{i}" do |subconfig|
subconfig.vm.box = BOX_IMAGE
subconfig.vm.box_version = BOX_VERSION
subconfig.vm.provider "virtualbox" do |vb|
vb.customize ["modifyvm", :id, "--groups", "/Kubespary-Lab"] # VB 그룹으로 묶어 관리
vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"] # 프라이빗 네트워크 NIC 프로미스큐어스
vb.name = "k8s-node#{i}"
vb.cpus = 4
vb.memory = 2048
vb.linked_clone = true # linked clone to save disk space
end
subconfig.vm.host_name = "k8s-node#{i}"
subconfig.vm.network "private_network", ip: "192.168.10.1#{i}" # 고정 IP (11~15)
subconfig.vm.network "forwarded_port", guest: 22, host: "6000#{i}", auto_correct: true, id: "ssh" # 호스트에서 ssh 60001~60005
subconfig.vm.synced_folder "./", "/vagrant", disabled: true # 공유 폴더 비활성화(성능)
subconfig.vm.provision "shell", path: "init_cfg.sh" # 프로비저닝: 노드 공통 초기화
end
end
# ---------- Admin + L4 load balancer VM (runs HAProxy, NFS, Kubespray, etc.) ----------
config.vm.define "admin-lb" do |subconfig|
subconfig.vm.box = BOX_IMAGE
subconfig.vm.box_version = BOX_VERSION
subconfig.vm.provider "virtualbox" do |vb|
vb.customize ["modifyvm", :id, "--groups", "/Kubespary-Lab"]
vb.customize ["modifyvm", :id, "--nicpromisc2", "allow-all"]
vb.name = "admin-lb"
vb.cpus = 2
vb.memory = 1024
vb.linked_clone = true
end
subconfig.vm.host_name = "admin-lb"
subconfig.vm.network "private_network", ip: "192.168.10.10" # API LB VIP
subconfig.vm.network "forwarded_port", guest: 22, host: "60000", auto_correct: true, id: "ssh"
subconfig.vm.synced_folder "./", "/vagrant", disabled: true
subconfig.vm.provision "shell", path: "admin-lb.sh" # 프로비저닝: HAProxy, kubectl, kubespray 등
end
end
##################################
# admin-lb.sh
# Provisions the admin-lb VM: HAProxy (L4 LB), NFS, SSH, hosts, Kubespray inventory, kubectl/k9s/helm, etc.
# Argument $1 = number of nodes (e.g. 5); used in the hosts / SSH key / inventory loops.
##################################
#!/usr/bin/env bash
echo ">>>> Initial Config Start <<<<"
# --- Timezone Asia/Seoul, do not keep the RTC in local time ---
echo "[TASK 1] Change Timezone and Enable NTP"
timedatectl set-local-rtc 0
timedatectl set-timezone Asia/Seoul
# --- Disable firewalld/SELinux (for the lab environment) ---
echo "[TASK 2] Disable firewalld and selinux"
systemctl disable --now firewalld >/dev/null 2>&1
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
# --- Register admin-lb and k8s-node1~N in /etc/hosts (name resolution) ---
echo "[TASK 3] Setting Local DNS Using Hosts file"
sed -i '/^127\.0\.\(1\|2\)\.1/d' /etc/hosts
echo "192.168.10.10 k8s-api-srv.admin-lb.com admin-lb" >> /etc/hosts
for (( i=1; i<=$1; i++ )); do echo "192.168.10.1$i k8s-node$i" >> /etc/hosts; done # repeats $1 times (5 here)
# --- Keep enp0s9 (the private network) from being used as the default route ---
echo "[TASK 4] Delete default routing - enp0s9 NIC" # requires setenforce 0 beforehand
nmcli connection modify enp0s9 ipv4.never-default yes
nmcli connection up enp0s9 >/dev/null 2>&1
# --- Add the official Kubernetes RPM repository, then install kubectl ---
echo "[TASK 5] Install kubectl"
cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.32/rpm/repodata/repomd.xml.key
exclude=kubectl
EOF
dnf install -y -q kubectl --disableexcludes=kubernetes >/dev/null 2>&1
# --- Install HAProxy and configure the K8s API (6443) L4 load balancer + Stats (9000) + Prometheus (8405) ---
echo "[TASK 6] Install HAProxy"
dnf install -y haproxy >/dev/null 2>&1
cat << EOF > /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
# utilize system-wide crypto-policies
ssl-default-bind-ciphers PROFILE=SYSTEM
ssl-default-server-ciphers PROFILE=SYSTEM
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode http
log global
option httplog
option tcplog
option dontlognull
option http-server-close
#option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
# ---------------------------------------------------------------------
# Kubernetes API Server Load Balancer Configuration
# ---------------------------------------------------------------------
frontend k8s-api
bind *:6443
mode tcp
option tcplog
default_backend k8s-api-backend
backend k8s-api-backend
mode tcp
option tcp-check
option log-health-checks
timeout client 3h
timeout server 3h
balance roundrobin
server k8s-node1 192.168.10.11:6443 check check-ssl verify none inter 10000
server k8s-node2 192.168.10.12:6443 check check-ssl verify none inter 10000
server k8s-node3 192.168.10.13:6443 check check-ssl verify none inter 10000
# ---------------------------------------------------------------------
# HAProxy Stats Dashboard - http://192.168.10.10:9000/haproxy_stats
# ---------------------------------------------------------------------
listen stats
bind *:9000
mode http
stats enable
stats uri /haproxy_stats
stats realm HAProxy\ Statistic
stats admin if TRUE
# ---------------------------------------------------------------------
# Configure the Prometheus exporter - curl http://192.168.10.10:8405/metrics
# ---------------------------------------------------------------------
frontend prometheus
bind *:8405
mode http
http-request use-service prometheus-exporter if { path /metrics }
no log
EOF
systemctl enable --now haproxy >/dev/null 2>&1
# --- Install the NFS server and export /srv/nfs/share (rw for all clients) ---
echo "[TASK 7] Install nfs-utils"
dnf install -y nfs-utils >/dev/null 2>&1
systemctl enable --now nfs-server >/dev/null 2>&1
mkdir -p /srv/nfs/share
chown nobody:nobody /srv/nfs/share
chmod 755 /srv/nfs/share
echo '/srv/nfs/share *(rw,async,no_root_squash,no_subtree_check)' > /etc/exports
exportfs -rav
# --- Install packages for Ansible (Kubespray) and sshpass (password-based SSH) ---
echo "[TASK 8] Install packages"
dnf install -y python3-pip git sshpass >/dev/null 2>&1
# --- Set the root password and allow SSH root login / password authentication ---
echo "[TASK 9] Setting SSHD"
echo "root:qwe123" | chpasswd
cat << EOF >> /etc/ssh/sshd_config
PermitRootLogin yes
PasswordAuthentication yes
EOF
systemctl restart sshd >/dev/null 2>&1
# --- Generate an SSH key and copy the public key to admin-lb itself + every k8s-node (passwordless Ansible access) ---
echo "[TASK 10] Setting SSH Key"
ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa >/dev/null 2>&1
sshpass -p 'qwe123' ssh-copy-id -o StrictHostKeyChecking=no root@192.168.10.10 >/dev/null 2>&1 # cat /root/.ssh/authorized_keys
for (( i=1; i<=$1; i++ )); do sshpass -p 'qwe123' ssh-copy-id -o StrictHostKeyChecking=no root@192.168.10.1$i >/dev/null 2>&1 ; done
ssh -o StrictHostKeyChecking=no root@admin-lb hostname >/dev/null 2>&1
for (( i=1; i<=$1; i++ )); do sshpass -p 'qwe123' ssh -o StrictHostKeyChecking=no root@k8s-node$i hostname >/dev/null 2>&1 ; done
# --- Clone Kubespray v2.29.1 and create the mycluster inventory: 3 control plane nodes + 1 worker (node5 commented out) ---
echo "[TASK 11] Clone Kubespray Repository"
git clone -b v2.29.1 https://github.com/kubernetes-sigs/kubespray.git /root/kubespray >/dev/null 2>&1
cp -rfp /root/kubespray/inventory/sample /root/kubespray/inventory/mycluster
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
#k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
EOF
# --- Install Kubespray's Ansible dependencies ---
echo "[TASK 12] Install Python Dependencies"
pip3 install -r /root/kubespray/requirements.txt >/dev/null 2>&1 # pip3 list
# --- Install the K9s CLI (TUI); amd64 or arm64 depending on the architecture ---
echo "[TASK 13] Install K9s"
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
wget -P /tmp https://github.com/derailed/k9s/releases/latest/download/k9s_linux_${CLI_ARCH}.tar.gz >/dev/null 2>&1
tar -xzf /tmp/k9s_linux_${CLI_ARCH}.tar.gz -C /tmp
chown root:root /tmp/k9s
mv /tmp/k9s /usr/local/bin/
chmod +x /usr/local/bin/k9s
# --- Colorized kubectl output (kubecolor) ---
echo "[TASK 14] Install kubecolor"
dnf install -y -q 'dnf-command(config-manager)' >/dev/null 2>&1
dnf config-manager --add-repo https://kubecolor.github.io/packages/rpm/kubecolor.repo >/dev/null 2>&1
dnf install -y -q kubecolor >/dev/null 2>&1
# --- Install Helm 3.18.6 ---
echo "[TASK 15] Install Helm"
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | DESIRED_VERSION=v3.18.6 bash >/dev/null 2>&1
# --- Automatically switch to root when the vagrant user logs in ---
echo "[TASK 16] ETC"
echo "sudo su -" >> /home/vagrant/.bashrc
echo ">>>> Initial Config End <<<<"
##################################
# init_cfg.sh
# Provisions each k8s-node1~5 VM: timezone, firewall/SELinux, swap removal, kernel/modules, hosts, SSH, packages, etc.
# Argument $1 = number of nodes; used in the hosts loop, etc.
##################################
#!/usr/bin/env bash
echo ">>>> Initial Config Start <<<<"
# --- Timezone Asia/Seoul ---
echo "[TASK 1] Change Timezone and Enable NTP"
timedatectl set-local-rtc 0
timedatectl set-timezone Asia/Seoul
# --- Disable firewalld/SELinux (for the lab environment) ---
echo "[TASK 2] Disable firewalld and selinux"
systemctl disable --now firewalld >/dev/null 2>&1
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
# --- Disable swap, remove it from fstab, and delete the swap partition (recommended for K8s) ---
echo "[TASK 3] Disable and turn off SWAP & Delete swap partitions"
swapoff -a
sed -i '/swap/d' /etc/fstab
sfdisk --delete /dev/sda 2 >/dev/null 2>&1
partprobe /dev/sda >/dev/null 2>&1
# --- Load overlay and br_netfilter + pass bridged traffic through iptables, enable IP forwarding (for K8s/CNI) ---
echo "[TASK 4] Config kernel & module"
cat << EOF > /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay >/dev/null 2>&1
modprobe br_netfilter >/dev/null 2>&1
cat << EOF >/etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system >/dev/null 2>&1
# --- Register admin-lb and k8s-node1~N in /etc/hosts ---
echo "[TASK 5] Setting Local DNS Using Hosts file"
sed -i '/^127\.0\.\(1\|2\)\.1/d' /etc/hosts
echo "192.168.10.10 k8s-api-srv.admin-lb.com admin-lb" >> /etc/hosts
for (( i=1; i<=$1; i++ )); do echo "192.168.10.1$i k8s-node$i" >> /etc/hosts; done
# --- Keep enp0s9 from being used as the default gateway ---
echo "[TASK 6] Delete default routing - enp0s9 NIC" # requires setenforce 0 beforehand
nmcli connection modify enp0s9 ipv4.never-default yes
nmcli connection up enp0s9 >/dev/null 2>&1
# --- Root password and SSH root/password authentication (for Ansible access from admin-lb) ---
echo "[TASK 7] Setting SSHD"
echo "root:qwe123" | chpasswd
cat << EOF >> /etc/ssh/sshd_config
PermitRootLogin yes
PasswordAuthentication yes
EOF
systemctl restart sshd >/dev/null 2>&1
# --- Install git and nfs-utils (for NFS mounts) ---
echo "[TASK 8] Install packages"
dnf install -y git nfs-utils >/dev/null 2>&1
# --- Automatically switch to root when vagrant logs in ---
echo "[TASK 9] ETC"
echo "sudo su -" >> /home/vagrant/.bashrc
echo ">>>> Initial Config End <<<<"
##################################
# Deploying the lab environment (run on the host PC)
##################################
# Create a working directory for the lab and move into it
mkdir k8s-ha-kubespary
cd k8s-ha-kubespary
# Download the Vagrantfile, admin-lb.sh, and init_cfg.sh (from the gasida/vagrant-lab repository)
curl -O https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/k8s-ha-kubespary/Vagrantfile
curl -O https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/k8s-ha-kubespary/admin-lb.sh
curl -O https://raw.githubusercontent.com/gasida/vagrant-lab/refs/heads/main/k8s-ha-kubespary/init_cfg.sh
# Create and provision the 6 VMs (admin-lb + k8s-node1~5)
vagrant up
# Check the state of the created VMs
vagrant status
# Current machine states:
# k8s-node1 running (virtualbox)
# k8s-node2 running (virtualbox)
# k8s-node3 running (virtualbox)
# k8s-node4 running (virtualbox)
# k8s-node5 running (virtualbox)
# admin-lb running (virtualbox)
Checking basic admin-lb information
vagrant ssh admin-lb
# Verify connectivity to the managed nodes
cat /etc/hosts
# # Loopback entries; do not change.
# # For historical reasons, localhost precedes localhost.localdomain:
# 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# # See hosts(5) for proper format and other examples:
# # 192.168.1.10 foo.example.org foo
# # 192.168.1.13 bar.example.org bar
# 192.168.10.10 k8s-api-srv.admin-lb.com admin-lb
# 192.168.10.11 k8s-node1
# 192.168.10.12 k8s-node2
# 192.168.10.13 k8s-node3
# 192.168.10.14 k8s-node4
# 192.168.10.15 k8s-node5
for i in {0..5}; do echo ">> k8s-node$i <<"; ssh 192.168.10.1$i hostname; echo; done
# >> k8s-node0 <<
# admin-lb
# >> k8s-node1 <<
# k8s-node1
# >> k8s-node2 <<
# k8s-node2
# >> k8s-node3 <<
# k8s-node3
# >> k8s-node4 <<
# k8s-node4
# >> k8s-node5 <<
# k8s-node5
for i in {1..5}; do echo ">> k8s-node$i <<"; ssh k8s-node$i hostname; echo; done
# >> k8s-node1 <<
# k8s-node1
# >> k8s-node2 <<
# k8s-node2
# >> k8s-node3 <<
# k8s-node3
# >> k8s-node4 <<
# k8s-node4
# >> k8s-node5 <<
# k8s-node5
# Check the Python version
python -V && pip -V
# Python 3.12.9
# pip 23.3.2 from /usr/lib/python3.12/site-packages/pip (python 3.12)
# Check the Kubespray working directory and files
tree /root/kubespray/ -L 2
# /root/kubespray/
# ├── ansible.cfg
# ├── CHANGELOG.md
# ├── cluster.yml
# ├── CNAME
# ├── code-of-condu
# ...
cd /root/kubespray/
cat ansible.cfg
# [ssh_connection] SSH connection settings
# pipelining=True : use pipelining to reduce the number of SSH calls and speed up execution
# ssh_args : reuse connections via ControlMaster, keep them for 30 minutes, skip known_hosts checking
# [defaults] default behavior
# force_valid_group_names : allow - or . in inventory group names (ignore)
# host_key_checking=False : disable SSH host key checking (for provisioning environments)
# gathering = smart : skip fact gathering when the fact cache is still valid
# fact_caching = jsonfile : cache facts as JSON files (/tmp, 86400s) → faster re-runs
# timeout = 300 : task timeout of 300 seconds
# stdout_callback : output format (default)
# display_skipped_hosts=no : do not print skipped hosts
# library = ./library : path to Kubespray's own modules
# callbacks_enabled : profile_tasks shows per-task elapsed time
# roles_path : role search path (kubespray, venv, then system)
# deprecation_warnings : disable deprecation warnings
# inventory_ignore_extensions : exclude files with these extensions from the inventory
# [inventory]
# ignore_patterns : ignore artifacts and credentials directories/files in the inventory
cat /root/kubespray/inventory/mycluster/inventory.ini
# [kube_control_plane]
# k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
# k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
# k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
# [etcd:children]
# kube_control_plane
# [kube_node]
# k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
# #k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
# Check the NFS server
systemctl status nfs-server --no-pager
# ● nfs-server.service - NFS server and services
# Loaded: loaded (/usr/lib/systemd/system/nfs-server.service; enabled; preset: disabled)
# Drop-In: /run/systemd/generator/nfs-server.service.d
# └─order-with-mounts.conf
# Active: active (exited) since Sat 2026-02-07 15:21:32 KST; 51min ago
# Invocation: eb69cbd374fa4ab5a056183c453ba8d5
# Docs: man:rpc.nfsd(8)
# man:exportfs(8)
# Process: 5969 ExecReload=/usr/sbin/exportfs -r (code=exited, status=0/SUCCESS)
# Main PID: 5183 (code=exited, status=0/SUCCESS)
# Mem peak: 1.1M
# CPU: 2ms
# Feb 07 15:21:32 admin-lb systemd[1]: Starting nfs-server.service - NFS server and services...
# Feb 07 15:21:32 admin-lb systemd[1]: Finished nfs-server.service - NFS server and services.
# Feb 07 15:22:17 admin-lb systemd[1]: Reloading nfs-server.service - NFS server and services...
# Feb 07 15:22:17 admin-lb systemd[1]: Reloaded nfs-server.service - NFS server and services.
tree /srv/nfs/share/
# /srv/nfs/share/
exportfs -rav
# exporting *:/srv/nfs/share
cat /etc/exports
# /srv/nfs/share *(rw,async,no_root_squash,no_subtree_check)
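As an optional check that is not part of the original lab flow, you can mount the export from one of the nodes (init_cfg.sh already installs nfs-utils there) and unmount it again:
ssh k8s-node1 'mkdir -p /mnt/nfs-test && mount -t nfs 192.168.10.10:/srv/nfs/share /mnt/nfs-test && df -h /mnt/nfs-test && umount /mnt/nfs-test'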
# Check the config that distributes TCP 6443 traffic arriving at the admin-lb IP across the backends k8s-node1~3
cat /etc/haproxy/haproxy.cfg
...
# ---------------------------------------------------------------------
# Kubernetes API Server Load Balancer Configuration
# ---------------------------------------------------------------------
# frontend k8s-api
# bind *:6443
# mode tcp
# option tcplog
# default_backend k8s-api-backend
# backend k8s-api-backend
# mode tcp
# option tcp-check
# option log-health-checks
# timeout client 3h
# timeout server 3h
# balance roundrobin # round-robin balancing
# server k8s-node1 192.168.10.11:6443 check check-ssl verify none inter 10000 # health checks; on failure the LB stops sending traffic to that backend
# server k8s-node2 192.168.10.12:6443 check check-ssl verify none inter 10000
# server k8s-node3 192.168.10.13:6443 check check-ssl verify none inter 10000
# Check HAProxy status
systemctl status haproxy.service --no-pager
# ● haproxy.service - HAProxy Load Balancer
# Loaded: loaded (/usr/lib/systemd/system/haproxy.service; enabled; preset: disabled)
# Active: active (running) since Sat 2026-02-07 15:21:29 KST; 52min ago
# Invocation: a9f15b70f31645febdea9a70e779e472
# Main PID: 4903 (haproxy)
# Status: "Ready."
# Tasks: 3 (limit: 5915)
# Memory: 8.9M (peak: 10.9M)
# CPU: 791ms
# CGroup: /system.slice/haproxy.service
# ├─4903 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -f /etc/haproxy/conf.d -p /run/haproxy.pid
# └─4905 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -f /etc/haproxy/conf.d -p /run/haproxy.pid
# Feb 07 15:21:29 admin-lb haproxy[4903]: [NOTICE] (4903) : New worker (4905) forked
# Feb 07 15:21:29 admin-lb haproxy[4903]: [NOTICE] (4903) : Loading success.
# Feb 07 15:21:29 admin-lb systemd[1]: Started haproxy.service - HAProxy Load Balancer.
# Feb 07 15:21:29 admin-lb haproxy[4905]: [WARNING] (4905) : Health check for server k8s-api-backend/k8s-node1 failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
# Feb 07 15:21:29 admin-lb haproxy[4905]: [WARNING] (4905) : Server k8s-api-backend/k8s-node1 is DOWN. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Feb 07 15:21:33 admin-lb haproxy[4905]: [WARNING] (4905) : Health check for server k8s-api-backend/k8s-node2 failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
# Feb 07 15:21:33 admin-lb haproxy[4905]: [WARNING] (4905) : Server k8s-api-backend/k8s-node2 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Feb 07 15:21:36 admin-lb haproxy[4905]: [WARNING] (4905) : Health check for server k8s-api-backend/k8s-node3 failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
# Feb 07 15:21:36 admin-lb haproxy[4905]: [WARNING] (4905) : Server k8s-api-backend/k8s-node3 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Feb 07 15:21:36 admin-lb haproxy[4905]: [ALERT] (4905) : backend 'k8s-api-backend' has no server available!
journalctl -u haproxy.service --no-pager
# Feb 07 15:21:29 admin-lb systemd[1]: Starting haproxy.service - HAProxy Load Balancer...
# Feb 07 15:21:29 admin-lb haproxy[4903]: [NOTICE] (4903) : haproxy version is 3.0.5-8e879a5
# Feb 07 15:21:29 admin-lb haproxy[4903]: [NOTICE] (4903) : path to executable is /usr/sbin/haproxy
# Feb 07 15:21:29 admin-lb haproxy[4903]: [ALERT] (4903) : config : parsing [/etc/haproxy/haproxy.cfg:8] : 'pidfile' already specified. Continuing.
# Feb 07 15:21:29 admin-lb haproxy[4903]: [WARNING] (4903) : config : parsing [/etc/haproxy/haproxy.cfg:29]: 'option tcplog' overrides previous 'option httplog' in 'defaults' section.
# Feb 07 15:21:29 admin-lb haproxy[4903]: [WARNING] (4903) : config : parsing [/etc/haproxy/haproxy.cfg:57] : 'timeout client' will be ignored because backend 'k8s-api-backend' has no frontend capability
# Feb 07 15:21:29 admin-lb haproxy[4903]: [WARNING] (4903) : config : log format ignored for frontend 'prometheus' since it has no log address.
# Feb 07 15:21:29 admin-lb haproxy[4903]: [NOTICE] (4903) : New worker (4905) forked
# Feb 07 15:21:29 admin-lb haproxy[4903]: [NOTICE] (4903) : Loading success.
# Feb 07 15:21:29 admin-lb systemd[1]: Started haproxy.service - HAProxy Load Balancer.
# Feb 07 15:21:29 admin-lb haproxy[4905]: [WARNING] (4905) : Health check for server k8s-api-backend/k8s-node1 failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
# Feb 07 15:21:29 admin-lb haproxy[4905]: [WARNING] (4905) : Server k8s-api-backend/k8s-node1 is DOWN. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Feb 07 15:21:33 admin-lb haproxy[4905]: [WARNING] (4905) : Health check for server k8s-api-backend/k8s-node2 failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
# Feb 07 15:21:33 admin-lb haproxy[4905]: [WARNING] (4905) : Server k8s-api-backend/k8s-node2 is DOWN. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Feb 07 15:21:36 admin-lb haproxy[4905]: [WARNING] (4905) : Health check for server k8s-api-backend/k8s-node3 failed, reason: Layer4 connection problem, info: "Connection refused at initial connection step of tcp-check", check duration: 0ms, status: 0/2 DOWN.
# Feb 07 15:21:36 admin-lb haproxy[4905]: [WARNING] (4905) : Server k8s-api-backend/k8s-node3 is DOWN. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
# Feb 07 15:21:36 admin-lb haproxy[4905]: [ALERT] (4905) : backend 'k8s-api-backend' has no server available!
ss -tnlp | grep haproxy
# LISTEN 0 3000 0.0.0.0:6443 0.0.0.0:* users:(("haproxy",pid=4915,fd=7)) # k8s api loadbalancer
# LISTEN 0 3000 0.0.0.0:9000 0.0.0.0:* users:(("haproxy",pid=4915,fd=8)) # haproxy stats dashboard
# LISTEN 0 3000 0.0.0.0:8405 0.0.0.0:* users:(("haproxy",pid=4915,fd=9)) # metrics exporter
# Open the stats page
open http://192.168.10.10:9000/haproxy_stats
# (Reference) Access the Prometheus metrics endpoint
curl http://192.168.10.10:8405/metrics
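If you prefer the terminal over the browser, the same stats can be read as CSV (appending ;csv to the stats URI is standard HAProxy behavior; the column positions below assume a recent HAProxy version):
curl -s 'http://192.168.10.10:9000/haproxy_stats;csv' | cut -d',' -f1,2,18 | grep k8s-api
# fields: pxname,svname,status — each k8s-node backend should currently show DOWN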

Confirm that all backend control plane servers are in the DOWN state (the kube-apiservers are not running yet).
Deploying K8s with Kubespray (Worker Client-Side Load Balancing)
# Check the working inventory directory
cd /root/kubespray/
git describe --tags
# v2.29.1
git --no-pager tag
# ...
# v2.29.1
# v2.3.0
# v2.30.0
# ...
tree inventory/mycluster/
# inventory/mycluster/
# ├── group_vars
# │ ├── all
# │ │ ├── all.yml
# │ │ ├── aws.yml
# │ │ ├── azure.yml
# │ │ ├── containerd.yml
# │ │ ├── coreos.yml
# │ │ ├── cri-o.yml
# │ │ ├── docker.yml
# │ │ ├── etcd.yml
# │ │ ├── gcp.yml
# │ │ ├── hcloud.yml
# │ │ ├── huaweicloud.yml
# │ │ ├── oci.yml
# │ │ ├── offline.yml
# │ │ ├── openstack.yml
# │ │ ├── upcloud.yml
# │ │ └── vsphere.yml
# │ └── k8s_cluster
# │ ├── addons.yml
# │ ├── k8s-cluster.yml
# │ ├── k8s-net-calico.yml
# │ ├── k8s-net-cilium.yml
# │ ├── k8s-net-custom-cni.yml
# │ ├── k8s-net-flannel.yml
# │ ├── k8s-net-kube-ovn.yml
# │ ├── k8s-net-kube-router.yml
# │ ├── k8s-net-macvlan.yml
# │ └── kube_control_plane.yml
# └── inventory.ini
# ...
# Check inventory.ini (values from the inventory file are used as cluster variables)
cat /root/kubespray/inventory/mycluster/inventory.ini
# [kube_control_plane]
# k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
# k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
# k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
# [etcd:children]
# kube_control_plane
# [kube_node]
# k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
# #k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
# See the block below for finding where the values applied in hostvars are declared
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --list
# "hostvars": {
# "k8s-node1": {
# "allow_unsupported_distribution_setup": false,
# "ansible_host": "192.168.10.11", # 해당 값은 바로 위 인벤토리에 host에 직접 선언
# "bin_dir": "/usr/local/bin",
# ...
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
# @all:
# |--@ungrouped:
# |--@etcd:
# | |--@kube_control_plane:
# | | |--k8s-node1
# | | |--k8s-node2
# | | |--k8s-node3
# |--@kube_node:
# | |--k8s-node4
# k8s_cluster.yml # for every node in the cluster (not etcd when it's separate)
sed -i 's|kube_owner: kube|kube_owner: root|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|kube_network_plugin: calico|kube_network_plugin: flannel|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|kube_proxy_mode: ipvs|kube_proxy_mode: iptables|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
sed -i 's|enable_nodelocaldns: true|enable_nodelocaldns: false|g' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
grep -iE 'kube_owner|kube_network_plugin:|kube_proxy_mode|enable_nodelocaldns:' inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# kube_owner: root
# kube_network_plugin: flannel
# kube_proxy_mode: iptables
# enable_nodelocaldns: false
## Do not install the coredns autoscaler
echo "enable_dns_autoscaler: false" >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# Adjust the flannel settings
echo "flannel_interface: enp0s9" >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# flannel_interface: enp0s9
# addons
sed -i 's|metrics_server_enabled: false|metrics_server_enabled: true|g' inventory/mycluster/group_vars/k8s_cluster/addons.yml
grep -iE 'metrics_server_enabled:' inventory/mycluster/group_vars/k8s_cluster/addons.yml
# metrics_server_enabled: true
## cat roles/kubernetes-apps/metrics_server/defaults/main.yml # see the metrics-server default variables
## cat roles/kubernetes-apps/metrics_server/templates/metrics-server-deployment.yaml.j2 # see the jinja2 template file
# Customize to reduce the resource requests for the lab
echo "metrics_server_requests_cpu: 25m" >> inventory/mycluster/group_vars/k8s_cluster/addons.yml
echo "metrics_server_requests_memory: 16Mi" >> inventory/mycluster/group_vars/k8s_cluster/addons.yml
# Check the supported version information
cat roles/kubespray_defaults/vars/main/checksums.yml | grep -i kube -A40
# Deploy: always run ansible-playbook from the ~/kubespray directory as shown below! Takes roughly 8 minutes.
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --list-tasks # review the task list before deploying
# play #1 (all): Check Ansible version TAGS: [always]
# tasks:
# Check {{ minimal_ansible_version }} <= Ansible version < {{ maximal_ansible_version }} TAGS: [always, check]
# Check that python netaddr is installed TAGS: [always, check]
# Check that jinja is not too old (install via pip) TAGS: [always, check]
# play #2 (all): Inventory setup and validation TAGS: [always]
# tasks:
# dynamic_groups : Match needed groups by their old names or definition TAGS: [always]
# validate_inventory : Stop if removed tags are used TAGS: [always]
# validate_inventory : Stop if kube_control_plane group is empty TAGS: [always]
# validate_inventory : Stop if etcd group is empty in external etcd mode TAGS: [always]
# validate_inventory : Warn if `kube_network_plugin` is `none TAGS: [always]
# validate_inventory : Stop if unsupported version of Kubernetes TAGS: [always]
#...
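Before the actual run, a couple of optional sanity checks can save time (standard Ansible commands, not part of the original lab flow):
ansible-playbook -i inventory/mycluster/inventory.ini cluster.yml --syntax-check   # parse the playbook without running it
ansible -i inventory/mycluster/inventory.ini all -m ping                           # confirm SSH/Python connectivity to every host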
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml -e kube_version="1.32.9" | tee kubespray_install.log
# --- Tearing down and reinstalling ---
# 1) Remove the entire cluster (etcd, K8s config, and services are removed; unrecoverable). Type "yes" at the confirmation prompt.
# ansible-playbook -i inventory/mycluster/inventory.ini -v reset.yml
# 2) To run without the confirmation prompt
# ansible-playbook -i inventory/mycluster/inventory.ini -v reset.yml -e skip_confirmation=true
# 3) After the reset finishes, reinstall with the cluster.yml command above.
# See the "cluster reset" section below for details.
# Verify the installation
more kubespray_install.log
# ...
# TASK [Check 2.17.3 <= Ansible version < 2.18.0] ********************************
# ok: [k8s-node4] => {
# "changed": false,
# "msg": "All assertions passed"
# }
# Saturday 07 February 2026 16:21:07 +0900 (0:00:00.016) 0:00:00.028 *****
# TASK [Check that python netaddr is installed] **********************************
# ok: [k8s-node4] => {
# "changed": false,
# "msg": "All assertions passed"
# }
# Saturday 07 February 2026 16:21:07 +0900 (0:00:00.049) 0:00:00.077 *****
# TASK [Check that jinja is not too old (install via pip)] ***********************
# ok: [k8s-node4] => {
# "changed": false,
# "msg": "All assertions passed"
# }
# PLAY [Inventory setup and validation] ******************************************
# Saturday 07 February 2026 16:21:07 +0900 (0:00:00.016) 0:00:00.094 *****
# Check the gathered facts
tree /tmp
├── k8s-node1
├── k8s-node2
├── k8s-node3
...
# Check the contents of local_release_dir: "/tmp/releases"
# node1 is a control plane node and node4 is a worker, so the downloaded files differ.
ssh k8s-node1 tree /tmp/releases
# /tmp/releases
# ├── cni-plugins-linux-arm64-1.8.0.tgz
# ├── containerd-2.1.5-linux-arm64.tar.gz
# ├── containerd-rootless-setuptool.sh
# ├── containerd-rootless.sh
# ├── crictl
# ├── crictl-1.32.0-linux-arm64.tar.gz
# ├── etcd-3.5.25-linux-arm64.tar.gz
# ├── etcd-v3.5.25-linux-arm64
# │ ├── Documentation
# │ │ ├── dev-guide
# │ │ │ └── apispec
# │ │ │ └── swagger
# │ │ │ ├── rpc.swagger.json
# │ │ │ ├── v3election.swagger.json
# │ │ │ └── v3lock.swagger.json
# │ │ └── README.md
# │ ├── etcd
# │ ├── etcdctl
# │ ├── etcdutl
# │ ├── README-etcdctl.md
# │ ├── README-etcdutl.md
# │ ├── README.md
# │ └── READMEv2-etcdctl.md
# ├── images
# ├── kubeadm-1.32.9-arm64
# ├── kubectl-1.32.9-arm64
# ├── kubelet-1.32.9-arm64
# ├── nerdctl
# ├── nerdctl-2.1.6-linux-arm64.tar.gz
# └── runc-1.3.4.arm64
ssh k8s-node4 tree /tmp/releases
# /tmp/releases
# ├── cni-plugins-linux-arm64-1.8.0.tgz
# ├── containerd-2.1.5-linux-arm64.tar.gz
# ├── containerd-rootless-setuptool.sh
# ├── containerd-rootless.sh
# ├── crictl
# ├── crictl-1.32.0-linux-arm64.tar.gz
# ├── images
# ├── kubeadm-1.32.9-arm64
# ├── kubelet-1.32.9-arm64
# ├── nerdctl
# ├── nerdctl-2.1.6-linux-arm64.tar.gz
# └── runc-1.3.4.arm64
# Check the applied sysctl values
ssh k8s-node1 grep "^[^#]" /etc/sysctl.conf
# net.ipv4.ip_forward=1
# kernel.keys.root_maxbytes=25000000
# kernel.keys.root_maxkeys=1000000
# kernel.panic=10
# kernel.panic_on_oops=1
# vm.overcommit_memory=1
# vm.panic_on_oom=0
# net.ipv4.ip_local_reserved_ports=30000-32767
# net.bridge.bridge-nf-call-iptables=1
# net.bridge.bridge-nf-call-arptables=1
# net.bridge.bridge-nf-call-ip6tables=1
ssh k8s-node4 grep "^[^#]" /etc/sysctl.conf
# net.ipv4.ip_forward=1
# kernel.keys.root_maxbytes=25000000
# kernel.keys.root_maxkeys=1000000
# kernel.panic=10
# kernel.panic_on_oops=1
# vm.overcommit_memory=1
# vm.panic_on_oom=0
# net.ipv4.ip_local_reserved_ports=30000-32767
# net.bridge.bridge-nf-call-iptables=1
# net.bridge.bridge-nf-call-arptables=1
# net.bridge.bridge-nf-call-ip6tables=1
# Check the etcd backups
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done
# >> k8s-node1 <<
# /var/backups
# └── etcd-2026-02-07_17:41:16
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# >> k8s-node2 <<
# /var/backups
# └── etcd-2026-02-07_17:41:17
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# >> k8s-node3 <<
# /var/backups
# └── etcd-2026-02-07_17:41:16
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# Check K8s API calls: the API is reachable on every control plane node by both IP and domain name
cat /etc/hosts
# # Loopback entries; do not change.
# # For historical reasons, localhost precedes localhost.localdomain:
# 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
# ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# # See hosts(5) for proper format and other examples:
# # 192.168.1.10 foo.example.org foo
# # 192.168.1.13 bar.example.org bar
# 192.168.10.10 k8s-api-srv.admin-lb.com admin-lb
# 192.168.10.11 k8s-node1
# 192.168.10.12 k8s-node2
# 192.168.10.13 k8s-node3
# 192.168.10.14 k8s-node4
# 192.168.10.15 k8s-node5
for i in {1..3}; do echo ">> k8s-node$i <<"; curl -sk https://192.168.10.1$i:6443/version | grep Version; echo; done
# >> k8s-node1 <<
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
# >> k8s-node2 <<
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
# >> k8s-node3 <<
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
# Confirm that access via domain names also works
for i in {1..3}; do echo ">> k8s-node$i <<"; curl -sk https://k8s-node$i:6443/version | grep Version; echo; done
# >> k8s-node1 <<
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
# >> k8s-node2 <<
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
# >> k8s-node3 <<
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
# Check the k8s admin credentials: control plane nodes run the apiserver pod locally, so their kubeconfig endpoint is set to 127.0.0.1:6443
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i kubectl cluster-info -v=6; echo; done
# >> k8s-node1 <<
# I0207 17:45:56.090077 47725 loader.go:402] Config loaded from file: /root/.kube/config
# I0207 17:45:56.090432 47725 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0207 17:45:56.090447 47725 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0207 17:45:56.090450 47725 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0207 17:45:56.090452 47725 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0207 17:45:56.100913 47725 round_trippers.go:560] GET https://127.0.0.1:6443/api/v1/namespaces/kube-system/services?labelSelector=kubernetes.io%2Fcluster-service%3Dtrue 200 OK in 7 milliseconds
# Kubernetes control plane is running at https://127.0.0.1:6443
# To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# >> k8s-node2 <<
# I0207 17:45:56.379091 47042 loader.go:402] Config loaded from file: /root/.kube/config
# I0207 17:45:56.379387 47042 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0207 17:45:56.379399 47042 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0207 17:45:56.379402 47042 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0207 17:45:56.379404 47042 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0207 17:45:56.383794 47042 round_trippers.go:560] GET https://127.0.0.1:6443/api?timeout=32s 200 OK in 4 milliseconds
# I0207 17:45:56.385197 47042 round_trippers.go:560] GET https://127.0.0.1:6443/apis?timeout=32s 200 OK in 0 milliseconds
# I0207 17:45:56.391968 47042 round_trippers.go:560] GET https://127.0.0.1:6443/api/v1/namespaces/kube-system/services?labelSelector=kubernetes.io%2Fcluster-service%3Dtrue 200 OK in 2 milliseconds
# Kubernetes control plane is running at https://127.0.0.1:6443
# To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# >> k8s-node3 <<
# I0207 17:45:56.620644 47130 loader.go:402] Config loaded from file: /root/.kube/config
# I0207 17:45:56.621010 47130 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0207 17:45:56.621025 47130 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0207 17:45:56.621029 47130 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0207 17:45:56.621032 47130 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0207 17:45:56.629766 47130 round_trippers.go:560] GET https://127.0.0.1:6443/api/v1/namespaces/kube-system/services?labelSelector=kubernetes.io%2Fcluster-service%3Dtrue 200 OK in 6 milliseconds
# Kubernetes control plane is running at https://127.0.0.1:6443
# To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
mkdir /root/.kube
scp k8s-node1:/root/.kube/config /root/.kube/
cat /root/.kube/config | grep server
# server: https://127.0.0.1:6443
# Change the API server address from localhost to control plane node 1's IP: if node 1 fails, you must manually switch to another node's IP.
kubectl get node -owide -v=6
# I0207 17:46:37.754607 27001 loader.go:402] Config loaded from file: /root/.kube/config
# I0207 17:46:37.755018 27001 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0207 17:46:37.755041 27001 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0207 17:46:37.755044 27001 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0207 17:46:37.755047 27001 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0207 17:46:37.760595 27001 round_trippers.go:560] GET https://127.0.0.1:6443/api?timeout=32s 200 OK in 5 milliseconds
# I0207 17:46:37.762424 27001 round_trippers.go:560] GET https://127.0.0.1:6443/apis?timeout=32s 200 OK in 0 milliseconds
# I0207 17:46:37.772332 27001 round_trippers.go:560] GET https://127.0.0.1:6443/api/v1/nodes?limit=500 200 OK in 4 milliseconds
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 4m58s v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 4m48s v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 4m46s v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 4m20s v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# For the lab, point kubectl directly at a control plane node so it does not go through HAProxy
sed -i 's/127.0.0.1/192.168.10.11/g' /root/.kube/config
# or point at a different control plane node instead (run only one of these):
# sed -i 's/127.0.0.1/192.168.10.12/g' /root/.kube/config
# sed -i 's/127.0.0.1/192.168.10.13/g' /root/.kube/config
kubectl get node -owide -v=6
# I0207 17:48:56.200832 27011 loader.go:402] Config loaded from file: /root/.kube/config
# I0207 17:48:56.201133 27011 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0207 17:48:56.201151 27011 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0207 17:48:56.201155 27011 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0207 17:48:56.201158 27011 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0207 17:48:56.206404 27011 round_trippers.go:560] GET https://192.168.10.11:6443/api?timeout=32s 200 OK in 5 milliseconds
# I0207 17:48:56.207771 27011 round_trippers.go:560] GET https://192.168.10.11:6443/apis?timeout=32s 200 OK in 0 milliseconds
# I0207 17:48:56.215430 27011 round_trippers.go:560] GET https://192.168.10.11:6443/api/v1/nodes?limit=500 200 OK in 3 milliseconds
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 7m17s v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 7m7s v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 7m5s v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 6m39s v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
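For reference, kubectl could also go through the HAProxy VIP (192.168.10.10) instead of a single node, but only if that address is included in the apiserver certificate SANs, e.g. by setting Kubespray's supplementary_addresses_in_ssl_keys before deploying. A hedged sketch (check the cluster entry name with kubectl config get-clusters first; cluster.local is assumed here):
# kubectl config set-cluster cluster.local --server=https://192.168.10.10:6443
# kubectl get node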
# Compare the nodes in [kube_control_plane] and [kube_node]; the worker node has no taints.
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
# @all:
# |--@ungrouped:
# |--@etcd:
# | |--@kube_control_plane:
# | | |--k8s-node1
# | | |--k8s-node2
# | | |--k8s-node3
# |--@kube_node:
# | |--k8s-node4
kubectl describe node | grep -E 'Name:|Taints'
# Name: k8s-node1
# Taints: node-role.kubernetes.io/control-plane:NoSchedule
# Name: k8s-node2
# Taints: node-role.kubernetes.io/control-plane:NoSchedule
# Name: k8s-node3
# Taints: node-role.kubernetes.io/control-plane:NoSchedule
# Name: k8s-node4
# Taints: <none>
kubectl get pod -A
# NAMESPACE NAME READY STATUS RESTARTS AGE
# kube-system coredns-664b99d7c7-4kc5b 1/1 Running 0 6m55s
# kube-system coredns-664b99d7c7-k9bpw 1/1 Running 0 6m55s
# kube-system kube-apiserver-k8s-node1 1/1 Running 1 7m53s
# kube-system kube-apiserver-k8s-node2 1/1 Running 1 7m45s
# kube-system kube-apiserver-k8s-node3 1/1 Running 1 7m43s
# kube-system kube-controller-manager-k8s-node1 1/1 Running 2 7m53s
# kube-system kube-controller-manager-k8s-node2 1/1 Running 2 7m45s
# kube-system kube-controller-manager-k8s-node3 1/1 Running 2 7m43s
# kube-system kube-flannel-ds-arm64-47nz8 1/1 Running 0 7m8s
# kube-system kube-flannel-ds-arm64-8wlpt 1/1 Running 0 7m8s
# kube-system kube-flannel-ds-arm64-9p6nf 1/1 Running 0 7m8s
# kube-system kube-flannel-ds-arm64-tg8z9 1/1 Running 0 7m8s
# kube-system kube-proxy-9mbr5 1/1 Running 0 7m17s
# kube-system kube-proxy-cpb4m 1/1 Running 0 7m17s
# kube-system kube-proxy-k2qgd 1/1 Running 0 7m17s
# kube-system kube-proxy-nj9fm 1/1 Running 0 7m17s
# kube-system kube-scheduler-k8s-node1 1/1 Running 1 7m53s
# kube-system kube-scheduler-k8s-node2 1/1 Running 1 7m45s
# kube-system kube-scheduler-k8s-node3 1/1 Running 1 7m43s
# kube-system metrics-server-65fdf69dcb-k6r7h 1/1 Running 0 6m50s
# kube-system nginx-proxy-k8s-node4 1/1 Running 0 7m18s
...
# Check each node's pod CIDR
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# k8s-node1 10.233.64.0/24
# k8s-node2 10.233.65.0/24
# k8s-node3 10.233.66.0/24
# k8s-node4 10.233.67.0/24
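As a quick cross-check that is not in the original flow, flannel writes the subnet it was given to /run/flannel/subnet.env on each node; FLANNEL_SUBNET there should match the podCIDR shown above:
ssh k8s-node1 cat /run/flannel/subnet.env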
# Check etcd: verify the etcd member names
ssh k8s-node1 etcdctl.sh member list -w table
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | 8b0ca30665374b0 | started | etcd3 | https://192.168.10.13:2380 | https://192.168.10.13:2379 | false |
# | 2106626b12a4099f | started | etcd2 | https://192.168.10.12:2380 | https://192.168.10.12:2379 | false |
# | c6702130d82d740f | started | etcd1 | https://192.168.10.11:2380 | https://192.168.10.11:2379 | false |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
# >> k8s-node1 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 5.2 MB | true | false | 4 | 2562 | 2562 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node2 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 2106626b12a4099f | 3.5.25 | 5.2 MB | false | false | 4 | 2562 | 2562 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node3 <<
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 8b0ca30665374b0 | 3.5.25 | 5.1 MB | false | false | 4 | 2562 | 2562 | |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
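The same etcdctl.sh wrapper used above can also run a health check against each member (endpoint health is a standard etcdctl subcommand):
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint health; echo; done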
# Run k9s
k9s
# Configure shell completion and aliases
source <(kubectl completion bash)
alias k=kubectl
alias kc=kubecolor
complete -F __start_kubectl k
echo 'source <(kubectl completion bash)' >> /etc/profile
echo 'alias k=kubectl' >> /etc/profile
echo 'alias kc=kubecolor' >> /etc/profile
echo 'complete -F __start_kubectl k' >> /etc/profile

Kubespray variable precedence and how to search for variables
Ansible variable precedence: the higher the number, the higher the precedence (22 is the highest). Variables you pass when running Kubespray (extra vars) take the highest precedence.
| Priority | Source | Description |
|---|---|---|
| 1 (lowest) | Command line values | e.g. -u my_user (not variables) |
| 2 | Role defaults | role default values |
| 3 | Inventory group vars | group variables from the inventory file/script |
| 4 | Inventory group_vars/all | inventory-side group_vars/all |
| 5 | Playbook group_vars/all | playbook-side group_vars/all |
| 6 | Inventory group_vars/* | inventory per-group group_vars |
| 7 | Playbook group_vars/* | playbook per-group group_vars |
| 8 | Inventory host vars | host variables from the inventory file/script (you can add your own) |
| 9 | Inventory host_vars/* | inventory per-host host_vars |
| 10 | Playbook host_vars/* | playbook per-host host_vars |
| 11 | Host facts / set_facts | gathered facts, cached set_facts |
| 12 | Play vars | a play's vars, playbook variables |
| 13 | Play vars_prompt | a play's vars_prompt |
| 14 | Play vars_files | a play's vars_files |
| 15 | Role vars | a role's vars |
| 16 | Block vars | variables for tasks inside a block |
| 17 | Task vars | a task's vars, task variables |
| 18 | include_vars | variables loaded with include_vars |
| 19 | Registered / set_facts | register and set_fact results |
| 20 | Role params | role and include_role parameters |
| 21 | Include params | include parameters |
| 22 (highest) | Extra vars | -e "user=my_user" — always wins |
Kubespray is organized as Ansible roles, and most settings are defined in role defaults (priority 2). A variable with the same name placed at a higher-precedence location wins, so use the table above to figure out where to make a change when you want to override a default.
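For example, the deploy command used earlier in this post passes -e kube_version="1.32.9", which overrides whatever default the roles define. A quick way to see the role-side default (the path is an assumption based on the Kubespray layout shown elsewhere in this post):
grep -Rn '^kube_version:' roles/kubespray_defaults/ 2>/dev/null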
# Example) Search for where a specific variable is declared and used (the three directories you will usually search)
grep -Rn "allow_unsupported_distribution_setup" inventory/mycluster/ playbooks/ roles/ -A1 -B1
# inventory/mycluster/group_vars/all/all.yml-141-## If enabled it will allow kubespray to attempt setup even if the distribution is not supported. For unsupported distributions this can lead to unexpected failures in some cases.
# inventory/mycluster/group_vars/all/all.yml:142:allow_unsupported_distribution_setup: false
# --
# roles/kubernetes/preinstall/tasks/0040-verify-settings.yml-22- assert:
# roles/kubernetes/preinstall/tasks/0040-verify-settings.yml:23: that: (allow_unsupported_distribution_setup | default(false)) or ansible_distribution in supported_os_distributions
# roles/kubernetes/preinstall/tasks/0040-verify-settings.yml-24- msg: "{{ ansible_distribution }} is not a known OS"
# # Kubespray variable precedence structure (override flow)
# [ lowest precedence ]
# ┌─────────────────────────────────────────────┐
# │ roles/*/defaults/main.yml │ ← Kubespray role defaults
# │ (e.g. defaults for bin_dir, kube_version) │
# └─────────────────────────────────────────────┘
# ⬇ override
# ┌─────────────────────────────────────────────┐
# │ roles/*/vars/main.yml │ ← variables forced inside a role (rarely touched)
# └─────────────────────────────────────────────┘
# ⬇ override
# ┌─────────────────────────────────────────────┐
# │ inventory/mycluster/group_vars/all/*.yml │ ← settings common to all nodes # 99% of tuning happens here
# │ inventory/mycluster/group_vars/k8s_cluster/*.yml
# │ inventory/mycluster/group_vars/etcd.yml │
# └─────────────────────────────────────────────┘
# ⬇ override
# ┌─────────────────────────────────────────────┐
# │ inventory/mycluster/host_vars/<node>.yml │ ← applies to a single node # create one when a node needs different settings
# └─────────────────────────────────────────────┘
# ⬇ override
# ┌─────────────────────────────────────────────┐
# │ playbook vars (vars:, vars_files:) │ ← vars inside reset.yml / cluster.yml # when declared in the playbook you run
# └─────────────────────────────────────────────┘
# ⬇ override
# ┌─────────────────────────────────────────────┐
# │ --extra-vars (-e) │ ← values passed on the CLI (strongest)
# │ ex) -e kube_version=v1.29.3 │
# └─────────────────────────────────────────────┘
# [ highest precedence ]
# Settings common to all nodes (example): 99% of tuning happens here
inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
kube_version: v1.29.3
kube_network_plugin: cilium
# Applied to a specific node only (example): when one node needs different settings
inventory/mycluster/host_vars/k8s-ctr1.yml
node_labels:
node-role.kubernetes.io/control-plane: "true"
# Declared in the playbook being run (example): below, a "local variable override" that only applies while that playbook is imported
cat playbooks/scale.yml | grep 'Install etcd' -A5
- name: Install etcd
vars: # higher precedence than inventory/group_vars; only valid within the scope of this playbook import
etcd_cluster_setup: false # disable the logic that bootstraps a new etcd cluster
etcd_events_cluster_setup: false # do not set up a dedicated events etcd cluster
import_playbook: install_etcd.yml # include and run the separate install_etcd.yml playbook at this point
# Example) Searching in order to skip installing the dns autoscaler
grep -Rni "autoscaler" inventory/mycluster/ playbooks/ roles/ -A2 -B1
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml-372-remove_anonymous_access: false
# inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml:373:enable_dns_autoscaler: false
# --
# roles/kubernetes-apps/ansible/defaults/main.yml-31-coredns_pod_disruption_budget: false
# roles/kubernetes-apps/ansible/defaults/main.yml:32:# when enable_dns_autoscaler is false, coredns_replicas is used to set the number of replicas
# roles/kubernetes-apps/ansible/defaults/main.yml-33-coredns_replicas: 2
# roles/kubernetes-apps/ansible/defaults/main.yml-34-# value for coredns pdb
# ...
grep -Rni "autoscaler" inventory/mycluster/ playbooks/ roles/ --include="*.yml" -A2 -B1
# roles/kubespray_defaults/defaults/main/main.yml:130:# Enable dns autoscaler
# roles/kubespray_defaults/defaults/main/main.yml:131:enable_dns_autoscaler: true
...
Checking the control plane components
# Check kube-apiserver
ssh k8s-node1 cat /etc/kubernetes/manifests/kube-apiserver.yaml
# apiVersion: v1
# kind: Pod
# metadata:
# annotations:
# kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.10.11:6443
# creationTimestamp: null
# labels:
# ...
# spec:
# containers:
# - command:
# - kube-apiserver
# - --advertise-address=192.168.10.11
# ...
# - --apiserver-count=3
# - --authorization-mode=Node,RBAC
# - '--bind-address=::'
# ...
# - --etcd-servers=https://192.168.10.11:2379,https://192.168.10.12:2379,https://192.168.10.13:2379
# With three etcd servers, the cluster can keep using the other etcd members if one of them fails
# (Reference) Check the leases: all apiservers are active, while only one kcm / scheduler instance acts as leader
kubectl get lease -n kube-system
# NAME HOLDER AGE
# apiserver-3jsrenrspxlfjr2cvxzde6qwdi apiserver-3jsrenrspxlfjr2cvxzde6qwdi_1b0e12f2-7ea0-423d-95f6-5722b8e707ab 18m
# apiserver-syplgv2uz3ssgciixtnxs4xeza apiserver-syplgv2uz3ssgciixtnxs4xeza_e626e225-38e7-454d-900e-7fda24fdda93 18m
# apiserver-z2kpjb5k5ch6lznxmv3gnpujmy apiserver-z2kpjb5k5ch6lznxmv3gnpujmy_521695f2-61e8-468a-b172-3607a457c088 18m
# kube-controller-manager k8s-node1_24c6245c-3bae-4477-a90b-a10bc0bcfac6 18m
# kube-scheduler k8s-node3_ca72459c-b04d-4351-8c31-0fa7c1304add 18m
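To see only the current leader of each singleton component, you can read the holderIdentity field of the lease directly:
kubectl get lease -n kube-system kube-scheduler -o jsonpath='{.spec.holderIdentity}{"\n"}'
kubectl get lease -n kube-system kube-controller-manager -o jsonpath='{.spec.holderIdentity}{"\n"}'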
# Check kube-controller-manager: 'bind-address=::' supports both IPv6 and IPv4
ssh k8s-node1 cat /etc/kubernetes/manifests/kube-controller-manager.yaml
# - --allocate-node-cidrs=true
# - --cluster-cidr=10.233.64.0/18
# - --node-cidr-mask-size-ipv4=24
# - --service-cluster-ip-range=10.233.0.0/18
# - '--bind-address=::'
# - --leader-elect=true
# Check kube-scheduler
ssh k8s-node1 cat /etc/kubernetes/manifests/kube-scheduler.yaml
- '--bind-address=::'
- --leader-elect=true # leader election
# Check the certificates: only control plane node 1 has super-admin.conf
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i ls -l /etc/kubernetes/super-admin.conf ; echo; done
# >> k8s-node1 <<
# -rw-------. 1 root root 5693 Feb 7 17:41 /etc/kubernetes/super-admin.conf
# >> k8s-node2 <<
# ls: cannot access '/etc/kubernetes/super-admin.conf': No such file or directory
# >> k8s-node3 <<
# ls: cannot access '/etc/kubernetes/super-admin.conf': No such file or directory
## CoreDNS normally uses the 10th IP of the service CIDR, but Kubespray typically uses the 3rd IP.
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i kubeadm certs check-expiration ; echo; done
# >> k8s-node1 <<
# [check-expiration] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
# [check-expiration] Use 'kubeadm init phase upload-config --config your-config.yaml' to re-upload it.
# W0207 18:01:27.322273 51768 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [10.233.0.3]
# CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
# admin.conf Feb 07, 2027 08:41 UTC 364d ca no
# apiserver Feb 07, 2027 08:41 UTC 364d ca no
# apiserver-kubelet-client Feb 07, 2027 08:41 UTC 364d ca no
# controller-manager.conf Feb 07, 2027 08:41 UTC 364d ca no
# front-proxy-client Feb 07, 2027 08:41 UTC 364d front-proxy-ca no
# scheduler.conf Feb 07, 2027 08:41 UTC 364d ca no
# super-admin.conf Feb 07, 2027 08:41 UTC 364d ca no
# # only control plane node 1 has super-admin.conf
# CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
# ca Feb 05, 2036 08:41 UTC 9y no
# front-proxy-ca Feb 05, 2036 08:41 UTC 9y no
# >> k8s-node2 <<
# [check-expiration] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
# [check-expiration] Use 'kubeadm init phase upload-config --config your-config.yaml' to re-upload it.
# W0207 18:01:27.565898 50972 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [10.233.0.3]
# CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
# admin.conf Feb 07, 2027 08:41 UTC 364d ca no
# apiserver Feb 07, 2027 08:41 UTC 364d ca no
# apiserver-kubelet-client Feb 07, 2027 08:41 UTC 364d ca no
# controller-manager.conf Feb 07, 2027 08:41 UTC 364d ca no
# front-proxy-client Feb 07, 2027 08:41 UTC 364d front-proxy-ca no
# scheduler.conf Feb 07, 2027 08:41 UTC 364d ca no
# !MISSING! super-admin.conf # ctrl1번 노드만 super-admin.conf 확인
# CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
# ca Feb 05, 2036 08:41 UTC 9y no
# front-proxy-ca Feb 05, 2036 08:41 UTC 9y no
# >> k8s-node3 <<
# [check-expiration] Reading configuration from the "kubeadm-config" ConfigMap in namespace "kube-system"...
# [check-expiration] Use 'kubeadm init phase upload-config --config your-config.yaml' to re-upload it.
# W0207 18:01:27.825093 51063 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [10.233.0.3]
# CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
# admin.conf Feb 07, 2027 08:41 UTC 364d ca no
# apiserver Feb 07, 2027 08:41 UTC 364d ca no
# apiserver-kubelet-client Feb 07, 2027 08:41 UTC 364d ca no
# controller-manager.conf Feb 07, 2027 08:41 UTC 364d ca no
# front-proxy-client Feb 07, 2027 08:41 UTC 364d front-proxy-ca no
# scheduler.conf Feb 07, 2027 08:41 UTC 364d ca no
# !MISSING! super-admin.conf # ctrl1번 노드만 super-admin.conf 확인
# CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
# ca Feb 05, 2036 08:41 UTC 9y no
# front-proxy-ca Feb 05, 2036 08:41 UTC 9y no
# 파드의 coredns service(Cluster) IP 확인 : kubespray 에서는 Service CIDR 의 세 번째 IP를 nameserver 로 사용
kubectl exec -it -n kube-system nginx-proxy-k8s-node4 -- cat /etc/resolv.conf
# search kube-system.svc.cluster.local svc.cluster.local cluster.local default.svc.cluster.local
# nameserver 10.233.0.3
# options ndots:5
# kubespray는 coredns의 서비스명이 kube-dns가 아니라 coredns이며, clusterIP 가 10.233.0.3
kubectl get svc -n kube-system coredns
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 5h34m
kubectl get cm -n kube-system kubelet-config -o yaml | grep clusterDNS -A2
# clusterDNS:
# - 10.233.0.3
# clusterDomain: cluster.local
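# (참고) 임시 파드에서 coredns(10.233.0.3)가 실제로 질의에 응답하는지 확인해 보는 예시 (busybox 이미지/태그는 예시)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local 10.233.0.3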
# (참고) kubeadm 으로 직접 설정 시 : 기본 Service CIDR(10.96.0.0/12)의 10번째 IP(10.96.0.10) 사용
kubectl get svc,ep -n kube-system
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 21m
# service/metrics-server ClusterIP 10.233.40.100 <none> 443/TCP 21m
# NAME ENDPOINTS AGE
# endpoints/coredns 10.233.64.2:53,10.233.67.2:53,10.233.64.2:53 + 3 more... 21m
# endpoints/metrics-server 10.233.67.3:10250 21m
...
# kubespray task 에 의해 호스트에서도 서비스명 도메인 질의가 가능하도록 /etc/resolv.conf 최상단에 클러스터 DNS(nameserver)가 추가됨
# preinstall dns role 에서 host 의 dns 설정을 해주는 부분
ssh k8s-node1 cat /etc/resolv.conf
# Generated by NetworkManager
# search default.svc.cluster.local svc.cluster.local
# nameserver 10.233.0.3
# nameserver 168.126.63.1
# nameserver 8.8.8.8
# options ndots:2 timeout:2 attempts:2
# (비교) 현재 k8s join 되지 않는 노드는 기본 dns 설정 상태
ssh k8s-node5 cat /etc/resolv.conf
# Generated by NetworkManager
# nameserver 168.126.63.1
# nameserver 8.8.8.8
# kubeadm-config cm 정보 확인
kubectl get cm -n kube-system kubeadm-config -o yaml
...
apiVersion: kubeadm.k8s.io/v1beta4
caCertificateValidityPeriod: 87600h0m0s
certificateValidityPeriod: 8760h0m0s
certificatesDir: /etc/kubernetes/ssl
clusterName: cluster.local
controlPlaneEndpoint: 192.168.10.11:6443 # << 초기 Init 노드로 controlplane을 만들어서 이렇게 설정됨
...
etcd:
external: # kubeadm 이 파드로 배포한것이 아니며 systemd unit 구성.
caFile: /etc/ssl/etcd/ssl/ca.pem
certFile: /etc/ssl/etcd/ssl/node-k8s-node1.pem
endpoints:
- https://192.168.10.11:2379
- https://192.168.10.12:2379
- https://192.168.10.13:2379
keyFile: /etc/ssl/etcd/ssl/node-k8s-node1-key.pem
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: v1.32.9
networking:
dnsDomain: cluster.local
podSubnet: 10.233.64.0/18
serviceSubnet: 10.233.0.0/18
...
# kubeadm 과 동일하게 kubelet node 최초 join 시 CSR 사용 확인
kubectl get csr
# NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
# csr-5l9tk 24m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:p8e2jk <none> Approved,Issued
# csr-6gbtd 25m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:xia33r <none> Approved,Issued
# csr-995vb 25m kubernetes.io/kube-apiserver-client-kubelet system:node:k8s-node1 <none> Approved,Issued
# csr-j4mgw 25m kubernetes.io/kube-apiserver-client-kubelet system:bootstrap:ad567d <none> Approved,Issued
K8S API 엔드포인트
[Case1] HA 컨트롤 플레인 노드(3대) + (Worker Client-Side LoadBalancing)

클라이언트 측 로드밸런싱(Client-Side LB) 으로 API 서버에 접근하는 구성을 보여줍니다. 상단에는 워커 노드(Worker #1, #2)가 있고, 각 워커에는 kubelet·kube-proxy와 함께 Client-Side LB 가 있습니다. kubelet과 kube-proxy는 이 LB를 통해 API 요청을 보내며, LB는 그 요청을 하단의 컨트롤 플레인 3대(Control Plane #1~#3)에 있는 kube-apiserver에 나누어 보냅니다.
즉, L4 로드밸런서(admin-lb) 없이도 각 워커가 자신만의 LB를 두고 세 대의 API 서버에 직접 분산 연결하는 방식입니다. 컨트롤 플레인 한 대가 내려가도 나머지 두 대의 API 서버로 트래픽이 유지되어 고가용성(HA)이 확보되는 과정을 실습합니다.
워커노드 컴포넌트 → k8s api endpoint 분석

워커 노드(kube-node)와 컨트롤 플레인(kube-master)에서 kube-apiserver 로 연결되는 방식을 정리한 그림입니다. 그림 기준으로 컨트롤 플레인 쪽에서는 kubelet, kube-proxy, kube-scheduler, kube-controller-manager가 모두 http://localhost:8080 으로 같은 노드의 kube-apiserver에 직접 붙습니다(실제 이 실습 클러스터에서는 뒤에서 확인하듯 https://127.0.0.1:6443 을 사용합니다). 워커 노드 쪽에서는 kubelet과 kube-proxy가 API 서버에 직접 가지 않고, 노드 위의 nginx proxy(Client-Side LB)에 https://localhost:443 으로 연결합니다.
이 nginx proxy가 SSL pass-through로 요청을 컨트롤 플레인의 kube-apiserver로 넘깁니다. 즉, 워커의 컴포넌트는 로컬 nginx만 바라보고, nginx가 여러 API 서버로 분산·전달하는 구조입니다.
# worker(kubelet, kube-proxy) -> k8s api
# 워커노드에서 정보 확인 (nginx-proxy가 세팅 되어 있음)
ssh k8s-node4 crictl ps
# CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID POD NAMESPACE
# 0c3b5b5da337f 5a91d90f47ddf 27 minutes ago Running nginx-proxy 0 963ea50ddce00 nginx-proxy-k8s-node4 kube-system...
# ...
ssh k8s-node4 cat /etc/nginx/nginx.conf
# error_log stderr notice;
# worker_processes 2;
# worker_rlimit_nofile 130048;
# worker_shutdown_timeout 10s;
# ...
# stream {
# upstream kube_apiserver {
# least_conn; # least_conn으로 설정되어 있음 (least_conn: 커넥션이 적은 순으로)
# server 192.168.10.11:6443; # API 서버들
# server 192.168.10.12:6443;
# server 192.168.10.13:6443;
# }
# server {
# listen 127.0.0.1:6443;
# proxy_pass kube_apiserver;
# proxy_timeout 10m;
# proxy_connect_timeout 1s;
# }
# }
# http {
# ...
# server {
# listen 8081;
# location /healthz {
# access_log off;
# return 200;
...
ssh k8s-node4 curl -s localhost:8081/healthz -I
# HTTP/1.1 200 OK
# Server: nginx
# Date: Sat, 07 Feb 2026 09:11:08 GMT
# Content-Type: text/plain
# Content-Length: 0
# Connection: keep-alive
# 워커노드에서 -> Client-Side LB를 사용해서 k8s api 호출 시도
ssh k8s-node4 curl -sk https://127.0.0.1:6443/version | grep Version
# "gitVersion": "v1.32.9",
# "goVersion": "go1.23.12",
ssh k8s-node4 ss -tnlp | grep nginx
# LISTEN 0 511 0.0.0.0:8081 0.0.0.0:* users:(("nginx",pid=15043,fd=6),("nginx",pid=15042,fd=6),("nginx",pid=15016,fd=6))
# LISTEN 0 511 127.0.0.1:6443 0.0.0.0:* users:(("nginx",pid=15043,fd=5),("nginx",pid=15042,fd=5),("nginx",pid=15016,fd=5))
# kubelet(client) -> api-server 호출 시 엔드포인트 정보 확인 : https://localhost:6443
ssh k8s-node4 cat /etc/kubernetes/kubelet.conf
# apiVersion: v1
# clusters:
# - cluster:
# certificate-authority-data: ...
# server: https://localhost:6443 #자격증명 서버 목록에 nginx static pod가 설정되어 있음
# name: default-cluster
# contexts:
# - context:
# cluster: default-cluster
# namespace: default
# user: default-auth
# name: default-context
# current-context: default-context
# kind: Config
# preferences: {}
# users:
# - name: default-auth
ssh k8s-node4 cat /etc/kubernetes/kubelet.conf | grep server
# server: https://localhost:6443
# kube-proxy(client) -> api-server 호출 시 엔드포인트 정보 확인
kubectl get cm -n kube-system kube-proxy -o yaml
# kubectl get cm -n kube-system kube-proxy -o yaml | grep 'kubeconfig.conf:' -A18
# kubeconfig.conf: |-
# apiVersion: v1
# kind: Config
# clusters:
# - cluster:
# certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# server: https://127.0.0.1:6443 #자격증명 서버 목록에 nginx static pod가 설정되어 있음
# name: default
...
# 즉 kubelet, kube-proxy가 nginx를 바라보고 있음
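# (참고) nginx-proxy 가 실제로 3대 API 서버로 분산 연결 중인지 확인해 보는 예시 (목적지 6443 커넥션의 분포 확인)
ssh k8s-node4 ss -tnp | grep -E '192.168.10.1[1-3]:6443'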
# nginx.conf 생성 Task
tree roles/kubernetes/node/tasks/loadbalancer
# ├── haproxy.yml
# ├── kube-vip.yml
# └── nginx-proxy.yml
cat roles/kubernetes/node/tasks/loadbalancer/nginx-proxy.yml
# ---
# - name: Haproxy | Cleanup potentially deployed haproxy
# file:
# path: "{{ kube_manifest_dir }}/haproxy.yml"
# state: absent
# - name: Nginx-proxy | Make nginx directory
# file:
# path: "{{ nginx_config_dir }}"
# state: directory
# mode: "0700"
# owner: root
# - name: Nginx-proxy | Write nginx-proxy configuration
# template:
# src: "loadbalancer/nginx.conf.j2"
# dest: "{{ nginx_config_dir }}/nginx.conf"
# owner: root
# mode: "0755"
# backup: true
# - name: Nginx-proxy | Get checksum from config
# stat:
# path: "{{ nginx_config_dir }}/nginx.conf"
# get_attributes: false
# get_checksum: true
# get_mime: false
# register: nginx_stat
# - name: Nginx-proxy | Write static pod
# template:
# src: manifests/nginx-proxy.manifest.j2
# dest: "{{ kube_manifest_dir }}/nginx-proxy.yml"
# mode: "0640"
# 워커 노드에서 nginx를 static pod로 띄워, 요청을 분산하는 client-side load balancer 역할을 하도록 하는 ansible task 파일입니다.
# 주요 작업 단계는 다음과 같습니다:
#
# 1. 기존에 haproxy가 static pod 형태로 배포되어 있었다면 그 파일(haproxy.yml)을 삭제해 정리(cleanup)합니다.
# 2. nginx 설정 파일이 들어갈 디렉토리(nginx_config_dir)를 생성합니다. 디렉토리의 권한은 0700, 소유자는 root입니다.
# 3. jinja2 템플릿(loadbalancer/nginx.conf.j2)을 이용해 실제 nginx config 파일(nginx.conf)을 생성합니다. 파일 권한은 0755, 소유자는 root입니다. 변경 이력이 남도록 backup도 true로 설정합니다.
# 4. 위에서 생성한 nginx.conf 파일의 checksum을 계산해서 nginx_stat라는 변수에 저장합니다. 이 값은 pod manifest(nginx-proxy.manifest.j2)의 annotation 등에 활용됩니다(설정 변경 시 static pod가 재시작되도록 하기 위함).
# 5. 마지막으로 static pod manifest를 templatize해서 kubelet manifest 디렉토리 아래에 nginx-proxy.yml로 저장합니다. 이 파일대로 nginx static pod가 워커 노드에서 켜집니다.
#
# 즉, 이 전체 작업은 워커 노드에서 nginx를 static pod로 실행하고, 설정 변경 적용 시 pod가 정상적으로 갱신되게 하는 일련의 과정입니다.
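# (참고) 위 checksum 이 static pod 의 annotation 으로 반영됐는지 확인해 보는 예시 (파드 이름은 노드에 따라 다름)
kubectl get pod -n kube-system nginx-proxy-k8s-node4 -o jsonpath='{.metadata.annotations.nginx-cfg-checksum}{"\n"}'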
# nginx.conf jinja2 템플릿 파일
cat roles/kubernetes/node/templates/loadbalancer/nginx.conf.j2
# error_log stderr notice;
# worker_processes 2;
# worker_rlimit_nofile 130048;
# worker_shutdown_timeout 10s;
# events {
# multi_accept on;
# use epoll;
# worker_connections 16384;
# }
# stream {
# upstream kube_apiserver {
# least_conn; # 하드코딩 되어있음
# {% for host in groups['kube_control_plane'] -%}
# server {{ hostvars[host]['main_access_ip'] | ansible.utils.ipwrap }}:{{ kube_apiserver_port }};
# {% endfor -%}
# }
# server {
# listen 127.0.0.1:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }};
# {% if ipv6_stack -%}
# listen [::1]:{{ loadbalancer_apiserver_port|default(kube_apiserver_port) }};
# {% endif -%}
# proxy_pass kube_apiserver;
# proxy_timeout 10m;
# proxy_connect_timeout 1s;
# }
# }
# nginx static pod 매니페스트 파일 확인(실제 스태틱 파드가 구성되는 네임스페이스)
cat roles/kubernetes/node/templates/manifests/nginx-proxy.manifest.j2
# apiVersion: v1
# kind: Pod
# metadata:
# name: {{ loadbalancer_apiserver_pod_name }}
# namespace: kube-system
# labels:
# addonmanager.kubernetes.io/mode: Reconcile
# k8s-app: kube-nginx
# annotations:
# nginx-cfg-checksum: "{{ nginx_stat.stat.checksum }}"
# ...
워커 노드에서 k8s apiserver를 부하 분산해서 호출하려면 Client-Side LB를 사용합니다.
컨트롤 플레인에서는 같은 노드의 kube-apiserver에 localhost(또는 LB VIP)로 직접 붙으면 되고, 워커와 달리 노드마다 Client-Side LB를 둘 필요는 없습니다.
- Kubespray가 지원하는 Client-Side LB:
| 구분 | Client-Side LB | 설명 |
|---|---|---|
| 1 | nginx | 리버스 프록시. 워커 노드에 static pod로 올려 각 노드에서 apiserver로 트래픽을 분산할 때 많이 씁니다. |
| 2 | HAProxy | L4/L7 로드밸런서. 헬스체크·고성능이 장점이며, 서버 측 LB(예: admin-lb) 또는 Client-Side LB 모두 사용 가능합니다. |
| 3 | kube-vip | Kubernetes용 VIP/로드밸런싱. 컨트롤 플레인 고가용성용 VIP 제공 등 K8s 환경에 맞춘 방식입니다. |
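어떤 Client-Side LB를 쓸지는 Kubespray 변수로 선택합니다. 아래는 변수를 찾아보는 예시이며, 변수 이름과 기본값은 사용하는 Kubespray 버전의 group_vars 주석을 기준으로 확인하는 것이 안전합니다.
# (참고) Client-Side LB 관련 변수 확인 예시 (주석 처리된 기본값 포함)
grep -rn "loadbalancer_apiserver" inventory/mycluster/group_vars/
# loadbalancer_apiserver_localhost: true   # 워커 로컬(Client-Side) LB 사용 여부
# loadbalancer_apiserver_type: nginx       # "nginx" 또는 "haproxy"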
(참고) nginx log 중 alert 해결 : --tags "containerd" 사용
# nginx proxy 파드 로그 확인
kubectl logs -n kube-system nginx-proxy-k8s-node4
# ...
# 2026/02/07 08:42:17 [notice] 1#1: using the "epoll" event method
# 2026/02/07 08:42:17 [notice] 1#1: nginx/1.28.0
# 2026/02/07 08:42:17 [notice] 1#1: built by gcc 14.2.0 (Alpine 14.2.0)
# 2026/02/07 08:42:17 [notice] 1#1: OS: Linux 6.12.0-55.39.1.el10_0.aarch64
# 2026/02/07 08:42:17 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 65535:65535 # 130048 적용 안되어 있음
# 2026/02/07 08:42:17 [notice] 1#1: start worker processes
# 2026/02/07 08:42:17 [notice] 1#1: start worker process 20
# 2026/02/07 08:42:17 [notice] 1#1: start worker process 21
# ...
#
ssh k8s-node4 cat /etc/nginx/nginx.conf
# error_log stderr notice;
# worker_processes 2;
# worker_rlimit_nofile 130048;
# worker_shutdown_timeout 10s;
# ...
#
ssh k8s-node4 cat /etc/containerd/config.toml | grep base_runtime_spec
# base_runtime_spec = "/etc/containerd/cri-base.json"
ssh k8s-node4 cat /etc/containerd/cri-base.json | jq | grep rlimits -A 6
# "rlimits": [
# {
# "type": "RLIMIT_NOFILE",
# "hard": 65535,
# "soft": 65535
# }
# ],
ssh k8s-node4 crictl inspect --name nginx-proxy | jq
ssh k8s-node4 crictl inspect --name nginx-proxy | grep rlimits -A6
# "rlimits": [
# {
# "hard": 65535,
# "soft": 65535,
# "type": "RLIMIT_NOFILE"
# }
# ],
# 관련 변수명 확인
cat roles/container-engine/containerd/defaults/main.yml
# ...
# containerd_base_runtime_spec_rlimit_nofile: 65535 #이것 때문
# containerd_default_base_runtime_spec_patch:
# process:
# rlimits:
# - type: RLIMIT_NOFILE
# hard: "{{ containerd_base_runtime_spec_rlimit_nofile }}"
# soft: "{{ containerd_base_runtime_spec_rlimit_nofile }}"
# 기본 OCI Spec(Runtime Spec)을 수정(Patch)
cat << EOF >> inventory/mycluster/group_vars/all/containerd.yml
containerd_default_base_runtime_spec_patch:
process:
rlimits: []
EOF
grep "^[^#]" inventory/mycluster/group_vars/all/containerd.yml
# containerd tag : configuring containerd engine runtime for hosts
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "containerd" --list-tasks
# play #1 (all): Check Ansible version TAGS: [always] 태그가 always라 1, 2 번 플레이북은 무조건 실행됨
# tasks:
# Check {{ minimal_ansible_version }} <= Ansible version < {{ maximal_ansible_version }} TAGS: [always, check]
# Check that python netaddr is installed TAGS: [always, check]
# Check that jinja is not too old (install via pip) TAGS: [always, check]
# play #2 (all): Inventory setup and validation TAGS: [always]
# tasks:
# dynamic_groups : Match needed groups by their old names or definition TAGS: [always]
# validate_inventory : Stop if removed tags are used TAGS: [always]
# validate_inventory : Stop if kube_control_plane group is empty TAGS: [always]
# validate_inventory : Stop if etcd group is empty in external etcd mode TAGS: [always]
# validate_inventory : Warn if `kube_network_plugin` is `none TAGS: [always]
# validate_inventory : Stop if unsupported version of Kubernetes TAGS: [always]
# validate_inventory : Stop if known booleans are set as strings (Use JSON format on CLI: -e "{'key': true }") TAGS: [always]
# validate_inventory : Stop if even number of etcd hosts TAGS: [always]
# validate_inventory : Guarantee that enough network address space is available for all pods TAGS: [always]
# validate_inventory : Stop if RBAC is not enabled when dashboard is enabled TAGS: [always]
# validate_inventory : Check cloud_provider value TAGS: [always]
# validate_inventory : Check external_cloud_provider value TAGS: [always]
# validate_inventory : Check that kube_service_addresses is a network range TAGS: [always]
# validate_inventory : Check that kube_pods_subnet is a network range TAGS: [always]
# validate_inventory : Check that kube_pods_subnet does not collide with kube_service_addresses TAGS: [always]
# validate_inventory : Check that ipv4 IP range is enough for the nodes TAGS: [always]
# validate_inventory : Check that kube_service_addresses_ipv6 is a network range TAGS: [always]
# validate_inventory : Check that kube_pods_subnet_ipv6 is a network range TAGS: [always]
# validate_inventory : Check that kube_pods_subnet_ipv6 does not collide with kube_service_addresses_ipv6 TAGS: [always]
# validate_inventory : Check that ipv6 IP range is enough for the nodes TAGS: [always]
# validate_inventory : Stop if unsupported options selected TAGS: [always]
# validate_inventory : Warn if `enable_dual_stack_networks` is set TAGS: [always]
# validate_inventory : Stop if download_localhost is enabled but download_run_once is not TAGS: [always]
# validate_inventory : Stop if kata_containers_enabled is enabled when container_manager is docker TAGS: [always]
# validate_inventory : Stop if gvisor_enabled is enabled when container_manager is not containerd TAGS: [always]
# validate_inventory : Ensure minimum containerd version TAGS: [always]
# validate_inventory : Stop if auto_renew_certificates is enabled when certificates are managed externally (kube_external_ca_mode is true) TAGS: [always]
# play #6 (k8s_cluster:etcd): Prepare for etcd install TAGS: []
# tasks:
# container-engine/containerd : Containerd | Download containerd TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Unpack containerd archive TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Generate systemd service for containerd TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Ensure containerd directories exist TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Write containerd proxy drop-in TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Generate default base_runtime_spec TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Store generated default base_runtime_spec TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Write base_runtime_specs TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Copy containerd config file TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Create registry directories TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Write hosts.toml file TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Flush handlers TAGS: [container-engine, containerd]
# container-engine/containerd : Containerd | Ensure containerd is started and enabled TAGS: [container-engine, containerd]
# ...
# (신규터미널) 모니터링
[k8s-node4]
journalctl -u containerd.service -f
Feb 07 17:47:41 k8s-node4 containerd[18185]: time="2026-02-07T17:47:41.010775397+09:00" level=info msg="container event discarded" container=0c19073f1194201f3eae16924b91dd2e1ad83aae0cbac64b1ed480700ee8d768 type=CONTAINER_STARTED_EVENT
Feb 07 17:47:41 k8s-node4 containerd[18185]: time="2026-02-07T17:47:41.023276352+09:00" level=info msg="container event discarded" container=db41df69bbfab3a1e9d7eaa30346b2f39d4b683795e573db7da7328976214644 type=CONTAINER_CREATED_EVENT
Feb 07 17:47:41 k8s-node4 containerd[18185]: time="2026-02-07T17:47:41.074578804+09:00" level=info msg="container event discarded" container=db41df69bbfab3a1e9d7eaa30346b2f39d4b683795e573db7da7328976214644 type=CONTAINER_STARTED_EVENT
Feb 07 17:47:45 k8s-node4 containerd[18185]: time="2026-02-07T17:47:45.840304916+09:00" level=info msg="container event discarded" container=cbc47937a1f92c93c517599a4293857edafb13b1f774f1119ef2e20a06124ff0 type=CONTAINER_CREATED_EVENT
Feb 07 17:47:45 k8s-node4 containerd[18185]: time="2026-02-07T17:47:45.840376958+09:00" level=info msg="container event discarded" container=cbc47937a1f92c93c517599a4293857edafb13b1f774f1119ef2e20a06124ff0 type=CONTAINER_STARTED_EVENT
Feb 07 17:47:53 k8s-node4 containerd[18185]: time="2026-02-07T17:47:53.268903439+09:00" level=info msg="container event discarded" container=1eb61d028e7ed72db00c12aef0df61755ab807fea9c34762d8a024216fedd220 type=CONTAINER_CREATED_EVENT
Feb 07 17:47:53 k8s-node4 containerd[18185]: time="2026-02-07T17:47:53.430975029+09:00" level=info msg="container event discarded" container=1eb61d028e7ed72db00c12aef0df61755ab807fea9c34762d8a024216fedd220 type=CONTAINER_STARTED_EVENT
Feb 07 17:48:58 k8s-node4 containerd[18185]: time="2026-02-07T17:48:58.870449424+09:00" level=info msg="container event discarded" container=4dcefb91dc3c481d50f86cd1443283524215da2a48fdd8518757bed89799cb1c type=CONTAINER_DELETED_EVENT
Feb 07 18:03:12 k8s-node4 containerd[18185]: time="2026-02-07T18:03:12.643285536+09:00" level=info msg="Container exec \"d5c8a47899ebc3c8cfc40c2f6462d61a1861c18e0dc670b2703a919432a5d091\" stdin closed"
Feb 07 18:03:12 k8s-node4 containerd[18185]: time="2026-02-07T18:03:12.643614871+09:00" level=error msg="Failed to resize process \"d5c8a47899ebc3c8cfc40c2f6462d61a1861c18e0dc670b2703a919432a5d091\" console for container \"0c3b5b5da337fae8e0a66b064efe5f7ee052c8265ed6123ff2a7ae0edd5fd63a\"" error="cannot resize a stopped container"
...
# 재시작
Feb 07 18:23:52 k8s-node4 containerd[50325]: time="2026-02-07T18:23:52.746588076+09:00" level=info msg="runtime interface starting up..."
Feb 07 18:23:52 k8s-node4 containerd[50325]: time="2026-02-07T18:23:52.746591118+09:00" level=info msg="starting plugins..."
Feb 07 18:23:52 k8s-node4 containerd[50325]: time="2026-02-07T18:23:52.746602076+09:00" level=info msg="Synchronizing NRI (plugin) with current runtime state"
Feb 07 18:23:52 k8s-node4 containerd[50325]: time="2026-02-07T18:23:52.750647428+09:00" level=info msg="containerd successfully booted in 0.083138s"
Feb 07 18:23:52 k8s-node4 systemd[1]: Started containerd.service - containerd container runtime.
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date ; sleep 1; echo ; done
# "gitVersion": "v1.32.9",
# Sat Feb 7 06:23:50 PM KST 2026
# "gitVersion": "v1.32.9",
# Sat Feb 7 06:23:51 PM KST 2026
# "gitVersion": "v1.32.9",
# Sat Feb 7 06:23:52 PM KST 2026
# "gitVersion": "v1.32.9",
# Sat Feb 7 06:23:53 PM KST 2026
# "gitVersion": "v1.32.9",
# Sat Feb 7 06:23:54 PM KST 2026
# 1분 이내 수행 완료
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "containerd" --limit k8s-node4 -e kube_version="1.32.9"
# 확인
ssh k8s-node4 cat /etc/containerd/cri-base.json | jq | grep rlimits
# "rlimits": [],
ssh k8s-node4 crictl inspect --name nginx-proxy | grep rlimits -A6
# "rlimits": [
# {
# "hard": 65535,
# "soft": 65535,
# "type": "RLIMIT_NOFILE"
# }
# ],
# 적용을 위해서 컨테이너를 다시 기동
[k8s-node4]
watch -d crictl ps # (신규터미널) 모니터링
# CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
# POD NAMESPACE
# 486f3fc69351f 5a91d90f47ddf 4 seconds ago Running nginx-proxy 1 c108d71883
# 1ab nginx-proxy-k8s-node4 kube-system
# 1eb61d028e7ed bc6c1e09a843d 42 minutes ago Running metrics-server 0 cbc47937a1
# f92 metrics-server-65fdf69dcb-k6r7h kube-system
# db41df69bbfab 2f6c962e7b831 42 minutes ago Running coredns 0 0c19073f11
# 942 coredns-664b99d7c7-k9bpw kube-system
# 4cddb84b6751c cadcae92e6360 42 minutes ago Running kube-flannel 0 cc6c711a4f
# 213 kube-flannel-ds-arm64-9p6nf kube-system
# 15d2ba9919839 72b57ec14d31e 43 minutes ago Running kube-proxy 0 ed0ab0c227
# d93 kube-proxy-k2qgd kube-system
crictl pods --namespace kube-system --name 'nginx-proxy-*' -q | xargs crictl rmp -f
ssh k8s-node4 crictl inspect --name nginx-proxy | grep rlimits -A6
# 로그 확인
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
# /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
# /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
# /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
# 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf is not a file or does not exist
# /docker-entrypoint.sh: Sourcing /docker-entrypoint.d/15-local-resolvers.envsh
# /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
# /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
# /docker-entrypoint.sh: Configuration complete; ready for start up
# 2026/02/07 09:25:24 [notice] 1#1: using the "epoll" event method
# 2026/02/07 09:25:24 [notice] 1#1: nginx/1.28.0
# 2026/02/07 09:25:24 [notice] 1#1: built by gcc 14.2.0 (Alpine 14.2.0)
# 2026/02/07 09:25:24 [notice] 1#1: OS: Linux 6.12.0-55.39.1.el10_0.aarch64
# 2026/02/07 09:25:24 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576 # 적용 됨
# 2026/02/07 09:25:24 [notice] 1#1: start worker processes
# 2026/02/07 09:25:24 [notice] 1#1: start worker process 20
# 2026/02/07 09:25:24 [notice] 1#1: start worker process 21
# (참고) 나머지 현재 Ready 중인 모든 컨테이너 재기동
crictl pods --state ready -q | xargs crictl rmp -f
(참고) playbook 파일에 tags 정보 출력
# playbooks/ 파일 중 tags
tree playbooks/
# playbooks/
# ├── ansible_version.yml
# ├── boilerplate.yml
# ├── cluster.yml
# ├── facts.yml
# ├── install_etcd.yml
# ├── internal_facts.yml
# ├── recover_control_plane.yml
# ├── remove_node.yml
# ├── reset.yml
# ├── scale.yml
# └── upgrade_cluster.yml
grep -Rni "tags" playbooks -A2 -B1
# playbooks/ansible_version.yml-9- maximal_ansible_version: 2.18.0
# playbooks/ansible_version.yml:10: tags: always
# playbooks/ansible_version.yml-11- tasks:
# playbooks/ansible_version.yml-12- - name: "Check {{ minimal_ansible_version }} <= Ansible version < {{ maximal_ansible_version }}"
# --
# ...
# roles/ 파일 중 tags
tree roles/ -L 2
# ...
# ├── upgrade
# │ ├── post-upgrade
# │ ├── pre-upgrade
# │ └── system-upgrade
# ├── validate_inventory
# │ ├── meta
# │ └── tasks
# └── win_nodes
# └── kubernetes_patch
# 139 directories, 1 file
grep -Rni "tags" roles --include="*.yml" -A2 -B1
grep -Rni "tags" roles --include="*.yml" -A3 | less
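# (참고) 확인한 태그를 조합해 일부 작업만 실행/제외해 보는 예시 : 실제 적용 전 --list-tasks 로 실행 범위를 먼저 확인하는 것을 권장
ansible-playbook -i inventory/mycluster/inventory.ini cluster.yml --tags "node,network" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini cluster.yml --tags "containerd" --skip-tags "download" --list-tasks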
컨트롤 플레인 노드 컴포넌트 → k8s api endpoint 분석
# apiserver static 파드의 bind-address 가 '::' 인지, advertise-address(엔드포인트)가 192.168.10.11 로 설정되어 있는지 확인
kubectl describe pod -n kube-system kube-apiserver-k8s-node1 | grep -E 'address|secure-port'
# Annotations: kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 192.168.10.11:6443
# --advertise-address=192.168.10.11
# --secure-port=6443
# --bind-address=::
ssh k8s-node1 ss -tnlp | grep 6443
# LISTEN 0 4096 *:6443 *:* users:(("kube-apiserver",pid=26124,fd=3))
# lo, enp0s8, enp0s9 모두 가능
ssh k8s-node1 ip -br -4 addr
# lo UNKNOWN 127.0.0.1/8
# enp0s8 UP 10.0.2.15/24
# enp0s9 UP 192.168.10.11/24
# flannel.1 UNKNOWN 10.233.64.0/32
# cni0 UP 10.233.64.1/24
ssh k8s-node1 curl -sk https://127.0.0.1:6443/version | grep gitVersion
# "gitVersion": "v1.32.9",
ssh k8s-node1 curl -sk https://192.168.10.11:6443/version | grep gitVersion
# "gitVersion": "v1.32.9",
ssh k8s-node1 curl -sk https://10.0.2.15:6443/version | grep gitVersion
# "gitVersion": "v1.32.9",
# admin 자격증명(client) -> api-server 호출 시 엔드포인트 정보 확인
ssh k8s-node1 cat /etc/kubernetes/admin.conf | grep server
# server: https://127.0.0.1:6443
# super-admin 자격증명(client) -> api-server 호출 시 엔드포인트 정보 확인
ssh k8s-node1 cat /etc/kubernetes/super-admin.conf | grep server
# server: https://192.168.10.11:6443
# kubelet(client) -> api-server 호출 시 엔드포인트 정보 확인 : https://127.0.0.1:6443
ssh k8s-node1 cat /etc/kubernetes/kubelet.conf
# apiVersion: v1
# clusters:
# - cluster:
# certificate-authority-data: ...
# server: https://127.0.0.1:6443
# name: cluster.local
# contexts:
# - context:
# cluster: cluster.local
# user: system:node:k8s-node1
# name: system:node:k8s-node1@cluster.local
# current-context: system:node:k8s-node1@cluster.local
# kind: Config
# preferences: {}
# users:
# - name: system:node:k8s-node1
# user:
# client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
# client-key: /var/lib/kubelet/pki/kubelet-client-current.pem
ssh k8s-node1 cat /etc/kubernetes/kubelet.conf | grep server
# server: https://127.0.0.1:6443
# kube-proxy(client) -> api-server 호출 시 엔드포인트 정보 확인
kubectl get cm -n kube-system kube-proxy -o yaml | grep server
# server: https://127.0.0.1:6443
# kube-controller-manager(client) -> api-server 호출 시 엔드포인트 정보 확인
ssh k8s-node1 cat /etc/kubernetes/controller-manager.conf | grep server
# server: https://127.0.0.1:6443
# kube-scheduler(client) -> api-server 호출 시 엔드포인트 정보 확인
ssh k8s-node1 cat /etc/kubernetes/scheduler.conf | grep server
# server: https://127.0.0.1:6443
결론
Cilium 에이전트처럼 hostNetwork 로 동작하는 DaemonSet 파드가 k8s API를 호출할 때도,
컨트롤 플레인에서는 로컬 kube-apiserver 가, 워커에서는 로컬 Client-Side LB(nginx)가 127.0.0.1:6443 에서 대기하고 있으므로 로컬 엔드포인트 127.0.0.1:6443 을 그대로 사용할 수 있습니다.
즉, 워커 노드의 컴포넌트가 API 호출 시 해당 노드의 Client-Side LB(nginx 등)를 경유해 127.0.0.1:6443 으로 붙는 것과 같은 방식입니다.
Case1 구성만으로도 모든 노드에서 apiserver 접근 엔드포인트를 127.0.0.1:6443 으로 통일할 수 있습니다.
# Cilium Helm values 예시
# k8sServiceHost: 127.0.0.1
# k8sServicePort: 6443
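예를 들어 Cilium을 Helm으로 설치한다면 아래처럼 로컬 엔드포인트를 지정하는 식입니다. 이 실습 클러스터는 flannel을 사용 중이므로 어디까지나 참고용 스케치이며, 저장소 주소와 값 이름(k8sServiceHost, k8sServicePort)은 Cilium 공식 차트 기준입니다.
# (참고) Cilium Helm 설치 시 로컬 엔드포인트 지정 예시
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system \
  --set k8sServiceHost=127.0.0.1 \
  --set k8sServicePort=6443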
[Case2] External LB → HA 컨트롤 플레인 노드(3대) + (Worker Client-Side LoadBalancing)
노드의 상태를 확인하기 위해 kube-ops-view를 설치합니다.
# kube-ops-view
## helm show values geek-cookbook/kube-ops-view
helm repo add geek-cookbook https://geek-cookbook.github.io/charts/
# macOS 사용자
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 \
--set service.main.type=NodePort,service.main.ports.http.nodePort=30000 \
--set env.TZ="Asia/Seoul" --namespace kube-system \
--set image.repository="abihf/kube-ops-view" --set image.tag="latest"
# Windows 사용자
helm install kube-ops-view geek-cookbook/kube-ops-view --version 1.2.2 \
--set service.main.type=NodePort,service.main.ports.http.nodePort=30000 \
--set env.TZ="Asia/Seoul" --namespace kube-system
# 설치 확인
kubectl get deploy,pod,svc,ep -n kube-system -l app.kubernetes.io/instance=kube-ops-view
# kube-ops-view 접속 URL 확인 (1.5, 2 배율) : NodePort 이므로 모든 노드의 IP 로 접속 가능!
open "http://192.168.10.14:30000/#scale=1.5"
open "http://192.168.10.14:30000/#scale=2"
샘플 애플리케이션 배포, 반복 호출
애플리케이션 동작을 확인하기 위해 샘플 애플리케이션을 배포하고, 반복적으로 호출합니다.
# 샘플 애플리케이션 배포
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
name: webpod
spec:
replicas: 2
selector:
matchLabels:
app: webpod
template:
metadata:
labels:
app: webpod
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- sample-app
topologyKey: "kubernetes.io/hostname"
containers:
- name: webpod
image: traefik/whoami
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: webpod
labels:
app: webpod
spec:
selector:
app: webpod
ports:
- protocol: TCP
port: 80
targetPort: 80
nodePort: 30003
type: NodePort
EOF

이 그림은 Service를 반복 호출할 때 트래픽이 Service·Endpoint를 거쳐 여러 Pod로 어떻게 로드밸런싱되는지 보여줍니다.
컨트롤 플레인에서는 endpoint/service controller가 Service·Pod를 보고 Endpoint 객체(해당 Service의 백엔드 Pod IP 목록)를 API 서버·etcd에 반영합니다. 각 워커 노드의 kube-proxy는 API 서버에서 Service·Endpoint 변경을 watch한 뒤, iptables 또는 IPVS 규칙으로 ClusterIP(예: 10.8.0.6)에 들어온 요청을 실제 Pod IP(예: 10.0.3.5, 10.0.3.6)로 나누어 보냅니다.
클러스터 내부 앱은 ClusterIP로, 외부 클라이언트는 NodePort로 접속하면 되고, 둘 다 결국 같은 Service를 거쳐 여러 Pod에 부하가 분산됩니다. 따라서 `curl`을 반복 호출하면 서로 다른 Pod의 Hostname이 번갈아 나오는 것입니다.
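kube-proxy가 만든 분산 규칙은 아래처럼 직접 확인해볼 수 있습니다. 프록시 모드(iptables/IPVS)에 따라 보이는 규칙이 다르며, IPVS 모드에서 ipvsadm이 없다면 별도 설치가 필요할 수 있다는 점은 가정입니다.
# (참고) kube-proxy 프록시 모드와 분산 규칙 확인 예시
kubectl get cm -n kube-system kube-proxy -o yaml | grep "mode:"
ssh k8s-node4 "iptables -t nat -S | grep -i webpod"   # iptables 모드인 경우
ssh k8s-node4 "ipvsadm -Ln"                           # IPVS 모드인 경우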
# 배포 확인
kubectl get deploy,svc,ep webpod -owide
[admin-lb] # IP는 node 작업에 따라 변경
while true; do curl -s http://192.168.10.14:30003 | grep Hostname; sleep 1; done
# Hostname: webpod-697b545f57-sd9kp
# Hostname: webpod-697b545f57-sd9kp
# Hostname: webpod-697b545f57-lthj2
# Hostname: webpod-697b545f57-sd9kp
# Hostname: webpod-697b545f57-sd9kp
# (옵션) k8s-node 에서 service 명 호출 확인
ssh k8s-node1 cat /etc/resolv.conf
# Generated by NetworkManager
# search default.svc.cluster.local svc.cluster.local
# nameserver 10.233.0.3
# nameserver 168.126.63.1
# nameserver 8.8.8.8
# options ndots:2 timeout:2 attempts:2
# 성공
ssh k8s-node1 curl -s webpod -I
# HTTP/1.1 200 OK
# 성공
ssh k8s-node1 curl -s webpod.default -I
# HTTP/1.1 200 OK
# 실패 : 호스트 search 목록(default.svc.cluster.local, svc.cluster.local)만으로는 아래 두 형태가 완전한 FQDN(webpod.default.svc.cluster.local)으로 완성되지 않기 때문
ssh k8s-node1 curl -s webpod.default.svc -I
ssh k8s-node1 curl -s webpod.default.svc.cluster -I
# 성공
ssh k8s-node1 curl -s webpod.default.svc.cluster.local -I
# HTTP/1.1 200 OK
# Date: Sat, 07 Feb 2026 09:47:47 GMT
# Content-Length: 228
# Content-Type: text/plain; charset=utf-8
(장애 재현) 만약 컨트롤 플레인 1번 노드 장애 발생 시 영향도
장애 발생 전의 HAProxy 대시보드 상태는 다음과 같습니다.

# [admin-lb] kubeconfig 자격증명 사용 시 정보 확인
cat /root/.kube/config | grep server
server: https://192.168.10.11:6443
# 모니터링 : 신규 터미널 4개
# ----------------------
## [admin-lb]
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
# Unable to connect to the server: dial tcp 192.168.10.11:6443: connect: no route to host
# "gitVersion": "v1.32.9",
## [k8s-node2]
watch -d kubectl get pod -n kube-system
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
## [k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
# "gitVersion": "v1.32.9",
# Sat Feb 7 07:01:23 PM KST 2026
# ----------------------
# 장애 재현
[k8s-node1] poweroff
# [k8s-node2]
kubectl logs -n kube-system nginx-proxy-k8s-node4 -f
# Unable to connect to the server: dial tcp 192.168.10.11:6443: connect: no route to host
# [k8s-node4] 하지만 백엔드 대상 서버가 나머지 2대가 있으니 아래 요청 처리 정상!
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
"gitVersion": "v1.32.9",
# [admin-lb] 아래 자격증명 서버 정보 수정 필요
while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
Unable to connect to the server: dial tcp 192.168.10.11:6443: connect: no route to host # << 요건 실패!
"gitVersion": "v1.32.9", # << 요건 성공!
sed -i 's/192.168.10.11/192.168.10.12/g' /root/.kube/config
# while true; do kubectl get node ; echo ; curl -sk https://192.168.10.12:6443/version | grep gitVersion ; sleep 1; echo ; done
# NAME STATUS ROLES AGE VERSION
# k8s-node1 NotReady control-plane 82m v1.32.9
# k8s-node2 Ready control-plane 81m v1.32.9
# k8s-node3 Ready control-plane 81m v1.32.9
# k8s-node4 Ready <none> 81m v1.32.9
# "gitVersion": "v1.32.9",
HAProxy 대시보드에서도 장애 발생이 확인됩니다.

다음으로 Virtualbox 에서 k8s-node1 가상머신을 시작합니다.
External LB → HA 컨트롤 플레인 노드(3대) : k8s apiserver 호출 설정
#
curl -sk https://192.168.10.10:6443/version | grep gitVersion
"gitVersion": "v1.32.9",
#
sed -i 's/192.168.10.12/192.168.10.10/g' /root/.kube/config
# 인증서 SAN list 확인
kubectl get node
E0128 23:53:41.079370 70802 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://192.168.10.10:6443/api?timeout=32s\": tls: failed to verify certificate: x509: certificate is valid for 10.233.0.1, 192.168.10.11, 127.0.0.1, ::1, 192.168.10.12, 192.168.10.13, 10.0.2.15, fd17:625c:f037:2:a00:27ff:fe90:eaeb, not 192.168.10.10"
# 인증서 SAN 정보 확인 : 192.168.10.10 과 LB용 도메인이 SAN 에 없음
ssh k8s-node1 cat /etc/kubernetes/ssl/apiserver.crt | openssl x509 -text -noout
# DNS:k8s-node1, DNS:k8s-node2, DNS:k8s-node3, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:lb-apiserver.kubernetes.local, DNS:localhost, IP Address:10.233.0.1, IP Address:192.168.10.11, IP Address:127.0.0.1, IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.10.12, IP Address:192.168.10.13, IP Address:10.0.2.15, IP Address:FD17:625C:F037:2:A00:27FF:FE90:EAEB
...
ssh k8s-node1 kubectl get cm -n kube-system kubeadm-config -o yaml
# apiServer:
# certSANs:
# - kubernetes
# - kubernetes.default
# - kubernetes.default.svc
# - kubernetes.default.svc.cluster.local
# - 10.233.0.1
# - localhost
# - 127.0.0.1
# - ::1
# - k8s-node1
# - k8s-node2
# - k8s-node3
# - lb-apiserver.kubernetes.local
# - 192.168.10.11
# - 192.168.10.12
# - 192.168.10.13
# - 10.0.2.15
# - fd17:625c:f037:2:a00:27ff:fe90:eaeb
# 인증서 SAN 에 'IP, Domain' 추가
echo "supplementary_addresses_in_ssl_keys: [192.168.10.10, k8s-api-srv.admin-lb.com]" >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# supplementary_addresses_in_ssl_keys: [192.168.10.10, k8s-api-srv.admin-lb.com]
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "control-plane" --list-tasks
# ...
# play #10 (kube_control_plane): Install the control plane TAGS: []
# tasks:
# ...
# kubernetes/control-plane : Kubeadm | aggregate all SANs TAGS: [control-plane, facts]
# ...
# (신규터미널) 모니터링
[k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date ; sleep 1; echo ; done
# "gitVersion": "v1.32.9",
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready control-plane 84m v1.32.9
# k8s-node2 Ready control-plane 84m v1.32.9
# k8s-node3 Ready control-plane 84m v1.32.9
# k8s-node4 Ready <none> 84m v1.32.9
# 1분 이내 완료
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "control-plane" --limit kube_control_plane -e kube_version="1.32.9"
# Saturday 07 February 2026 19:09:39 +0900 (0:00:00.095) 0:00:23.128 *****
# ===============================================================================
# Gather minimal facts ------------------------------------------------------------------------- 2.79s
# kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN hosts --------------------------- 1.42s
# kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN IPs ----------------------------- 1.36s
# kubernetes/control-plane : Backup old certs and keys ----------------------------------------- 1.15s
# Gather necessary facts (hardware) ------------------------------------------------------------ 0.95s
# kubernetes/preinstall : Create other directories of root owner ------------------------------- 0.95s
# win_nodes/kubernetes_patch : debug ----------------------------------------------------------- 0.91s
# kubernetes/control-plane : Backup old confs -------------------------------------------------- 0.82s
# kubernetes/control-plane : Install | Copy kubectl binary from download dir ------------------- 0.76s
# kubernetes/control-plane : Update server field in component kubeconfigs ---------------------- 0.75s
# kubernetes/preinstall : Create kubernetes directories ---------------------------------------- 0.65s
# kubernetes/control-plane : Kubeadm | Create kubeadm config ----------------------------------- 0.61s
# kubernetes/control-plane : Renew K8S control plane certificates monthly 2/2 ------------------ 0.50s
# kubernetes/control-plane : Create kube-scheduler config -------------------------------------- 0.41s
# kubernetes/control-plane : Install script to renew K8S control plane certificates ------------ 0.37s
# kubernetes/control-plane : Kubeadm | regenerate apiserver cert 2/2 --------------------------- 0.36s
# kubernetes/control-plane : Create kubeadm token for joining nodes with 24h expiration (default) --- 0.31s
# kubernetes/control-plane : Kubeadm | regenerate apiserver cert 1/2 --------------------------- 0.30s
# kubernetes/control-plane : Kubeadm | aggregate all SANs -------------------------------------- 0.29s
# Gather necessary facts (network) ------------------------------------------------------------- 0.29s
# 192.168.10.10 엔드포인트 요청 성공!
kubectl get node -v=6
# I0207 19:11:45.343325 38792 loader.go:402] Config loaded from file: /root/.kube/config
# I0207 19:11:45.343673 38792 envvar.go:172] "Feature gate default state" feature="WatchListClient" enabled=false
# I0207 19:11:45.343693 38792 envvar.go:172] "Feature gate default state" feature="ClientsAllowCBOR" enabled=false
# I0207 19:11:45.343696 38792 envvar.go:172] "Feature gate default state" feature="ClientsPreferCBOR" enabled=false
# I0207 19:11:45.343699 38792 envvar.go:172] "Feature gate default state" feature="InformerResourceVersion" enabled=false
# I0207 19:11:45.357305 38792 round_trippers.go:560] GET https://192.168.10.10:6443/api/v1/nodes?limit=500 200 OK in 10 milliseconds
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready control-plane 90m v1.32.9
# k8s-node2 Ready control-plane 89m v1.32.9
# k8s-node3 Ready control-plane 89m v1.32.9
# k8s-node4 Ready <none> 89m v1.32.9
# ip, domain 둘 다 확인
sed -i 's/192.168.10.10/k8s-api-srv.admin-lb.com/g' /root/.kube/config
# 추가 확인
ssh k8s-node1 cat /etc/kubernetes/ssl/apiserver.crt | openssl x509 -text -noout
# X509v3 Subject Alternative Name: (192.168.10.10 추가됨)
# DNS:k8s-api-srv.admin-lb.com, DNS:k8s-node1, DNS:k8s-node2, DNS:k8s-node3, DNS:kubernetes, DNS:kubernetes.default,
# DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:lb-apiserver.kubernetes.local,
# DNS:localhost, IP Address:10.233.0.1, IP Address:192.168.10.11, IP Address:127.0.0.1,
# IP Address:0:0:0:0:0:0:0:1, IP Address:192.168.10.10, IP Address:192.168.10.12, IP Address:192.168.10.13,
# IP Address:10.0.2.15, IP Address:FD17:625C:F037:2:A00:27FF:FE90:EAEB
# 해당 cm은 최초 설치 후 자동 업데이트 X, 업그레이드에 활용된다고 하니, 위 처럼 kubeadm config 변경 시 직접 cm도 같이 변경해두자.
kubectl get cm -n kube-system kubeadm-config -o yaml
...
kubectl edit cm -n kube-system kubeadm-config # or k9s -> cm kube-system
...
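# (참고) certSANs 에 추가해 둘 항목 : 앞서 supplementary_addresses_in_ssl_keys 로 넣은 값과 동일하게 맞춰 둠
#   - 192.168.10.10
#   - k8s-api-srv.admin-lb.com
# 반영 여부 확인
kubectl get cm -n kube-system kubeadm-config -o yaml | grep -E "192.168.10.10|k8s-api-srv.admin-lb.com"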
LB 설정이 완료되면 다음과 같이 트래픽이 3대의 컨트롤 플레인에 분배되는 모습을 확인할 수 있습니다.

(다시 한 번 장애 재현) 컨트롤 플레인 1번 노드 장애 발생 시 영향도
# [admin-lb] kubeconfig 자격증명 사용 시 정보 확인
cat /root/.kube/config | grep server
# server: https://k8s-api-srv.admin-lb.com:6443
# 모니터링 : 신규 터미널 2개
# ----------------------
## [admin-lb]
while true; do kubectl get node ; echo ; kubectl get pod -n kube-system; sleep 1; echo ; done
## [k8s-node4]
while true; do curl -sk https://127.0.0.1:6443/version | grep gitVersion ; date; sleep 1; echo ; done
# ----------------------
# 장애 재현
[k8s-node1] poweroff

k8s-node1 에 장애가 발생해도 API 서버 기능은 정상 동작합니다.

VirtualBox 에서 k8s-node1 가상머신을 시작하여 복구합니다.
노드 관리
노드 추가는 scale.yml 플레이북(내부적으로 playbooks/scale.yml)을 사용합니다. 기존 클러스터는 변경하지 않고, 인벤토리에 새로 넣은 노드만 단계적으로 클러스터에 합류시킵니다.
#
cat scale.yml
# ---
# - name: Scale the cluster
# ansible.builtin.import_playbook: playbooks/scale.yml
cat playbooks/scale.yml
# ---
# [scale.yml] 인벤토리에 새로 추가한 노드만 대상으로, 기존 클러스터는 건드리지 않고 단계적으로 조인시킵니다.
# - name: Common tasks for every playbooks
# import_playbook: boilerplate.yml
# # boilerplate: Ansible 버전/변수 검증, 인벤토리 검사 등 공통 선행 작업
# - name: Gather facts
# import_playbook: internal_facts.yml
# # internal_facts: 각 호스트 fact 수집 (나중 play에서 사용)
# - name: Install etcd
# # [Play 1] etcd: 기존 etcd 클러스터는 변경하지 않음. 새 노드가 etcd 멤버로 지정된 경우에만 해당 노드에서 etcd join 수행.
# vars:
# etcd_cluster_setup: false # 새 클러스터 구성 아님
# etcd_events_cluster_setup: false
# import_playbook: install_etcd.yml
# - name: Download images to ansible host cache via first kube_control_plane node
# # [Play 2] 다운로드: download_run_once 설정 시 첫 번째 control-plane 노드에서만 이미지/바이너리를 받아 Ansible 호스트 캐시에 적재. 새 노드에는 이 캐시에서 배포.
# hosts: kube_control_plane[0]
# gather_facts: false
# any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
# environment: "{{ proxy_disable_env }}"
# roles:
# - { role: kubespray_defaults, when: "not skip_downloads and download_run_once and not download_localhost" }
# - { role: kubernetes/preinstall, tags: preinstall, when: "not skip_downloads and download_run_once and not download_localhost" }
# - { role: download, tags: download, when: "not skip_downloads and download_run_once and not download_localhost" }
# - name: Target only workers to get kubelet installed and checking in on any new nodes(engine)
# # [Play 3] 워커 노드 준비(engine): 새 워커(kube_node)에 preinstall, container-engine(containerd 등), download, (조건부)etcd 클라이언트 설정.
# # role etcd: Calico/Flannel/Cilium 등이 etcd를 직접 쓰는 경우, 워커에서도 etcd 접속 가능하도록 설정. etcd_cluster_setup=false 로 join만.
# hosts: kube_node
# gather_facts: false
# any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
# environment: "{{ proxy_disable_env }}"
# roles:
# - { role: kubespray_defaults }
# - { role: kubernetes/preinstall, tags: preinstall }
# - { role: container-engine, tags: "container-engine", when: deploy_container_engine }
# - { role: download, tags: download, when: "not skip_downloads" }
# - role: etcd
# tags: etcd
# vars:
# etcd_cluster_setup: false
# when:
# - etcd_deployment_type != "kubeadm"
# - kube_network_plugin in ["calico", "flannel", "canal", "cilium"] or cilium_deploy_additionally | default(false) | bool
# - kube_network_plugin != "calico" or calico_datastore == "etcd"
# - name: Target only workers to get kubelet installed and checking in on any new nodes(node)
# # [Play 4] 워커 노드(node): kubelet·kube-proxy 설치 및 systemd 등록. 아직 클러스터 join은 하지 않음.
# hosts: kube_node
# gather_facts: false
# any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
# environment: "{{ proxy_disable_env }}"
# roles:
# - { role: kubespray_defaults }
# - { role: kubernetes/node, tags: node }
# - name: Upload control plane certs and retrieve encryption key
# # [Play 5] kubeadm 인증서: 첫 번째 control-plane에서 인증서 업로드 후 certificate_key를 변수로 저장. 새 워커의 kubeadm join 시 이 키 사용.
# hosts: kube_control_plane | first
# environment: "{{ proxy_disable_env }}"
# gather_facts: false
# tags: kubeadm
# roles:
# - { role: kubespray_defaults }
# tasks:
# - name: Upload control plane certificates
# # kubeadm init phase upload-certs --upload-certs: 컨트롤 플레인 인증서를 secret으로 올리고, 조인용 키를 stdout 마지막 줄로 반환
# command: >-
# {{ bin_dir }}/kubeadm init phase
# --config {{ kube_config_dir }}/kubeadm-config.yaml
# upload-certs
# --upload-certs
# environment: "{{ proxy_disable_env }}"
# register: kubeadm_upload_cert
# changed_when: false
# - name: Set fact 'kubeadm_certificate_key' for later use
# set_fact:
# kubeadm_certificate_key: "{{ kubeadm_upload_cert.stdout_lines[-1] | trim }}"
# when: kubeadm_certificate_key is not defined
# - name: Target only workers to get kubelet installed and checking in on any new nodes(network)
# # [Play 6] 클러스터 조인 및 네트워크: 새 워커에서 kubeadm join → node-label / node-taint → CNI(network_plugin) 적용.
# hosts: kube_node
# gather_facts: false
# any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
# environment: "{{ proxy_disable_env }}"
# roles:
# - { role: kubespray_defaults }
# - { role: kubernetes/kubeadm, tags: kubeadm } # kubeadm join 실행 → 클러스터에 노드 등록
# - { role: kubernetes/node-label, tags: node-label } # 인벤토리/변수에 정의된 라벨 적용
# - { role: kubernetes/node-taint, tags: node-taint } # 인벤토리/변수에 정의된 테인트 적용
# - { role: network_plugin, tags: network } # Calico/Cilium 등 CNI 설정 → Pod 네트워크 및 노드 간 통신 가능
# - name: Apply resolv.conf changes now that cluster DNS is up
# # [Play 7] DNS(resolvconf): 클러스터 DNS(CoreDNS 등)가 올라온 뒤, 전체 클러스터 노드의 /etc/resolv.conf를 갱신해 내부 도메인 해석이 되도록 함.
# hosts: k8s_cluster
# gather_facts: false
# any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
# environment: "{{ proxy_disable_env }}"
# roles:
# - { role: kubespray_defaults }
# - { role: kubernetes/preinstall, when: "dns_mode != 'none' and resolvconf_mode == 'host_resolvconf'", tags: resolvconf, dns_late: true }
노드 추가 k8s-node5
# inventory.ini 수정
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
EOF
ansible-inventory -i /root/kubespray/inventory/mycluster/inventory.ini --graph
# @all:
# |--@ungrouped:
# |--@etcd:
# | |--@kube_control_plane:
# | | |--k8s-node1
# | | |--k8s-node2
# | | |--k8s-node3
# |--@kube_node:
# | |--k8s-node4
# | |--k8s-node5
# ansible 연결 확인
ansible -i inventory/mycluster/inventory.ini k8s-node5 -m ping
# [WARNING]: Platform linux on host k8s-node5 is using the discovered Python interpreter at
# /usr/bin/python3.12, but future installation of another Python interpreter could change the meaning
# of that path. See https://docs.ansible.com/ansible-
# core/2.17/reference_appendices/interpreter_discovery.html for more information.
# k8s-node5 | SUCCESS => {
# "ansible_facts": {
# "discovered_interpreter_python": "/usr/bin/python3.12"
# },
# "changed": false,
# "ping": "pong"
# }
# 모니터링
watch -d kubectl get node
kube-ops-view
# 워커 노드 추가 수행 : 3분 정도 소요
ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --list-tasks
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --limit=k8s-node5 -e kube_version="1.32.9" | tee kubespray_add_worker_node.log
# Saturday 07 February 2026 19:32:10 +0900 (0:00:00.015) 0:03:30.587 *****
# ===============================================================================
# network_plugin/flannel : Flannel | Wait for flannel subnet.env file presence -- 29.28s
# download : Download_file | Download item ------------------------------- 23.35s
# download : Download_file | Download item ------------------------------- 12.07s
# download : Download_container | Download image if required ------------- 10.56s
# download : Download_file | Download item ------------------------------- 10.23s
# download : Download_container | Download image if required -------------- 8.77s
# container-engine/containerd : Download_file | Download item ------------- 7.29s
# download : Download_container | Download image if required -------------- 6.30s
# download : Download_container | Download image if required -------------- 5.97s
# system_packages : Manage packages --------------------------------------- 5.53s
# download : Download_container | Download image if required -------------- 2.75s
# container-engine/containerd : Containerd | Unpack containerd archive ---- 2.52s
# network_plugin/cni : CNI | Copy cni plugins ----------------------------- 2.51s
# container-engine/runc : Download_file | Download item ------------------- 2.12s
# container-engine/crictl : Extract_file | Unpacking archive -------------- 2.10s
# container-engine/crictl : Download_file | Download item ----------------- 2.06s
# container-engine/nerdctl : Download_file | Download item ---------------- 2.05s
# network_plugin/cni : CNI | Copy cni plugins ----------------------------- 1.87s
# container-engine/nerdctl : Extract_file | Unpacking archive ------------- 1.87s
# container-engine/validate-container-engine : Populate service facts ----- 1.28s
# 확인
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 111m v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 111m v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 111m v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 110m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node5 Ready <none> 94s v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
kubectl get pod -n kube-system -owide |grep k8s-node5
kube-flannel-ds-arm64-q6slt 1/1 Running 1 (74s ago) 107s 192.168.10.15 k8s-node5 <none> <none>
kube-proxy-srmp4 1/1 Running 0 107s 192.168.10.15 k8s-node5 <none> <none>
nginx-proxy-k8s-node5 1/1 Running 0 106s 192.168.10.15 k8s-node5 <none> <none>
# 변경 정보 확인
ssh k8s-node5 tree /etc/kubernetes
# /etc/kubernetes
# ├── kubeadm-client.conf
# ├── kubelet.conf
# ├── kubelet.conf.21457.2026-02-07@19:31:27~
# ├── kubelet-config.yaml
# ├── kubelet.env
# ├── manifests
# │ └── nginx-proxy.yml
# ├── pki -> /etc/kubernetes/ssl
# └── ssl
# └── ca.crt
# 4 directories, 7 files
ssh k8s-node5 tree /var/lib/kubelet
# ...
# │ ├── ca.crt -> ..data/ca.crt
# │ ├── namespace -> ..data/namespace
# │ └── token -> ..data/token
# └── f8f0790f0f374d27632f9ad8c3ae4aaf
# ├── containers
# │ └── nginx-proxy
# │ └── e4e36bc8
# ├── etc-hosts
# ├── plugins
# └── volumes
# 39 directories, 33 files
ssh k8s-node5 pstree -a
# systemd --switched-root --system --deserialize=46 no_timer_check
# |-NetworkManager --no-daemon
# | `-3*[{NetworkManager}]
# |-VBoxService --pidfile /var/run/vboxadd-service.sh
# | `-8*[{VBoxService}]
# |-agetty -o -- \\u --noreset --noclear - linux
# |-atd -f
# |-auditd
# | |-sedispatch
# | `-2*[{auditd}]
# |-chronyd -F 2
# |-containerd
# | `-10*[{containerd}]
# ...
# 샘플 파드 분배
kubectl get pod -owide
# NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
# webpod-697b545f57-lthj2 1/1 Running 0 50m 10.233.67.5 k8s-node4 <none> <none>
# webpod-697b545f57-sd9kp 1/1 Running 0 50m 10.233.67.4 k8s-node4 <none> <none>
kubectl scale deployment webpod --replicas 1
kubectl scale deployment webpod --replicas 2
kubectl get pod -owide
# NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
# webpod-697b545f57-lthj2 1/1 Running 0 50m 10.233.67.5 k8s-node4 <none> <none>
# webpod-697b545f57-rrxfx 1/1 Running 0 14s 10.233.68.2 k8s-node5 <none> <none>
노드가 추가된 모습을 kube-ops-view 로 확인합니다.

노드 삭제
노드 안전 제거(Graceful Remove)
내부적으로 playbooks/remove_node.yml 과 remove_node 역할(role)을 이용하여 노드를 drain 하고 연관된 리소스를 정리한 뒤, 클러스터에서 안전하게(graceful) 제거합니다.
# webpod deployment 에 PDB 설정 : maxUnavailable=0 이므로 replicas 2개가 모두 Ready 를 유지해야 하며, drain / eviction 시 단 하나의 Pod도 축출 불가
kubectl scale deployment webpod --replicas 1
kubectl scale deployment webpod --replicas 2
cat <<EOF | kubectl apply -f -
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: webpod
namespace: default
spec:
maxUnavailable: 0
selector:
matchLabels:
app: webpod
EOF
# 확인
kubectl get pdb
# NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
# webpod N/A 0 0 16s
# 삭제 시도 : PDB(maxUnavailable=0) 때문에 drain 단계에서 실패함
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml --list-tags
# Using /root/kubespray/ansible.cfg as config file
# [WARNING]: Could not match supplied host pattern, ignoring: bastion
# [WARNING]: Could not match supplied host pattern, ignoring: k8s_cluster
# [WARNING]: Could not match supplied host pattern, ignoring: calico_rr
# playbook: remove-node.yml
# play #1 (localhost): Validate nodes for removal TAGS: []
# TASK TAGS: []
# play #2 (all): Check Ansible version TAGS: [always]
# TASK TAGS: [always, check]
# play #3 (all): Inventory setup and validation TAGS: [always]
# TASK TAGS: [always]
# play #4 (bastion[0]): Install bastion ssh config TAGS: []
# TASK TAGS: [bastion, localhost]
# play #5 (this_is_unreachable): Confirm node removal TAGS: []
# [WARNING]: Could not match supplied host pattern, ignoring: this_is_unreachable
# TASK TAGS: []
# play #6 (k8s_cluster:etcd:calico_rr): Bootstrap hosts for Ansible TAGS: []
# TASK TAGS: [bootstrap_os, facts, system-packages]
# play #7 (k8s_cluster:etcd:calico_rr): Gather facts TAGS: [always]
# TASK TAGS: [always]
# play #8 (this_is_unreachable): Reset node TAGS: []
# TASK TAGS: [containerd, crio, dns, docker, facts, files, ip6tables, iptables, mounts, network, pre-remove, reset, services]
# play #9 (this_is_unreachable): Post node removal TAGS: []
# TASK TAGS: [post-remove]
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5
...
# PLAY [Confirm node removal] *******************************************************************************************************
# Saturday 07 February 2026 19:36:34 +0900 (0:00:00.106) 0:00:01.562 ******
# [Confirm Execution]
# Are you sure you want to delete nodes state? Type 'yes' to delete nodes.: yes
...
TASK [remove_node/pre_remove : Remove-node | List nodes] **************************************************************************
# ok: [k8s-node5 -> k8s-node1(192.168.10.11)] => {"changed": false, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "get", "nodes", "-o", "go-template={{ range .items }}{{ .metadata.name }}{{ \"\\n\" }}{{ end }}"], "delta": "0:00:00.048156", "end": "2026-02-07 19:36:56.975111", "msg": "", "rc": 0, "start": "2026-02-07 19:36:56.926955", "stderr": "", "stderr_lines": [], "stdout": "k8s-node1\nk8s-node2\nk8s-node3\nk8s-node4\nk8s-node5", "stdout_lines": ["k8s-node1", "k8s-node2", "k8s-node3", "k8s-node4", "k8s-node5"]}
# FAILED - RETRYING: [k8s-node5 -> k8s-node1]: Remove-node | Drain node except daemonsets resource (3 retries left).
# CTRL+C 취소
# pdb 삭제
kubectl delete pdb webpod
# 다시 삭제 시도 : 2분 20초 소요
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5
# ...
# PLAY [Confirm node removal] *******************************************************************************************************
# Saturday 07 February 2026 19:39:54 +0900 (0:00:00.106) 0:00:01.506 *****
# [Confirm Execution]
# Are you sure you want to delete nodes state? Type 'yes' to delete nodes.: yes
# ...
# 확인
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 119m v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 119m v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 119m v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 119m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# 삭제 확인
ssh k8s-node5 tree /etc/kubernetes
# /etc/kubernetes [error opening dir]
# 0 directories, 0 files
ssh k8s-node5 tree /var/lib/kubelet
# /var/lib/kubelet [error opening dir]
# 0 directories, 0 files
ssh k8s-node5 pstree -a
# inventory.ini 수정
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
EOF
Add the node back for the next exercise.
# Edit inventory.ini
cat << EOF > /root/kubespray/inventory/mycluster/inventory.ini
[kube_control_plane]
k8s-node1 ansible_host=192.168.10.11 ip=192.168.10.11 etcd_member_name=etcd1
k8s-node2 ansible_host=192.168.10.12 ip=192.168.10.12 etcd_member_name=etcd2
k8s-node3 ansible_host=192.168.10.13 ip=192.168.10.13 etcd_member_name=etcd3
[etcd:children]
kube_control_plane
[kube_node]
k8s-node4 ansible_host=192.168.10.14 ip=192.168.10.14
k8s-node5 ansible_host=192.168.10.15 ip=192.168.10.15
EOF
# Add the worker node : takes about 3 minutes
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --limit=k8s-node5 -e kube_version="1.32.9" | tee kubespray_add_worker_node.log
# Verify
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 129m v1.32.9 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 129m v1.32.9 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 129m v1.32.9 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 128m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node5 Ready <none> 5m52s v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Spread the sample pods across the nodes
kubectl scale deployment webpod --replicas 1
kubectl scale deployment webpod --replicas 2
kubectl get pod -owide
# NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
# webpod-697b545f57-5t9kl 1/1 Running 0 9s 10.233.69.2 k8s-node5 <none> <none>
# webpod-697b545f57-lthj2 1/1 Running 0 67m 10.233.67.5 k8s-node4 <none> <none>
Force-removing an unhealthy node (remove-node.yml), then adding it back
# Monitor
watch -d kubectl get node
# Make k8s-node5 unhealthy
ssh k8s-node5 systemctl stop kubelet
ssh k8s-node5 systemctl stop containerd
# Verify
kubectl get node
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready control-plane 4h11m v1.34.3
# k8s-node2 Ready control-plane 4h11m v1.34.3
# k8s-node3 Ready control-plane 4h11m v1.34.3
# k8s-node4 Ready <none> 4h11m v1.34.3
# k8s-node5 NotReady <none> 128m v1.34.3
# Check that the taint was added
kubectl describe node k8s-node5 | grep Taint
# Taints: node.kubernetes.io/unreachable:NoExecute
# Attempt the removal
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5 -e skip_confirmation=true
# Fails at the drain task for this node (shown below)!
- name: Remove-node | Drain node except daemonsets resource
command: >-
{{ kubectl }} drain
--force
--ignore-daemonsets
--grace-period {{ drain_grace_period }}
--timeout {{ drain_timeout }}
--delete-emptydir-data {{ kube_override_hostname | default(inventory_hostname) }}
when:
- groups['kube_control_plane'] | length > 0
# ignore servers that are not nodes
- kube_override_hostname | default(inventory_hostname) in nodes.stdout_lines
register: result
failed_when: result.rc != 0 and not allow_ungraceful_removal
delegate_to: "{{ groups['kube_control_plane'] | first }}"
until: result.rc == 0 or allow_ungraceful_removal
retries: "{{ drain_retries }}"
delay: "{{ drain_retry_delay_seconds }}"
# Check the variables in the role's defaults
cat roles/remove_node/pre_remove/defaults/main.yml
# ---
# allow_ungraceful_removal: false
# drain_grace_period: 300
# drain_timeout: 360s
# drain_retries: 3
# drain_retry_delay_seconds: 10
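These are ordinary role defaults, so they can be overridden per run with extra vars. A minimal sketch (illustrative values) that shortens the wait instead of sitting through the full 360s timeout three times:
# Shorter drain behaviour for this run only (illustrative values)
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml \
  -e node=k8s-node5 -e drain_timeout=60s -e drain_retries=1 -e drain_grace_period=30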
Force-removing the unhealthy node
- reset_nodes=false: only cleans up metadata on the control plane; no cleanup is attempted on the node itself (no kubeadm reset, no service cleanup, no SSH access required)
- allow_ungraceful_removal=true: allows removal even if drain / graceful shutdown fails; Pod eviction failures are ignored and the play continues even if kubelet does not respond
# Monitor
watch -d kubectl get node
# Attempt the removal
ansible-playbook -i inventory/mycluster/inventory.ini -v remove-node.yml -e node=k8s-node5 -e reset_nodes=false -e allow_ungraceful_removal=true -e skip_confirmation=true
# ...
# TASK [remove-node/post-remove : Remove-node | Delete node] ************************************************************************
# changed: [k8s-node5 -> k8s-node1(192.168.10.11)] => {"attempts": 1, "changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "delete", "node", "k8s-node5"], "delta": "0:00:00.132328", "end": "2026-01-30 23:13:44.943928", "msg": "", "rc": 0, "start": "2026-01-30 23:13:44.811600", "stderr": "", "stderr_lines": [], "stdout": "node \"k8s-node5\" deleted", "stdout_lines": ["node \"k8s-node5\" deleted"]}
# Saturday 07 February 2026 22:07:29 +0900 (0:00:00.432) 0:06:04.923 *****
# ===============================================================================
# remove_node/pre_remove : Remove-node | Drain node except daemonsets resource --------------- 360.29s # the drain_timeout of 360s is reflected here
# remove-node/post-remove : Remove-node | Delete node ------------------------------------------ 0.43s
# remove_node/pre_remove : Remove-node | List nodes -------------------------------------------- 0.35s
# system_packages : Manage packages ------------------------------------------------------------ 0.17s
# dynamic_groups : Match needed groups by their old names or definition ------------------------ 0.16s
# reset : Reset | delete some files and directories -------------------------------------------- 0.15s
# bootstrap_os : Ensure bash_completion.d folder exists ---------------------------------------- 0.13s
# validate_inventory : Guarantee that enough network address space is available for all pods --- 0.11s
# validate_inventory : Check that kube_pods_subnet does not collide with kube_service_addresses --- 0.11s
# bootstrap_os : Include tasks ----------------------------------------------------------------- 0.09s
# Gather necessary facts (hardware) ------------------------------------------------------------ 0.09s
# bootstrap_os : Include vars ------------------------------------------------------------------ 0.09s
# system_packages : Gather OS information ------------------------------------------------------ 0.08s
# validate_inventory : Stop if unsupported version of Kubernetes ------------------------------- 0.08s
# validate_inventory : Stop if auto_renew_certificates is enabled when certificates are managed externally (kube_external_ca_mode is true) --- 0.08s
# network_facts : Populates no_proxy to all hosts ---------------------------------------------- 0.08s
# network_facts : Set computed IPs varables ---------------------------------------------------- 0.07s
# validate_inventory : Stop if kata_containers_enabled is enabled when container_manager is docker --- 0.07s
# Fail if user does not confirm deletion ------------------------------------------------------- 0.07s
# bootstrap_os : Fetch /etc/os-release --------------------------------------------------------- 0.06s
# Verify
kubectl get node
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready control-plane 4h26m v1.34.3
# k8s-node2 Ready control-plane 4h26m v1.34.3
# k8s-node3 Ready control-plane 4h26m v1.34.3
# k8s-node4 Ready <none> 4h25m v1.34.3
# Check k8s-node5 : it is still left in its previous state
ssh k8s-node5 systemctl status kubelet --no-pager
# (venv) root@admin-lb:~/kubespray# ssh k8s-node5 systemctl status kubelet --no-pager
# ○ kubelet.service - Kubernetes Kubelet Server
# Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; preset: disabled)
# Active: inactive (dead) since Sat 2026-02-07 21:52:46 KST; 15min ago
# Duration: 22min 26.124s
# Invocation: 354c4dc2cd4543778bb3cd826b83d73c
# Docs: https://github.com/GoogleCloudPlatform/kubernetes
# Process: 93731 ExecStart=/usr/local/bin/kubelet $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBELET_API_SERVER $KUBELET_ADDRESS $KUBELET_PORT $KUBELET_HOSTNAME $KUBELET_ARGS $DOCKER_SOCKET $KUBELET_NETWORK_PLUGIN $KUBELET_VOLUME_PLUGIN $KUBELET_CLOUDPROVIDER (code=exited, status=0/SUCCESS)
# Main PID: 93731 (code=exited, status=0/SUCCESS)
# Mem peak: 71.3M
# CPU: 18.361s
ssh k8s-node5 tree /etc/kubernetes
# /etc/kubernetes
# ├── kubeadm-client.conf
# ├── kubeadm-client.conf.62217.2026-02-07@20:40:37~
# ├── kubeadm-client.conf.73163.2026-02-07@20:48:50~
# ├── kubeadm-client.conf.94036.2026-02-07@21:30:22~
# ├── kubelet.conf
# ├── kubelet.conf.39557.2026-02-07@19:45:06~
# ├── kubelet-config.yaml
# ├── kubelet.env
# ├── manifests
# │ └── nginx-proxy.yml
# ├── pki -> /etc/kubernetes/ssl
# └── ssl
# └── ca.crt
# 4 directories, 10 files
# (Reference) the task that actually deletes the node object
cat roles/remove-node/post-remove/tasks/main.yml
# ---
# - name: Remove-node | Delete node
# command: "{{ kubectl }} delete node {{ kube_override_hostname | default(inventory_hostname) }}"
# delegate_to: "{{ groups['kube_control_plane'] | first }}"
# when:
# - groups['kube_control_plane'] | length > 0
# # ignore servers that are not nodes
# - ('k8s_cluster' in group_names) and kube_override_hostname | default(inventory_hostname) in nodes.stdout_lines
# retries: "{{ delete_node_retries }}"
# # Sometimes the api-server can have a short window of indisponibility when we delete a control plane node
# delay: "{{ delete_node_delay_seconds }}"
# register: result
# until: result is not failed
For the next exercise, either reset k8s-node5 or delete only the k8s-node5 VM, then recreate it and add k8s-node5 back.
[k8s-node5]
# Run kubeadm reset
kubeadm reset -f
# [preflight] Running pre-flight checks
# W0207 22:09:28.957564 107237 removeetcdmember.go:105] [reset] No kubeadm config, using etcd pod spec to get data directory
# [reset] Deleted contents of the etcd data directory: /var/lib/etcd
# [reset] Stopping the kubelet service
# [reset] Unmounting mounted directories in "/var/lib/kubelet"
# W0207 22:09:28.965875 107237 cleanupnode.go:104] [reset] Failed to remove containers: failed to create new CRI runtime service: validate service connection: validate CRI v1 runtime API for endpoint "unix:///var/run/containerd/containerd.sock": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/run/containerd/containerd.sock: connect: no such file or directory"
# [reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
# [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
# The reset process does not perform cleanup of CNI plugin configuration,
# network filtering rules and kubeconfig files.
# For information on how to perform this cleanup manually, please see:
# https://k8s.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/
# The additional cleanup below is still required
# Remove directories/files
rm -rf /etc/cni/net.d
rm -rf /etc/kubernetes/
rm -rf /var/lib/kubelet
# Clean up iptables rules
iptables -t nat -S
iptables -t filter -S
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X
# Stop the services
systemctl status containerd --no-pager
systemctl status kubelet --no-pager
# Stop the kubelet service -> then disable it
systemctl stop kubelet && systemctl disable kubelet
# Removed '/etc/systemd/system/multi-user.target.wants/kubelet.service'.
# Stop the containerd service -> then disable it
systemctl stop containerd && systemctl disable containerd
# Removed '/etc/systemd/system/multi-user.target.wants/containerd.service'.
# reboot
reboot
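Once the VM is back up (or recreated from the Vagrantfile), the node can be re-joined the same way as before, with scale.yml limited to that host. A sketch reusing the earlier command:
# Re-add the cleaned/recreated node (same pattern as the earlier worker add)
ansible-playbook -i inventory/mycluster/inventory.ini -v scale.yml --limit=k8s-node5 -e kube_version="1.32.9"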
Cluster reset (not recommended, since it wipes the entire k8s cluster)
playbook (reset.yml), role (playbooks/reset.yml) - completely removes the whole k8s cluster back to its pre-install state - deletes etcd data; unrecoverable once run
#
cat reset.yml
---
- name: Reset the cluster
ansible.builtin.import_playbook: playbooks/reset.yml
cat playbooks/reset.yml
---
- name: Common tasks for every playbooks
import_playbook: boilerplate.yml
- name: Gather facts
import_playbook: internal_facts.yml
- name: Reset cluster
hosts: etcd:k8s_cluster:calico_rr
gather_facts: false
pre_tasks:
- name: Reset Confirmation
pause:
prompt: "Are you sure you want to reset cluster state? Type 'yes' to reset your cluster."
register: reset_confirmation_prompt
run_once: true
when:
- not (skip_confirmation | default(false) | bool)
- reset_confirmation is not defined
- name: Check confirmation
fail:
msg: "Reset confirmation failed"
when:
- not reset_confirmation | default(false) | bool
- not reset_confirmation_prompt.user_input | default("") == "yes"
- name: Gather information about installed services
service_facts:
environment: "{{ proxy_disable_env }}"
roles:
- { role: kubespray_defaults}
- { role: kubernetes/preinstall, when: "dns_mode != 'none' and resolvconf_mode == 'host_resolvconf'", tags: resolvconf, dns_early: true }
- { role: reset, tags: reset }
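If you really do need to run it, the interactive prompt above can be skipped non-interactively by setting the reset_confirmation variable checked in the pre_tasks; a hedged example:
# Full cluster reset without the interactive prompt (destroys etcd data - use with care)
ansible-playbook -i inventory/mycluster/inventory.ini -v reset.yml -e reset_confirmation=yes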
In a healthy cluster, adding and removing worker nodes works without much trouble. However, adding or removing control plane or etcd nodes can fail part-way, or a playbook run can break in the middle, so you need to be able to handle those situations. To be ready for the worst case, you should be able to manage, reset, and re-add nodes manually with kubeadm and etcd, so it is worth learning those procedures in advance.
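For example, removing a dead etcd member by hand (a rough sketch using the etcdctl.sh wrapper that Kubespray places on the etcd nodes, as used elsewhere in this post) looks like:
# On a healthy etcd node: find the broken member's ID, then remove it
ssh k8s-node1 etcdctl.sh member list -w table
ssh k8s-node1 etcdctl.sh member remove <MEMBER_ID>   # hex ID from the table above
# The removed node itself can then be cleaned up with kubeadm reset -f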
Monitoring setup
Installing the NFS subdir external provisioner
# Install the NFS subdir external provisioner : an NFS server (/srv/nfs/share) is already configured on admin-lb
kubectl create ns nfs-provisioner
helm repo add nfs-subdir-external-provisioner https://kubernetes-sigs.github.io/nfs-subdir-external-provisioner/
helm install nfs-provisioner nfs-subdir-external-provisioner/nfs-subdir-external-provisioner -n nfs-provisioner \
--set nfs.server=192.168.10.10 \
--set nfs.path=/srv/nfs/share \
--set storageClass.defaultClass=true
# Check the StorageClass
kubectl get sc
# NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
# nfs-client (default) cluster.local/nfs-provisioner-nfs-subdir-external-provisioner Delete Immediate true 7s
# Check the pod
kubectl get pod -n nfs-provisioner -owide
# NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
# nfs-provisioner-nfs-subdir-external-provisioner-b5775b7fd-jn9p6 1/1 Running 0 28s 10.233.69.19 k8s-node5 <none> <none>
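Before installing Prometheus below, dynamic provisioning can be sanity-checked with a small test PVC against the default nfs-client class (a hedged example; the PVC name is arbitrary):
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
EOF
kubectl get pvc nfs-test-pvc   # should become Bound via the nfs-client StorageClass
kubectl delete pvc nfs-test-pvc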
Installing kube-prometheus-stack and adding dashboards
# Install kube-prometheus-stack
# Add the Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Create the values file
cat <<EOT > monitor-values.yaml
prometheus:
prometheusSpec:
scrapeInterval: "20s"
evaluationInterval: "20s"
storageSpec:
volumeClaimTemplate:
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gi
additionalScrapeConfigs:
- job_name: 'haproxy-metrics'
static_configs:
- targets:
- '192.168.10.10:8405'
externalLabels:
cluster: "myk8s-cluster"
service:
type: NodePort
nodePort: 30001
grafana:
defaultDashboardsTimezone: Asia/Seoul
adminPassword: prom-operator
service:
type: NodePort
nodePort: 30002
alertmanager:
enabled: false
defaultRules:
create: false
kubeProxy:
enabled: false
prometheus-windows-exporter:
prometheus:
monitor:
enabled: false
EOT
cat monitor-values.yaml
# Deploy
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 80.13.3 \
-f monitor-values.yaml --create-namespace --namespace monitoring
# Verify
helm list -n monitoring
# NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
# kube-prometheus-stack monitoring 1 2026-02-07 21:35:05.497538474 +0900 KST deployed kube-prometheus-stack-80.13.3 v0.87.1
kubectl get pod,svc,ingress,pvc -n monitoring
# NAME READY STATUS RESTARTS AGE
# pod/kube-prometheus-stack-grafana-7cb5654bf5-t7zvh 0/3 ContainerCreating 0 17s
# pod/kube-prometheus-stack-kube-state-metrics-669dcf4b9-tnfhw 0/1 ContainerCreating 0 17s
# pod/kube-prometheus-stack-operator-66979c78b5-8gh9v 0/1 ContainerCreating 0 17s
# pod/kube-prometheus-stack-prometheus-node-exporter-8nhxl 0/1 ContainerCreating 0 17s
# pod/kube-prometheus-stack-prometheus-node-exporter-dlc8c 1/1 Running 0 17s
# pod/kube-prometheus-stack-prometheus-node-exporter-j8tkq 1/1 Running 0 17s
# pod/kube-prometheus-stack-prometheus-node-exporter-qjwzl 1/1 Running 0 17s
# pod/kube-prometheus-stack-prometheus-node-exporter-xhnn9 1/1 Running 0 17s
# NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
# service/kube-prometheus-stack-grafana NodePort 10.233.58.203 <none> 80:30002/TCP 17s
# service/kube-prometheus-stack-kube-state-metrics ClusterIP 10.233.60.39 <none> 8080/TCP 17s
# service/kube-prometheus-stack-operator ClusterIP 10.233.48.83 <none> 443/TCP 17s
# service/kube-prometheus-stack-prometheus NodePort 10.233.14.142 <none> 9090:30001/TCP,8080:32011/TCP 17s
# service/kube-prometheus-stack-prometheus-node-exporter ClusterIP 10.233.47.172 <none> 9100/TCP 17s
kubectl get prometheus,servicemonitors,alertmanagers -n monitoring
# NAME VERSION DESIRED READY RECONCILED AVAILABLE AGE
# prometheus.monitoring.coreos.com/kube-prometheus-stack-prometheus v3.9.1 1 0 True False 35s
# NAME AGE
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-apiserver 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-coredns 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-grafana 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-kube-controller-manager 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-kube-etcd 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-kube-scheduler 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-kube-state-metrics 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-kubelet 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-operator 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-prometheus 35s
# servicemonitor.monitoring.coreos.com/kube-prometheus-stack-prometheus-node-exporter 35s
kubectl get crd | grep monitoring
# alertmanagerconfigs.monitoring.coreos.com 2026-02-07T12:34:58Z
# alertmanagers.monitoring.coreos.com 2026-02-07T12:34:58Z
# podmonitors.monitoring.coreos.com 2026-02-07T12:34:58Z
# probes.monitoring.coreos.com 2026-02-07T12:34:58Z
# prometheusagents.monitoring.coreos.com 2026-02-07T12:34:58Z
# prometheuses.monitoring.coreos.com 2026-02-07T12:34:58Z
# prometheusrules.monitoring.coreos.com 2026-02-07T12:34:58Z
# scrapeconfigs.monitoring.coreos.com 2026-02-07T12:34:58Z
# servicemonitors.monitoring.coreos.com 2026-02-07T12:34:59Z
# thanosrulers.monitoring.coreos.com 2026-02-07T12:34:59Z
# Open each web UI via NodePort
open http://192.168.10.14:30001 # prometheus
open http://192.168.10.14:30002 # grafana : login admin / prom-operator
# Check the Prometheus version
kubectl exec -it sts/prometheus-kube-prometheus-stack-prometheus -n monitoring -c prometheus -- prometheus --version
# prometheus, version 3.9.1
# Check the Grafana version
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- grafana --version
# grafana version 12.3.1
Grafana dashboards : 15661, 12693
15661 - Kubernetes dashboard

12693 - HAProxy dashboard

# Download the dashboards
curl -o 12693_rev12.json https://grafana.com/api/dashboards/12693/revisions/12/download
curl -o 15661_rev2.json https://grafana.com/api/dashboards/15661/revisions/2/download
curl -o k8s-system-api-server.json https://raw.githubusercontent.com/dotdc/grafana-dashboards-kubernetes/refs/heads/master/dashboards/k8s-system-api-server.json
# Bulk-replace the datasource uid with sed : use 'prometheus', the uid of the default datasource
sed -i -e 's/${DS_PROMETHEUS}/prometheus/g' 12693_rev12.json
sed -i -e 's/${DS__VICTORIAMETRICS-PROD-ALL}/prometheus/g' 15661_rev2.json
sed -i -e 's/${DS_PROMETHEUS}/prometheus/g' k8s-system-api-server.json
# Create the my-dashboard ConfigMap : the sidecar container in the Grafana pod detects the grafana_dashboard="1" label!
kubectl create configmap my-dashboard --from-file=12693_rev12.json --from-file=15661_rev2.json --from-file=k8s-system-api-server.json -n monitoring
kubectl label configmap my-dashboard grafana_dashboard="1" -n monitoring
# Confirm the dashboards were added under the dashboard path
kubectl exec -it -n monitoring deploy/kube-prometheus-stack-grafana -- ls -l /tmp/dashboards
-rw-r--r-- 1 grafana 472 333790 Jan 22 06:27 12693_rev12.json
-rw-r--r-- 1 grafana 472 198839 Jan 22 06:27 15661_rev2.json
...
Configuring etcd so that its metrics can be scraped
# No 2381 metrics port/setting yet
ssh k8s-node1 ss -tnlp | grep etcd
# LISTEN 0 4096 192.168.10.11:2380 0.0.0.0:* users:(("etcd",pid=227373,fd=6))
# LISTEN 0 4096 192.168.10.11:2379 0.0.0.0:* users:(("etcd",pid=227373,fd=8))
# LISTEN 0 4096 127.0.0.1:2379 0.0.0.0:* users:(("etcd",pid=227373,fd=7))
ssh k8s-node1 ps -ef | grep etcd
...
cat roles/etcd/templates/etcd.env.j2 | grep -i metric
ETCD_METRICS={{ etcd_metrics }}
{% if etcd_listen_metrics_urls is defined %}
ETCD_LISTEN_METRICS_URLS={{ etcd_listen_metrics_urls }}
{% elif etcd_metrics_port is defined %}
ETCD_LISTEN_METRICS_URLS=http://{{ etcd_address | ansible.utils.ipwrap }}:{{ etcd_metrics_port }},http://127.0.0.1:{{ etcd_metrics_port }}
# Set the variables
cat << EOF >> inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
etcd_metrics: true
etcd_listen_metrics_urls: "http://0.0.0.0:2381"
EOF
tail -n 5 inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml
# enable_dns_autoscaler: false
# supplementary_addresses_in_ssl_keys: [192.168.10.10, k8s-api-srv.admin-lb.com]
# supplementary_addresses_in_ssl_keys: [192.168.10.10, k8s-api-srv.admin-lb.com]
# etcd_metrics: true
# etcd_listen_metrics_urls: "http://0.0.0.0:2381"
# Monitor
[k8s-node1] watch -d "etcdctl.sh member list -w table"
[admin-lb]
while true; do echo ">> k8s-node1 <<"; ssh k8s-node1 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node2 <<"; ssh k8s-node2 etcdctl.sh endpoint status -w table; echo ">> k8s-node3 <<"; ssh k8s-node3 etcdctl.sh endpoint status -w table; sleep 1; done
# During the change
# {"level":"warn","ts":"2026-02-07T21:46:26.808440+0900","logger":"etcd-client","caller":"v3@v3.5.26/retry_interceptor.go:63","msg":"retr
# ying of unary invoker failed","target":"etcd-endpoints://0x4000412f00/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineEx
# ceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: conne
# ction refused\""}
# Error: context deadline exceeded
# After the change
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | 8b0ca30665374b0 | started | etcd3 | https://192.168.10.13:2380 | https://192.168.10.13:2379 | false |
# | 2106626b12a4099f | started | etcd2 | https://192.168.10.12:2380 | https://192.168.10.12:2379 | false |
# | c6702130d82d740f | started | etcd1 | https://192.168.10.11:2380 | https://192.168.10.11:2379 | false |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# Takes about 2 minutes : etcd restarts
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "etcd" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "etcd" --limit etcd -e kube_version="1.32.9"
# Verify
ssh k8s-node1 etcdctl.sh member list -w table
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | 8b0ca30665374b0 | started | etcd3 | https://192.168.10.13:2380 | https://192.168.10.13:2379 | false |
# | 2106626b12a4099f | started | etcd2 | https://192.168.10.12:2380 | https://192.168.10.12:2379 | false |
# | c6702130d82d740f | started | etcd1 | https://192.168.10.11:2380 | https://192.168.10.11:2379 | false |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | c6702130d82d740f | 3.5.26 | 23 MB | true | false | 10 | 51771 | 51771 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node2 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 2106626b12a4099f | 3.5.26 | 22 MB | false | false | 10 | 51771 | 51771 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node3 <<
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 8b0ca30665374b0 | 3.5.26 | 23 MB | false | false | 10 | 51771 | 51771 | |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# Check the etcd backups
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done
# >> k8s-node1 <<
# /var/backups
# ├── etcd-2026-02-07_17:41:16
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# ├── etcd-2026-02-07_21:13:32
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# └── etcd-2026-02-07_21:46:10
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 13 directories, 9 files
# >> k8s-node2 <<
# /var/backups
# ├── etcd-2026-02-07_17:41:17
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# ├── etcd-2026-02-07_21:13:32
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# └── etcd-2026-02-07_21:46:10
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 13 directories, 9 files
# >> k8s-node3 <<
# /var/backups
# ├── etcd-2026-02-07_17:41:16
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# ├── etcd-2026-02-07_21:13:32
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# └── etcd-2026-02-07_21:46:10
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# Verify
ssh k8s-node1 ss -tnlp | grep etcd
# LISTEN 0 4096 192.168.10.11:2380 0.0.0.0:* users:(("etcd",pid=321303,fd=6))
# LISTEN 0 4096 192.168.10.11:2379 0.0.0.0:* users:(("etcd",pid=321303,fd=8))
# LISTEN 0 4096 127.0.0.1:2379 0.0.0.0:* users:(("etcd",pid=321303,fd=7))
# LISTEN 0 4096 *:2381 *:* users:(("etcd",pid=321303,fd=26))
# Call the metrics endpoint
curl -s http://192.168.10.11:2381/metrics
# memory available in bytes.
# # TYPE process_virtual_memory_max_bytes gauge
# process_virtual_memory_max_bytes 1.8446744073709552e+19
# # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# # TYPE promhttp_metric_handler_requests_in_flight gauge
# promhttp_metric_handler_requests_in_flight 1
# # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# # TYPE promhttp_metric_handler_requests_total counter
# promhttp_metric_handler_requests_total{code="200"} 0
# promhttp_metric_handler_requests_total{code="500"} 0
# promhttp_metric_handler_requests_total{code="503"} 0
curl -s http://192.168.10.12:2381/metrics
curl -s http://192.168.10.13:2381/metrics
# Add the additional scrape config
cat <<EOF > monitor-add-values.yaml
prometheus:
prometheusSpec:
additionalScrapeConfigs:
- job_name: 'etcd'
metrics_path: /metrics
static_configs:
- targets:
- '192.168.10.11:2381'
- '192.168.10.12:2381'
- '192.168.10.13:2381'
EOF
# Apply with helm upgrade
helm get values -n monitoring kube-prometheus-stack
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 80.13.3 \
--reuse-values -f monitor-add-values.yaml --namespace monitoring
# Verify
helm get values -n monitoring kube-prometheus-stack
# USER-SUPPLIED VALUES:
# alertmanager:
# enabled: false
# defaultRules:
# create: false
# grafana:
# adminPassword: prom-operator
# defaultDashboardsTimezone: Asia/Seoul
# service:
# nodePort: 30002
# type: NodePort
# kubeProxy:
# enabled: false
# prometheus:
# prometheusSpec:
# additionalScrapeConfigs:
# - job_name: etcd
# metrics_path: /metrics
# static_configs:
# - targets:
# - 192.168.10.11:2381
# - 192.168.10.12:2381
# - 192.168.10.13:2381
# (Optional) delete the now-unneeded etcd ServiceMonitor : takes a little while to be reflected
kubectl get servicemonitors.monitoring.coreos.com -n monitoring kube-prometheus-stack-kube-etcd -o yaml
kubectl delete servicemonitors.monitoring.coreos.com -n monitoring kube-prometheus-stack-kube-etcd
etcd metrics are now being collected.

Upgrading Kubernetes
(Preliminary work) flannel CNI plugin upgrade
# Search for the related variables
grep -Rni "flannel" inventory/mycluster/ playbooks/ roles/ --include="*.yml" -A2 -B1
...
roles/kubespray_defaults/defaults/main/download.yml:115:flannel_version: 0.27.3
roles/kubespray_defaults/defaults/main/download.yml:116:flannel_cni_version: 1.7.1-flannel1
roles/kubespray_defaults/defaults/main/download.yml:219:flannel_image_repo: "{{ docker_image_repo }}/flannel/flannel"
roles/kubespray_defaults/defaults/main/download.yml:220:flannel_image_tag: "v{{ flannel_version }}"
roles/kubespray_defaults/defaults/main/download.yml:221:flannel_init_image_repo: "{{ docker_image_repo }}/flannel/flannel-cni-plugin"
roles/kubespray_defaults/defaults/main/download.yml:222:flannel_init_image_tag: "v{{ flannel_cni_version }}"
# Check the current state
kubectl get ds -n kube-system -owide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-flannel 0 0 0 0 0 <none> 167m kube-flannel docker.io/flannel/flannel:v0.27.3 app=flannel
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# Pre-pull the images on the nodes : the play downloads them before applying anyway, so this step is not strictly necessary
ssh k8s-node3 crictl pull ghcr.io/flannel-io/flannel:v0.27.4
ssh k8s-node3 crictl pull ghcr.io/flannel-io/flannel-cni-plugin:v1.8.0-flannel1
# Update the flannel settings
cat << EOF >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
flannel_version: 0.27.4
EOF
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# Monitor
watch -d "ssh k8s-node3 crictl ps"
# flannel tag : Network plugin flannel => all of the attempts below failed
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" --limit k8s-node3 -e kube_version="1.32.9"
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "network,flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "network,flannel" --limit k8s-node3 -e kube_version="1.32.9"
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml --tags "cni,network,flannel" --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "cni,network,flannel" --limit k8s-node3 -e kube_version="1.32.9"
## cordon -> recreate the apiserver pod -> uncordon
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --list-tasks
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --limit k8s-node3 -e kube_version="1.32.9"
# flannel is a DaemonSet, so the rollout cannot be limited to a specific node -> in a sensitive cluster it is probably better to deploy and manage the CNI plugin separately from Kubespray and roll it out node by node.
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml --tags "flannel" -e kube_version="1.32.9"
# Verify
kubectl get ds -n kube-system -owide
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-flannel 0 0 0 0 0 <none> 3h27m kube-flannel docker.io/flannel/flannel:v0.27.4 app=flannel
...
ssh k8s-node1 crictl images
IMAGE TAG IMAGE ID SIZE
docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
docker.io/flannel/flannel v0.27.4 7a52f3ae4ee60 33.2MB
kubectl get pod -n kube-system -l app=flannel -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel-ds-arm64-48r2f 1/1 Running 0 98s 192.168.10.11 k8s-node1 <none> <none>
kube-flannel-ds-arm64-hchn8 1/1 Running 0 108s 192.168.10.15 k8s-node5 <none> <none>
kube-flannel-ds-arm64-jbjw9 1/1 Running 0 2m13s 192.168.10.12 k8s-node2 <none> <none>
kube-flannel-ds-arm64-qf6q9 1/1 Running 0 112s 192.168.10.13 k8s-node3 <none> <none>
kube-flannel-ds-arm64-qtv2m 1/1 Running 0 2m2s 192.168.10.14 k8s-node4 <none> <none>
# Update the flannel settings : also try changing the image repo.
cat << EOF >> inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
flannel_version: 0.27.4
#flannel_cni_version: 1.8.0
#flannel_image_repo: ghcr.io/flannel-io/flannel
#flannel_init_image_repo: ghcr.io/flannel-io/flannel-cni-plugin
EOF
grep "^[^#]" inventory/mycluster/group_vars/k8s_cluster/k8s-net-flannel.yml
# (Reference) with flannel_cni_version: 1.8.0 set, the upgrade fails with the image pull error below because of the image address
kubectl get pod -n kube-system -l app=flannel -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-flannel-ds-arm64-q6jhg 0/1 Init:ImagePullBackOff 0 2m46s 192.168.10.13 k8s-node3 <none> <none>
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 88s default-scheduler Successfully assigned kube-system/kube-flannel-ds-arm64-q6jhg to k8s-node3
Normal Pulling 45s (x3 over 88s) kubelet Pulling image "docker.io/flannel/flannel-cni-plugin:v1.8.0"
Warning Failed 43s (x3 over 85s) kubelet Failed to pull image "docker.io/flannel/flannel-cni-plugin:v1.8.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/flannel/flannel-cni-plugin:v1.8.0": failed to unpack image on snapshotter overlayfs: unexpected media type text/html for sha256:070aaf03d7c7230d20b5feaaedca75282c6aba593629b96d5a62df9eb9c5e367: not found
...
Official Kubespray upgrade documentation
https://github.com/kubernetes-sigs/kubespray/blob/master/docs/operations/upgrades.md
Unsafe upgrade (immediate, non-graceful upgrade)
Uses cluster.yml. Passing -e upgrade_cluster_setup=true also immediately migrates deployments such as kube-apiserver that are otherwise only handled during a graceful upgrade.
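A hedged example of what that invocation would look like with this lab's inventory (not actually run in this post):
# Unsafe/immediate upgrade: everything in one pass via cluster.yml
ansible-playbook -i inventory/mycluster/inventory.ini -v cluster.yml \
  -e kube_version="1.32.10" -e upgrade_cluster_setup=true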
Graceful upgrade
- Supports node cordon, drain, and uncordon. Use it when at least one kube_control_plane is already deployed. Uses upgrade-cluster.yml.
- serial (default 20%): runs the play on the given percentage (or number) of hosts per batch before moving on to the next. With the default, 20% of the hosts are upgraded at a time; set it to 1 to upgrade worker nodes one at a time. See the Ansible serial [Docs](https://docs.ansible.com/projects/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-batch-size-with-serial).
Pausing the upgrade
Pausing *before* each upgrade can be useful for inspecting the Pods running on that node or for doing manual work on it:
- upgrade_node_confirm: true - pauses the playbook before each node is upgraded; typing "yes" in the terminal resumes the run.
- upgrade_node_pause_seconds: 60 - pauses the playbook for 60 seconds before each node is upgraded, after which it resumes automatically.
Pausing *after* each node upgrade can be useful for rebooting the node to apply kernel updates, or for testing the node while it is still cordoned:
- upgrade_node_post_upgrade_confirm: true - pauses the playbook after each node upgrade, before the node is uncordoned; typing "yes" in the terminal resumes the run.
- upgrade_node_post_upgrade_pause_seconds: 60 - pauses the playbook for 60 seconds after each node upgrade, before the node is uncordoned, after which it resumes automatically.
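Putting these together, a graceful one-node-per-batch upgrade that pauses for a manual "yes" before each node would look roughly like this (a sketch; combine only the options you need):
# Graceful upgrade, one node per batch, confirm before each node
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml \
  -e kube_version="1.32.10" -e serial=1 -e upgrade_node_confirm=true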
Installation order
| Order | Component | Notes |
|---|---|---|
| 1 | Container runtime | Docker, containerd, etc. |
| 2 | Other dependencies | Packages, kernel modules, configuration, etc. |
| 3 | kubelet, kube-proxy | Deployed to every node |
| 4 | kube-apiserver, kube-scheduler, kube-controller-manager | Control plane nodes |
| 5 | Network plugin | CNI such as Calico or Cilium |
| 6 | Add-ons | CoreDNS (KubeDNS), Metrics Server, etc. |
Lab goal : 1.32.9 → 1.32.10 (patch upgrade) → 1.33.7 (minor upgrade) → 1.34.3, performed with minimal (near-zero) disruption
k8s upgrade : 1.32.9 → 1.32.10
# Monitor
[admin-lb]
watch -d kubectl get node
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready control-plane 159m v1.32.9
# k8s-node2 Ready control-plane 159m v1.32.9
# k8s-node3 Ready control-plane 159m v1.32.9
# k8s-node4 Ready <none> 159m v1.32.9
# k8s-node5 Ready <none> 36m v1.32.9
# During the upgrade the node switches to SchedulingDisabled
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready,SchedulingDisabled control-plane 165m v1.32.10
# Nodes are upgraded one after another : k8s-node1 returns to Ready and then k8s-node2 switches to SchedulingDisabled
# NAME STATUS ROLES AGE VERSION
# k8s-node1 Ready control-plane 166m v1.32.10
# k8s-node2 Ready,SchedulingDisabled control-plane 166m v1.32.9
watch -d kubectl get pod -n kube-system -owide
while true; do echo ">> k8s-node1 <<"; ssh k8s-node1 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node2 <<"; ssh k8s-node2 etcdctl.sh endpoint status -w table; echo ">> k8s-node3 <<"; ssh k8s-node3 etcdctl.sh endpoint status -w table; sleep 1; done
# >> k8s-node1 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 6.4 MB | false | false | 5 | 30616 | 30616 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node2 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 2106626b12a4099f | 3.5.25 | 6.4 MB | true | false | 5 | 30616 | 30616 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node3 <<
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 8b0ca30665374b0 | 3.5.25 | 6.3 MB | false | false | 5 | 30620 | 30620 | |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# After the upgrade
>> k8s-node1 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 6.8 MB | false | false | 5 | 33542 | 33542 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node2 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 2106626b12a4099f | 3.5.25 | 6.8 MB | true | false | 5 | 33545 | 33545 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node3 <<
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 8b0ca30665374b0 | 3.5.25 | 6.9 MB | false | false | 5 | 33646 | 33646 | |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
watch -d 'ssh k8s-node1 crictl ps ; echo ; ssh k8s-node1 crictl images'
# Every 2.0s: ssh k8s-node1 crictl ps ; echo ; ssh k8s-node1 crictl images admin-lb: Sat Feb 7 20:22:12 2026
# CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
# POD NAMESPACE
# eb65479bd7131 cadcae92e6360 About an hour ago Running kube-flannel 2 f4318937b3c2b
# kube-flannel-ds-arm64-8wlpt kube-system
# 8b20f821f1127 72b57ec14d31e About an hour ago Running kube-proxy 2 874c10dca5288
# kube-proxy-9mbr5 kube-system
# 83ef22aed197b 1d625baf81b59 About an hour ago Running kube-scheduler 3 1ba7f5912ecf7
# kube-scheduler-k8s-node1 kube-system
# a73166d67007d 02ea53851f07d About an hour ago Running kube-apiserver 3 784ab5e66bd26
# kube-apiserver-k8s-node1 kube-system
# d7351dde574d7 f0bcbad5082c9 About an hour ago Running kube-controller-manager 4 3ff909df56629
# kube-controller-manager-k8s-node1 kube-system
# IMAGE TAG IMAGE ID SIZE
# docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
# docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# registry.k8s.io/coredns/coredns v1.11.3 2f6c962e7b831 16.9MB
# registry.k8s.io/kube-apiserver v1.32.9 02ea53851f07d 26.4MB
# registry.k8s.io/kube-controller-manager v1.32.9 f0bcbad5082c9 24.1MB
# registry.k8s.io/kube-proxy v1.32.9 72b57ec14d31e 27.4MB
# registry.k8s.io/kube-scheduler v1.32.9 1d625baf81b59 19.1MB
# registry.k8s.io/metrics-server/metrics-server v0.8.0 bc6c1e09a843d 20.6MB
# registry.k8s.io/pause 3.10 afb61768ce381 268kB
# After the upgrade the new images have been pulled and the control plane objects have been restarted
# CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
# POD NAMESPACE
# 55e80f7154570 8b57c1f8bd2dd 7 minutes ago Running kube-proxy 0 d81a72250c978
# kube-proxy-8brj8 kube-system
# 777c8fbf06186 fcf368a1abd0b 7 minutes ago Running kube-scheduler 0 0064a3c3b4aa9
# kube-scheduler-k8s-node1 kube-system
# 687d167e5d3a1 66490a6490dde 7 minutes ago Running kube-controller-manager 0 f73fd4c994e78
# kube-controller-manager-k8s-node1 kube-system
# 259293add3f1d 03aec5fd5841e 8 minutes ago Running kube-apiserver 0 101f736a9006a
# kube-apiserver-k8s-node1 kube-system
# eb65479bd7131 cadcae92e6360 About an hour ago Running kube-flannel 2 f4318937b3c2b
# kube-flannel-ds-arm64-8wlpt kube-system
# IMAGE TAG IMAGE ID SIZE
# docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
# docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# registry.k8s.io/coredns/coredns v1.11.3 2f6c962e7b831 16.9MB
# registry.k8s.io/kube-apiserver v1.32.10 03aec5fd5841e 26.4MB
# registry.k8s.io/kube-apiserver v1.32.9 02ea53851f07d 26.4MB
# registry.k8s.io/kube-controller-manager v1.32.10 66490a6490dde 24.2MB
# registry.k8s.io/kube-controller-manager v1.32.9 f0bcbad5082c9 24.1MB
# registry.k8s.io/kube-proxy v1.32.10 8b57c1f8bd2dd 27.6MB
# registry.k8s.io/kube-proxy v1.32.9 72b57ec14d31e 27.4MB
# registry.k8s.io/kube-scheduler v1.32.10 fcf368a1abd0b 19.2MB
# registry.k8s.io/kube-scheduler v1.32.9 1d625baf81b59 19.1MB
# registry.k8s.io/metrics-server/metrics-server v0.8.0 bc6c1e09a843d 20.6MB
# registry.k8s.io/pause
# ctrl upgrade takes about 14 minutes : 1.32.9 → 1.32.10
# pull images -> drain ctrl #1 -> upgrade containerd -> run kubeadm upgrade -> start the new static pods -> (first time only) restart the kube-proxy DS on all nodes -> uncordon the node => then ctrl #2...
ansible-playbook -i inventory/mycluster/inventory.ini upgrade-cluster.yml --list-tags
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.32.10" --limit "kube_control_plane:etcd" | tee kubespray_upgrade.log
# Saturday 07 February 2026 20:33:58 +0900 (0:00:00.039) 0:13:08.098 *****
# ===============================================================================
# kubernetes/control-plane : Kubeadm | Upgrade first control plane node to 1.32.10 -- 96.79s
# kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes to 1.32.10 -- 78.95s
# kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes to 1.32.10 -- 77.42s
# download : Download_file | Download item ------------------------------- 43.66s
# download : Download_file | Download item ------------------------------- 40.77s
# download : Download_file | Download item ------------------------------- 35.48s
# download : Download_container | Download image if required ------------- 19.92s
# download : Download_container | Download image if required ------------- 16.65s
# network_plugin/flannel : Flannel | Wait for flannel subnet.env file presence -- 15.45s
# download : Download_container | Download image if required ------------- 15.21s
# download : Download_container | Download image if required ------------- 14.16s
# system_packages : Manage packages --------------------------------------- 8.57s
# upgrade/pre-upgrade : Drain node ---------------------------------------- 7.64s
# etcd : Gen_certs | Write etcd member/admin and kube_control_plane client certs to other etcd nodes --- 7.03s
# kubernetes/control-plane : Control plane | wait for the apiserver to be running --- 6.59s
# container-engine/containerd : Containerd | Unpack containerd archive ---- 5.78s
# network_plugin/cni : CNI | Copy cni plugins ----------------------------- 5.76s
# network_plugin/cni : CNI | Copy cni plugins ----------------------------- 5.70s
# container-engine/validate-container-engine : Populate service facts ----- 4.32s
# kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN hosts ------ 4.13s
# Check the upgrade
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 174m v1.32.10 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 174m v1.32.10 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 174m v1.32.10 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 174m v1.32.9 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node5 Ready <none> 51m v1.32.9 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# apiserver, kcm, scheduler and kube-proxy upgraded to 1.32.10!
# coredns, pause, and etcd stay on their existing versions; unaffected.
ssh k8s-node1 crictl images
# IMAGE TAG IMAGE ID SIZE
# docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
# docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# registry.k8s.io/coredns/coredns v1.11.3 2f6c962e7b831 16.9MB
# registry.k8s.io/kube-apiserver v1.32.10 03aec5fd5841e 26.4MB
# registry.k8s.io/kube-apiserver v1.32.9 02ea53851f07d 26.4MB
# registry.k8s.io/kube-controller-manager v1.32.10 66490a6490dde 24.2MB
# registry.k8s.io/kube-controller-manager v1.32.9 f0bcbad5082c9 24.1MB
# registry.k8s.io/kube-proxy v1.32.10 8b57c1f8bd2dd 27.6MB
# registry.k8s.io/kube-proxy v1.32.9 72b57ec14d31e 27.4MB
# registry.k8s.io/kube-scheduler v1.32.10 fcf368a1abd0b 19.2MB
# registry.k8s.io/kube-scheduler v1.32.9 1d625baf81b59 19.1MB
# registry.k8s.io/metrics-server/metrics-server v0.8.0 bc6c1e09a843d 20.6MB
# registry.k8s.io/pause 3.10 afb61768ce381 268kB
# Check etcd : etcd does not need a version bump, so it is unaffected
ssh k8s-node1 systemctl status etcd --no-pager | grep active
# Active: active (running) since Thu 2026-01-29 14:52:07 KST; 6h ago
# Active: active (running) since Sat 2026-02-07 19:17:22 KST; 1h 19min ago
ssh k8s-node1 etcdctl.sh member list -w table
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
# | 8b0ca30665374b0 | started | etcd3 | https://192.168.10.13:2380 | https://192.168.10.13:2379 | false |
# | 2106626b12a4099f | started | etcd2 | https://192.168.10.12:2380 | https://192.168.10.12:2379 | false |
# | c6702130d82d740f | started | etcd1 | https://192.168.10.11:2380 | https://192.168.10.11:2379 | false |
# +------------------+---------+-------+----------------------------+----------------------------+------------+
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done # check etcd backups
# >> k8s-node1 <<
# /var/backups
# └── etcd-2026-02-07_17:41:16
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# >> k8s-node2 <<
# /var/backups
# └── etcd-2026-02-07_17:41:17
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# >> k8s-node3 <<
# /var/backups
# └── etcd-2026-02-07_17:41:16
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
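These snapshot.db files are what you would fall back on in a disaster; at minimum they can be inspected in place (a hedged sketch; assumes etcdctl is on the node's PATH, which the etcdctl.sh wrapper suggests):
# Inspect a backup snapshot on the etcd node
ssh k8s-node1 etcdctl snapshot status /var/backups/etcd-2026-02-07_17:41:16/snapshot.db -w table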
Per-worker upgrade takes about 4 (2+2) minutes : 1.32.9 → 1.32.10
# Check the pods running on the worker nodes
kubectl get pod -A -owide | grep node4
# default webpod-697b545f57-lthj2 1/1 Running 0 114m 10.233.67.5 k8s-node4 <none> <none>
# kube-system coredns-664b99d7c7-k9bpw 1/1 Running 0 175m 10.233.67.2 k8s-node4 <none> <none>
# kube-system kube-flannel-ds-arm64-9p6nf 1/1 Running 0 175m 192.168.10.14 k8s-node4 <none> <none>
# kube-system kube-ops-view-8484bdc5df-5z4gb 1/1 Running 0 112m 10.233.67.6 k8s-node4 <none> <none>
# kube-system kube-proxy-vp9kn 1/1 Running 0 9m48s 192.168.10.14 k8s-node4 <none> <none>
# kube-system metrics-server-65fdf69dcb-k6r7h 1/1 Running 0 175m 10.233.67.3 k8s-node4 <none> <none>
# kube-system nginx-proxy-k8s-node4 1/1 Running 1 175m 192.168.10.14 k8s-node4 <none> <none>
kubectl get pod -A -owide | grep node5
# default webpod-697b545f57-5t9kl 1/1 Running 0 46m 10.233.69.2 k8s-node5 <none> <none>
# kube-system coredns-664b99d7c7-6vmwh 1/1 Running 0 9m44s 10.233.69.4 k8s-node5 <none> <none>
# kube-system kube-flannel-ds-arm64-brr6c 1/1 Running 1 (52m ago) 52m 192.168.10.15 k8s-node5 <none> <none>
# kube-system kube-proxy-xctkr 1/1 Running 0 9m50s 192.168.10.15 k8s-node5 <none> <none>
# kube-system nginx-proxy-k8s-node5 1/1 Running 0 52m 192.168.10.15 k8s-node5 <none> <none>
# (Reference) to upgrade only one worker node at a time
ansible-playbook upgrade-cluster.yml -b -i inventory/sample/hosts.ini -e kube_version=1.20.7 -e "serial=1"
# wk upgrade takes 2 + 2 minutes : (first time only) is the kube-proxy DS restarted on all nodes??? 1.32.9 → 1.32.10
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.32.10" --limit "k8s-node5"
# After verifying, run the remaining node
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.32.10" --limit "k8s-node4"
k8s upgrade : 1.32.10 → 1.33.7
cat roles/kubespray_defaults/vars/main/checksums.yml | grep kubelet -A30
# kubelet_checksums:
# arm64:
# 1.33.7: sha256:3035c44e0d429946d6b4b66c593d371cf5bbbfc85df39d7e2a03c422e4fe404a
# 1.33.6: sha256:7d8b7c63309cfe2da2331a1ae13cce070b9ba01e487099e7881a4281667c131d
# 1.33.5: sha256:c6ad0510c089d49244eede2638b4a4ff125258fd29a0649e7eef05c7f79c737f
# 1.33.4: sha256:623329b1a5f4858e3a5406d3947807b75144f4e71dde11ef1a71362c3a8619cc
# 1.33.3: sha256:3f69bb32debfaf25fce91aa5e7181e1e32f3550f3257b93c17dfb37bed621a9c
# 1.33.2: sha256:0fa15aca9b90fe7aef1ed3aad31edd1d9944a8c7aae34162963a6aaaf726e065
# 1.33.1: sha256:10540261c311ae005b9af514d83c02694e12614406a8524fd2d0bad75296f70d
# 1.33.0: sha256:ae5a4fc6d733fc28ff198e2d80334e21fcb5c34e76b411c50fff9cb25accf05a
# 1.32.10: sha256:21cc3d98550d3a23052d649e77956f2557e7f6119ff1e27dc82b852d006136cd
...
# ctrl upgrade
## Monitor
[admin-lb]
watch -d kubectl get node
watch -d kubectl get pod -n kube-system -owide
while true; do echo ">> k8s-node1 <<"; ssh k8s-node1 etcdctl.sh endpoint status -w table; echo; echo ">> k8s-node2 <<"; ssh k8s-node2 etcdctl.sh endpoint status -w table; echo ">> k8s-node3 <<"; ssh k8s-node3 etcdctl.sh endpoint status -w table; sleep 1; done
[k8s-node1] watch -d 'crictl ps ; echo ; crictl images'
# ctrl upgrade takes about 14 minutes : 1.32.10 → 1.33.7
# pull images -> bump coredns/metrics-server -> drain ctrl #1 -> upgrade containerd -> run kubeadm upgrade (continued below)
# -> start the new static pods -> (first time only) restart the kube-proxy DS on all nodes => then ctrl #2...
ansible-playbook -i inventory/mycluster/inventory.ini upgrade-cluster.yml --list-tags
ANSIBLE_FORCE_COLOR=true ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.33.7" --limit "kube_control_plane:etcd" | tee kubespray_upgrade-2.log
# Saturday 07 February 2026 21:03:51 +0900 (0:00:00.037) 0:14:50.403 *****
# ===============================================================================
# kubernetes/control-plane : Kubeadm | Upgrade first control plane node to 1.33.7 - 130.10s
# kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes to 1.33.7 -- 96.24s
# kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes to 1.33.7 -- 91.53s
# download : Download_file | Download item ------------------------------- 40.08s
# download : Download_file | Download item ------------------------------- 32.96s
# download : Download_file | Download item ------------------------------- 23.28s
# upgrade/pre-upgrade : Drain node --------------------------------------- 20.72s
# kubernetes/control-plane : Create kubeadm token for joining nodes with 24h expiration (default) -- 17.55s
# kubernetes/control-plane : Kubeadm | Check api is up ------------------- 15.87s
# kubernetes/control-plane : Kubeadm | Check api is up ------------------- 15.85s
# network_plugin/flannel : Flannel | Wait for flannel subnet.env file presence -- 15.44s
# download : Download_container | Download image if required ------------- 15.26s
# download : Download_container | Download image if required ------------- 13.09s
# download : Download_container | Download image if required ------------- 11.85s
# download : Download_container | Download image if required ------------- 11.53s
# download : Download_container | Download image if required ------------- 11.02s
# kubernetes/control-plane : Control plane | wait for the apiserver to be running --- 7.99s
# download : Download_file | Download item -------------------------------- 7.39s
# etcd : Gen_certs | Write etcd member/admin and kube_control_plane client certs to other etcd nodes --- 7.07s
# network_plugin/cni : CNI | Copy cni plugins ----------------------------- 5.80s
# Check the upgrade
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 3h22m v1.33.7 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 3h22m v1.33.7 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 3h22m v1.33.7 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 3h1m v1.32.10 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node5 Ready <none> 58m v1.32.10 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# apiserver, kcm, scheduler and kube-proxy upgraded to 1.33.7, plus coredns!
# pause and etcd stay on their existing versions; unaffected.
ssh k8s-node1 crictl images
# IMAGE TAG IMAGE ID SIZE
# docker.io/flannel/flannel-cni-plugin v1.7.1-flannel1 e5bf9679ea8c3 5.14MB
# docker.io/flannel/flannel v0.27.3 cadcae92e6360 33.1MB
# registry.k8s.io/coredns/coredns v1.11.3 2f6c962e7b831 16.9MB
# registry.k8s.io/coredns/coredns v1.12.0 f72407be9e08c 19.1MB
# registry.k8s.io/kube-apiserver v1.32.10 03aec5fd5841e 26.4MB
# registry.k8s.io/kube-apiserver v1.32.9 02ea53851f07d 26.4MB
# registry.k8s.io/kube-apiserver v1.33.7 6d7bc8e445519 27.4MB
# registry.k8s.io/kube-controller-manager v1.32.10 66490a6490dde 24.2MB
# registry.k8s.io/kube-controller-manager v1.32.9 f0bcbad5082c9 24.1MB
# registry.k8s.io/kube-controller-manager v1.33.7 a94595d0240bc 25.1MB
# registry.k8s.io/kube-proxy v1.32.10 8b57c1f8bd2dd 27.6MB
# registry.k8s.io/kube-proxy v1.32.9 72b57ec14d31e 27.4MB
# registry.k8s.io/kube-proxy v1.33.7 78ccb937011a5 28.3MB
# registry.k8s.io/kube-scheduler v1.32.10 fcf368a1abd0b 19.2MB
# registry.k8s.io/kube-scheduler v1.32.9 1d625baf81b59 19.1MB
# registry.k8s.io/kube-scheduler v1.33.7 94005b6be50f0 19.9MB
# registry.k8s.io/metrics-server/metrics-server v0.8.0 bc6c1e09a843d 20.6MB
# registry.k8s.io/pause 3.10 afb61768ce381 268kB
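The image list only shows what is cached on disk; to confirm which tags the control-plane pods are actually running, the API can be queried directly:
kubectl get pod -n kube-system -o custom-columns='NAME:.metadata.name,IMAGE:.spec.containers[*].image' | grep -E 'kube-(apiserver|controller-manager|scheduler|proxy)'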
# Check etcd: etcd needs no version bump in this step, so it is unaffected
ssh k8s-node1 systemctl status etcd --no-pager | grep active
# Active: active (running) since Sat 2026-02-07 19:17:22 KST; 1h 26min ago
ssh k8s-node1 etcdctl.sh member list -w table
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
>> k8s-node1 <<
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 8.2 MB | false | false | 5 | 40920 | 40920 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
>> k8s-node2 <<
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 2106626b12a4099f | 3.5.25 | 8.3 MB | true | false | 5 | 40922 | 40922 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
>> k8s-node3 <<
+----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 127.0.0.1:2379 | 8b0ca30665374b0 | 3.5.25 | 8.3 MB | false | false | 5 | 40925 | 40925 | |
+----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
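Beyond member list and endpoint status, a per-member health probe is a quick way to confirm quorum survived the rolling upgrade; a small sketch reusing the same etcdctl.sh wrapper:
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint health; echo; done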
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i tree /var/backups; echo; done # check the etcd backups
# >> k8s-node1 <<
# /var/backups
# └── etcd-2026-02-07_17:41:16
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# >> k8s-node2 <<
# /var/backups
# └── etcd-2026-02-07_17:41:17
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
# >> k8s-node3 <<
# /var/backups
# └── etcd-2026-02-07_17:41:16
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
# 5 directories, 3 files
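upgrade-cluster.yml snapshots etcd on every member before touching the control plane, which is what the /var/backups/etcd-* directories above are. If you ever need one, it is worth validating the snapshot file first. A sketch assuming the plain etcdctl binary is on the node's PATH (Kubespray's host deployment installs it; `etcdctl snapshot status` is deprecated in 3.5 in favor of etcdutl but still works):
ssh k8s-node1 'for s in /var/backups/etcd-*/snapshot.db; do echo "$s"; etcdctl snapshot status "$s" -w table; done'
# Prints hash, revision, total keys and size per snapshot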
# Worker upgrade, ~4 min: 1.32.10 → 1.33.7
# Check that the pods on the worker nodes are running
kubectl get pod -A -owide | grep node4
# default webpod-697b545f57-8c2jj 1/1 Running 0 17m 10.233.67.12 k8s-node4 <none> <none>
# default webpod-697b545f57-sfr85 1/1 Running 0 17m 10.233.67.9 k8s-node4 <none> <none>
# kube-system coredns-5d784884df-v5j9c 1/1 Running 0 5m9s 10.233.67.14 k8s-node4 <none> <none>
# kube-system kube-flannel-ds-arm64-9p6nf 1/1 Running 0 3h22m 192.168.10.14 k8s-node4 <none> <none>
# kube-system kube-ops-view-8484bdc5df-d4wzp 1/1 Running 0 17m 10.233.67.10 k8s-node4 <none> <none>
# kube-system kube-proxy-8q4l8 1/1 Running 0 8m15s 192.168.10.14 k8s-node4 <none> <none>
# kube-system metrics-server-65fdf69dcb-xn2zg 1/1 Running 0 17m 10.233.67.11 k8s-node4 <none> <none>
# kube-system nginx-proxy-k8s-node4 1/1 Running 0 17m 192.168.10.14 k8s-node4 <none> <none>
kubectl get pod -A -owide | grep node5
kube-system coredns-5d784884df-wr7vn 1/1 Running 0 8m19s 10.233.69.12 k8s-node5 <none> <none>
kube-system kube-flannel-ds-arm64-brr6c 1/1 Running 1 (79m ago) 80m 192.168.10.15 k8s-node5 <none> <none>
kube-system kube-proxy-6wdft 1/1 Running 0 8m36s 192.168.10.15 k8s-node5 <none> <none>
kube-system nginx-proxy-k8s-node5 1/1 Running 0 16m 192.168.10.15 k8s-node5 <none> <none>
# Worker upgrade: (first run only) the kube-proxy DaemonSet is re-rolled across all nodes
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.33.7" --limit "kube_node"
# Verify
kubectl get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node1 Ready control-plane 3h23m v1.33.7 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node2 Ready control-plane 3h23m v1.33.7 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node3 Ready control-plane 3h23m v1.33.7 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
k8s-node4 Ready <none> 3h23m v1.33.7 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Update kubectl and the kubeconfig on the admin node
# Check kubectl version info
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i kubectl version; echo; done
kubectl version
# >> k8s-node1 <<
# Client Version: v1.33.7
# Kustomize Version: v5.6.0
# Server Version: v1.33.7
# >> k8s-node2 <<
# Client Version: v1.33.7
# Kustomize Version: v5.6.0
# Server Version: v1.33.7
# >> k8s-node3 <<
# Client Version: v1.33.7
# Kustomize Version: v5.6.0
# Server Version: v1.33.7
# Upgrade kubectl
cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.33/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.33/rpm/repodata/repomd.xml.key
exclude=kubectl
EOF
dnf install -y -q kubectl --disableexcludes=kubernetes
# Check kubectl version info
kubectl version
# Client Version: v1.33.7
# Kustomize Version: v5.6.0
# Server Version: v1.33.7
# Update the admin kubeconfig
scp k8s-node1:/root/.kube/config /root/.kube/
cat /root/.kube/config | grep server
sed -i 's/127.0.0.1/192.168.10.10/g' /root/.kube/config
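With the server rewritten to 192.168.10.10, kubectl from admin-lb now enters through the HAProxy front end on 6443 rather than a single API server; a quick sanity check:
kubectl cluster-info   # should report the control plane at https://192.168.10.10:6443
kubectl get node       # confirms the updated kubeconfig credentials still work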
Kubespray upgrade, `~15 min + ~4 min` : 1.33.7 → 1.34.3
# Check the current state: etcd 3.5.25
for i in {1..3}; do echo ">> k8s-node$i <<"; ssh k8s-node$i etcdctl.sh endpoint status -w table; echo; done
# >> k8s-node1 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | c6702130d82d740f | 3.5.25 | 8.6 MB | false | false | 5 | 41272 | 41272 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node2 <<
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 2106626b12a4099f | 3.5.25 | 8.6 MB | true | false | 5 | 41272 | 41272 | |
# +----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# >> k8s-node3 <<
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# | 127.0.0.1:2379 | 8b0ca30665374b0 | 3.5.25 | 8.6 MB | false | false | 5 | 41273 | 41273 | |
# +----------------+-----------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
# containerd 2.1.5
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 3h25m v1.33.7 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node2 Ready control-plane 3h25m v1.33.7 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node3 Ready control-plane 3h25m v1.33.7 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node4 Ready <none> 3h24m v1.33.7 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node5 Ready <none> 81m v1.33.7 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Check which versions this Kubespray release supports
git --no-pager tag
git describe --tags
# v2.29.1
cat roles/kubespray_defaults/vars/main/checksums.yml | grep -i kube -A40
# kubelet_checksums:
# arm64:
# 1.33.7: sha256:3035c44e0d429946d6b4b66c593d371cf5bbbfc85df39d7e2a03c422e4fe404a
# 1.33.6: sha256:7d8b7c63309cfe2da2331a1ae13cce070b9ba01e487099e7881a4281667c131d
# 1.33.5: sha256:c6ad0510c089d49244eede2638b4a4ff125258fd29a0649e7eef05c7f79c737f
# ...
cat /root/kubespray/requirements.txt | grep -v "^#"
# ansible==10.7.0
# cryptography==46.0.2
# jmespath==1.0.1
# netaddr==1.3.0
git checkout v2.30.0
# Previous HEAD position was 0c6a29553 Patch versions updates (#12782)
# HEAD is now at f4ccdb5e7 Docs: update 2.29.0 to 2.30.0 (#12899)
git describe --tags
cat roles/kubespray_defaults/vars/main/checksums.yml | grep -i kube -A40
# kubelet_checksums:
# arm64:
# 1.34.3: sha256:765b740e3ad9c590852652a2623424ec60e2dddce2c6280d7f042f56c8c98619
# 1.34.2: sha256:3e31b1bee9ab32264a67af8a19679777cd372b1c3a04b5d7621289cf137b357c
# 1.34.1: sha256:6a66bc08d6c637fcea50c19063cf49e708fde1630a7f1d4ceca069a45a87e6f1
# 1.34.0: sha256:e45a7795391cd62ee226666039153832d3096c0f892266cd968936e18b2b40b0
# 1.33.7: sha256:3035c44e0d429946d6b4b66c593d371cf5bbbfc85df39d7e2a03c422e4fe404a
# ...
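Before running the upgrade from the new checkout, it is a cheap sanity check that the target version actually ships checksums in this Kubespray release:
grep -n '1\.34\.3:' roles/kubespray_defaults/vars/main/checksums.yml | head
# No matches would mean v2.30.0 does not support 1.34.3 and the playbook would fail early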
# (Optional) Python venv: isolates Python packages per project to avoid version conflicts when projects need different library versions
python -m venv venv
tree venv -L 2
venv
# ├── bin
# │ ├── activate
# │ ├── activate.csh
# │ ├── activate.fish
# │ ├── Activate.ps1
# │ ├── pip
# │ ├── pip3
# │ ├── pip3.12
# │ ├── python -> /usr/bin/python
# │ ├── python3 -> python
# │ └── python3.12 -> python
# ├── include
# │ └── python3.12
# ├── lib
# │ └── python3.12
# ├── lib64 -> lib
# └── pyvenv.cfg
source venv/bin/activate # cat venv/bin/activate
# Upgrade Python Dependencies
cat /root/kubespray/requirements.txt | grep -v "^#"
# ansible==10.7.0
# cryptography==46.0.3
# jmespath==1.1.0
# netaddr==1.3.0
pip3 install -r /root/kubespray/requirements.txt
pip list | grep -E 'cryptography|jmespath'
# cryptography 46.0.3
# jmespath 1.1.0
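It is also worth confirming the playbook will run with the venv's interpreter and Ansible rather than the system copies:
which ansible-playbook    # should point into ./venv/bin
ansible --version | head -n 3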
# Control-plane upgrade, ~15 min: includes upgrading etcd and restarting it
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.34.3" --limit "kube_control_plane:etcd"
# Saturday 07 February 2026 21:25:16 +0900 (0:00:00.040) 0:16:18.243 *****
# ===============================================================================
# kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes to 1.34.3 ----------- 131.72s
# kubernetes/control-plane : Kubeadm | Upgrade other control plane nodes to 1.34.3 ----------- 117.16s
# kubernetes/control-plane : Kubeadm | Upgrade first control plane node to 1.34.3 ------------- 90.97s
# etcd : Restart etcd ------------------------------------------------------------------------- 50.93s
# download : Download_file | Download item ---------------------------------------------------- 42.80s
# download : Download_file | Download item ---------------------------------------------------- 23.40s
# download : Download_file | Download item ---------------------------------------------------- 22.59s
# kubernetes/control-plane : Kubeadm | Check api is up ---------------------------------------- 21.09s
# kubernetes/control-plane : Create kubeadm token for joining nodes with 24h expiration (default) -- 16.80s
# kubernetes/control-plane : Kubeadm | Check api is up ---------------------------------------- 15.86s
# network_plugin/flannel : Flannel | Wait for flannel subnet.env file presence ---------------- 15.45s
# download : Download_container | Download image if required ---------------------------------- 14.46s
# download : Download_file | Download item ---------------------------------------------------- 14.13s
# upgrade/pre-upgrade : Drain node ------------------------------------------------------------ 13.68s
# download : Download_container | Download image if required ---------------------------------- 11.20s
# download : Download_container | Download image if required ---------------------------------- 10.16s
# download : Download_container | Download image if required ----------------------------------- 8.63s
# download : Download_file | Download item ----------------------------------------------------- 8.13s
# container-engine/containerd : Containerd | Unpack containerd archive ------------------------- 7.94s
# download : Download_container | Download image if required ----------------------------------- 7.87s
ssh k8s-node1 tree /var/backups
# /var/backups
# ├── etcd-2026-02-07_17:41:16
# │ ├── member
# │ │ ├── snap
# │ │ │ └── db
# │ │ └── wal
# │ │ └── 0000000000000000-0000000000000000.wal
# │ └── snapshot.db
# └── etcd-2026-02-07_21:13:32
# ├── member
# │ ├── snap
# │ │ └── db
# │ └── wal
# │ └── 0000000000000000-0000000000000000.wal
# └── snapshot.db
ssh k8s-node1 tree /tmp/releases
# /tmp/releases
# ├── cni-plugins-linux-arm64-1.8.0.tgz
# ├── containerd-2.1.5-linux-arm64.tar.gz
# Confirm containerd was upgraded to 2.2.1
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 3h44m v1.34.3 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node2 Ready control-plane 3h44m v1.34.3 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node3 Ready control-plane 3h44m v1.34.3 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node4 Ready <none> 3h43m v1.33.7 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# k8s-node5 Ready <none> 101m v1.33.7 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.1.5
# Worker upgrade, ~4 min
ansible-playbook -i inventory/mycluster/inventory.ini -v upgrade-cluster.yml -e kube_version="1.34.3" --limit "kube_node"
# Saturday 07 February 2026 21:30:29 +0900 (0:00:00.030) 0:03:59.905 *****
# ===============================================================================
# upgrade/pre-upgrade : Drain node ------------------------------------------------------------ 22.60s
# download : Download_file | Download item ---------------------------------------------------- 19.42s
# download : Download_file | Download item ---------------------------------------------------- 15.26s
# network_plugin/flannel : Flannel | Wait for flannel subnet.env file presence ---------------- 10.36s
# download : Download_file | Download item ----------------------------------------------------- 8.24s
# container-engine/containerd : Containerd | Unpack containerd archive ------------------------- 5.24s
# system_packages : Manage packages ------------------------------------------------------------ 5.14s
# download : Download_file | Download item ----------------------------------------------------- 4.95s
# network_plugin/cni : CNI | Copy cni plugins -------------------------------------------------- 3.94s
# network_plugin/cni : CNI | Copy cni plugins -------------------------------------------------- 3.83s
# download : Download_container | Download image if required ----------------------------------- 3.17s
# download : Download_file | Download item ----------------------------------------------------- 2.90s
# container-engine/validate-container-engine : Populate service facts -------------------------- 2.67s
# container-engine/containerd : Download_file | Download item ---------------------------------- 2.11s
# container-engine/crictl : Download_file | Download item -------------------------------------- 2.09s
# container-engine/crictl : Download_file | Download item -------------------------------------- 2.08s
# container-engine/runc : Download_file | Download item ---------------------------------------- 2.08s
# container-engine/containerd : Download_file | Download item ---------------------------------- 2.08s
# container-engine/runc : Download_file | Download item ---------------------------------------- 2.08s
# container-engine/nerdctl : Download_file | Download item ------------------------------------- 2.08s
# Check kubectl version: if the earlier v1.32 -> v1.33 client upgrade was skipped, kubectl prints a version-skew WARNING here
kubectl version
# Client Version: v1.33.7
# Kustomize Version: v5.6.0
# Server Version: v1.34.3
# Install the upgraded kubectl
cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.34/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.34/rpm/repodata/repomd.xml.key
exclude=kubectl
EOF
dnf install -y -q kubectl --disableexcludes=kubernetes
# Upgraded:
# kubectl-1.34.3-150500.1.1.aarch64
kubectl version
# Client Version: v1.34.3
# Kustomize Version: v5.7.1
# Server Version: v1.34.3
# Update the admin kubeconfig
scp k8s-node1:/root/.kube/config /root/.kube/
# config 100% 5649 5.8MB/s 00:00
cat /root/.kube/config | grep server
# server: https://127.0.0.1:6443
sed -i 's/127.0.0.1/192.168.10.10/g' /root/.kube/config
kubectl get node -owide
# NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
# k8s-node1 Ready control-plane 3h51m v1.34.3 192.168.10.11 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node2 Ready control-plane 3h50m v1.34.3 192.168.10.12 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node3 Ready control-plane 3h50m v1.34.3 192.168.10.13 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node4 Ready <none> 3h50m v1.34.3 192.168.10.14 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
# k8s-node5 Ready <none> 107m v1.34.3 192.168.10.15 <none> Rocky Linux 10.0 (Red Quartz) 6.12.0-55.39.1.el10_0.aarch64 containerd://2.2.1
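With both the control plane and the workers on 1.34.3, a final health pass on the API servers and workloads is cheap insurance before moving on; a small sketch:
kubectl get --raw='/readyz?verbose' | tail -n 5   # API server readiness checks, via the HAProxy endpoint
kubectl get pod -A | grep -v Running              # header plus any pod not in Running state (Completed jobs are expected)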
# Upgrade Helm https://helm.sh/ko/docs/v3/topics/version_skew , https://github.com/helm/helm/tags
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | DESIRED_VERSION=v3.20.0 bash # or v3.19.5
helm version
# version.BuildInfo{Version:"v3.20.0", GitCommit:"b2e4314fa0f229a1de7b4c981273f61d69ee5a59", GitTreeState:"clean", GoVersion:"go1.25.6"}
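After the client upgrade, confirm the new Helm binary still sees the cluster's releases (per the version-skew page linked above, pick a Helm build that supports the new Kubernetes minor):
helm list -A   # releases installed earlier (e.g. the monitoring stack, if it was deployed with Helm) should still appear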
Wrapping Up
In this week 5 session we built a highly available (HA) Kubernetes cluster with Kubespray and worked through how clients reach the API servers: through the front L4 LB (HAProxy) and through each worker's client-side LB (nginx). From the API endpoint perspective we compared Case 1 (client-side LB) and Case 2 (external LB) behavior and looked at the impact of a control-plane node failure.
For node management we covered adding and removing worker nodes with scale.yml, force-removing an unhealthy node with remove-node.yml, and wiping the whole cluster with reset.yml. For upgrades we used upgrade-cluster.yml together with the serial and pause options to step through patch and minor version upgrades. For monitoring we set up the NFS provisioner, kube-prometheus-stack, Grafana, and etcd metrics.
Week 4 was about standing up a cluster with Kubespray for the first time; week 5 added the HA topology and day-2 operations (scaling, upgrades, monitoring). In practice you keep adding nodes, upgrading, and handling failures long after the initial build, so getting comfortable with these playbooks, variables, and tags will pay off.
Thank you.