I. Overview
Kubernetes supports stateful applications such as ZooKeeper and etcd through StatefulSet. A StatefulSet has the following properties:
A stable, unique network identity for each pod
Stable persistent storage, implemented with PV/PVC
Ordered pod startup and shutdown, for graceful deployment and scaling
This article describes how to deploy the stateful services ZooKeeper and etcd on a Kubernetes cluster, using Ceph for data persistence.
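II. Summary
Kubernetes StatefulSet, StorageClass, PV, and PVC, combined with Ceph RBD, work well for deploying stateful services such as ZooKeeper and etcd on a Kubernetes cluster.
Kubernetes does not proactively delete PV and PVC objects that have already been created, which guards against accidental deletion.
If you do decide to delete the PV and PVC objects, also check the Ceph side and remove by hand any RBD images that remain.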
Pitfalls encountered
The Ceph client user referenced in the StorageClass must have mon rw and rbd rwx permissions. Without mon write permission, releasing the RBD lock fails, and the RBD image cannot be mounted on another Kubernetes worker node.
ZooKeeper node health is checked with probes. If the liveness probe reports a node as unhealthy, Kubernetes restarts it, so an unhealthy ZooKeeper node is restarted automatically.
ZooKeeper 3.4 configures cluster membership by loading the static file zoo.cfg, so when the pod IP of a ZooKeeper node changes, all nodes in the ZooKeeper cluster have to be restarted.
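To make the static-membership point concrete, here is a minimal sketch of how the server.N entries in zoo.cfg follow from the replica count. The function name zk_server_lines is hypothetical; the hostname pattern and ports are copied from the zoo.cfg output shown later in this article. Each node reads this list only at startup, which is why a membership change means regenerating the file and restarting every node.

```shell
#!/bin/sh
# Hypothetical helper: print the server.N lines of zoo.cfg for a given replica count.
# The default domain matches the headless service zk-hs in the default namespace.
zk_server_lines() {
    replicas="$1"
    domain="${2:-zk-hs.default.svc.cluster.local}"
    i=1
    while [ "$i" -le "$replicas" ]; do
        # server IDs start at 1, pod ordinals start at 0
        echo "server.${i}=zk-$((i - 1)).${domain}:2888:3888"
        i=$((i + 1))
    done
}

# Membership for a two-node ensemble
zk_server_lines 2
```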
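The etcd deployment method needs improvement
This experiment deployed the etcd cluster statically. When etcd members change, commands such as etcdctl member remove/add have to be run by hand to reconfigure the cluster, which severely limits automatic failure recovery and scaling. The deployment should therefore be changed to a dynamic method, DNS or etcd discovery, so that etcd runs better on Kubernetes.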
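Part of that manual step can be scripted. The sketch below looks up a member's ID by name from the output of etcdctl member list (v2 API); that ID is what etcdctl member remove expects. The function name member_id_by_name is hypothetical, and the sample lines assume the usual "ID: name=... peerURLs=... clientURLs=... isLeader=..." layout, so check it against your etcdctl version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl member list` (v2 API) output on stdin
# and print the ID of the member whose name matches the first argument.
member_id_by_name() {
    awk -v want="name=$1" '{
        for (i = 2; i <= NF; i++) {
            if ($i == want) {
                sub(/:$/, "", $1)   # drop the trailing colon after the ID
                print $1
            }
        }
    }'
}

# Illustrative input only: the IDs come from this article, the URLs are assumed.
sample='42c8b94265b9b79a: name=etcd-2 peerURLs=http://etcd-2.etcd:2380 clientURLs=http://etcd-2.etcd:2379 isLeader=false
9869f0647883a00d: name=etcd-1 peerURLs=http://etcd-1.etcd:2380 clientURLs=http://etcd-1.etcd:2379 isLeader=false
c799a6ef06bc8c14: name=etcd-0 peerURLs=http://etcd-0.etcd:2380 clientURLs=http://etcd-0.etcd:2379 isLeader=true'

printf '%s\n' "$sample" | member_id_by_name etcd-2
```

Against a live cluster the input would come from something like `kubectl exec etcd-0 -- etcdctl member list`, run before the replica count is reduced.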
III. ZooKeeper cluster deployment
1. Pull the image

docker pull gcr.mirrors.ustc.edu.cn/google_containers/kubernetes-zookeeper:1.0-3.4.10
docker tag gcr.mirrors.ustc.edu.cn/google_containers/kubernetes-zookeeper:1.0-3.4.10 172.16.18.100:5000/gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10
docker push 172.16.18.100:5000/gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10

2. Define the Ceph secret

# Server-populated fields from the exported object (creationTimestamp,
# resourceVersion, selfLink, uid) are omitted; they must not be set on create.
cat << EOF | kubectl create -f -
apiVersion: v1
data:
  key: QVFBYy9ndGFRUno4QlJBQXMxTjR3WnlqN29PK3VrMzI1a05aZ3c9PQo=
kind: Secret
metadata:
  name: ceph-secret
  namespace: default
type: kubernetes.io/rbd
EOF

3. Define the RBD StorageClass

cat << EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph
parameters:
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: default
  fsType: ext4
  imageFormat: "2"
  imageFeatures: layering
  monitors: 172.16.13.223
  pool: k8s
  userId: admin
  userSecretName: ceph-secret
provisioner: kubernetes.io/rbd
reclaimPolicy: Delete
EOF

4. Create the ZooKeeper cluster
Use RBD to store the ZooKeeper node data:

cat << EOF | kubectl create -f -
---
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  ports:
  - port: 2181
    name: client
  selector:
    app: zk
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1beta2 # for versions before 1.8.0 use apps/v1beta1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: Parallel
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: "app"
                    operator: In
                    values:
                    - zk
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: Always
        image: "172.16.18.100:5000/gcr.io/google_containers/kubernetes-zookeeper:1.0-3.4.10"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper --servers=3 --data_dir=/var/lib/zookeeper/data --data_log_dir=/var/lib/zookeeper/data/log --conf_dir=/opt/zookeeper/conf --client_port=2181 --election_port=3888 --server_port=2888 --tick_time=2000 --init_limit=10 --sync_limit=5 --heap=512M --max_client_cnxns=60 --snap_retain_count=3 --purge_interval=12 --max_session_timeout=40000 --min_session_timeout=4000 --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      securityContext:
        runAsUser: 1000
        fsGroup: 1000
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.beta.kubernetes.io/storage-class: ceph
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
EOF

Check the result. In the captured output below, two ZooKeeper pods (zk-0 and zk-1) are running:

[root@172 zookeeper]# kubectl get no
NAME           STATUS    ROLES     AGE       VERSION
172.16.20.10   Ready     <none>    50m       v1.8.2
172.16.20.11   Ready     <none>    2h        v1.8.2
172.16.20.12   Ready     <none>    1h        v1.8.2

[root@172 zookeeper]# kubectl get po -owide
NAME      READY     STATUS    RESTARTS   AGE       IP              NODE
zk-0      1/1       Running   0          8m        192.168.5.162   172.16.20.10
zk-1      1/1       Running   0          1h        192.168.2.146   172.16.20.11

[root@172 zookeeper]# kubectl get pv,pvc
NAME                                          CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                  STORAGECLASS   REASON    AGE
pv/pvc-226cb8f0-d322-11e7-9581-000c29f99475   1Gi        RWO            Delete           Bound     default/datadir-zk-0   ceph                     1h
pv/pvc-22703ece-d322-11e7-9581-000c29f99475   1Gi        RWO            Delete           Bound     default/datadir-zk-1   ceph                     1h

NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc/datadir-zk-0   Bound     pvc-226cb8f0-d322-11e7-9581-000c29f99475   1Gi        RWO            ceph           1h
pvc/datadir-zk-1   Bound     pvc-22703ece-d322-11e7-9581-000c29f99475   1Gi        RWO            ceph           1h

The RBD lock held for the zk-0 pod's image:

[root@ceph1 ceph]# rbd lock list kubernetes-dynamic-pvc-227b45e5-d322-11e7-90ab-000c29f99475 -p k8s --user admin
There is 1 exclusive lock on this image.
Locker       ID                              Address
client.24146 kubelet_lock_magic_172.16.20.10 172.16.20.10:0/1606152350

5. Test pod migration
Cordon node 172.16.20.10 (mark it unschedulable) so that zk-0, once deleted, is rescheduled onto 172.16.20.12:

kubectl cordon 172.16.20.10

[root@172 zookeeper]# kubectl get no
NAME           STATUS                     ROLES     AGE       VERSION
172.16.20.10   Ready,SchedulingDisabled   <none>    58m       v1.8.2
172.16.20.11   Ready                      <none>    2h        v1.8.2
172.16.20.12   Ready                      <none>    1h        v1.8.2

kubectl delete po zk-0

Watch zk-0 migrate:

[root@172 zookeeper]# kubectl get po -owide -w
NAME      READY     STATUS              RESTARTS   AGE       IP              NODE
zk-0      1/1       Running             0          14m       192.168.5.162   172.16.20.10
zk-1      1/1       Running             0          1h        192.168.2.146   172.16.20.11
zk-0      1/1       Terminating         0          16m       192.168.5.162   172.16.20.10
zk-0      0/1       Terminating         0          16m       <none>          172.16.20.10
zk-0      0/1       Terminating         0          16m       <none>          172.16.20.10
zk-0      0/1       Terminating         0          16m       <none>          172.16.20.10
zk-0      0/1       Terminating         0          16m       <none>          172.16.20.10
zk-0      0/1       Terminating         0          16m       <none>          172.16.20.10
zk-0      0/1       Pending             0          0s        <none>          <none>
zk-0      0/1       Pending             0          0s        <none>          172.16.20.12
zk-0      0/1       ContainerCreating   0          0s        <none>          172.16.20.12
zk-0      0/1       Running             0          3s        192.168.3.4     172.16.20.12

zk-0 has now migrated to 172.16.20.12 as expected.
Checking the RBD lock again, first before and then after the migration, shows the lock moving from 172.16.20.10 to 172.16.20.12:

[root@ceph1 ceph]# rbd lock list kubernetes-dynamic-pvc-227b45e5-d322-11e7-90ab-000c29f99475 -p k8s --user admin
There is 1 exclusive lock on this image.
Locker       ID                              Address
client.24146 kubelet_lock_magic_172.16.20.10 172.16.20.10:0/1606152350

[root@ceph1 ceph]# rbd lock list kubernetes-dynamic-pvc-227b45e5-d322-11e7-90ab-000c29f99475 -p k8s --user admin
There is 1 exclusive lock on this image.
Locker       ID                              Address
client.24154 kubelet_lock_magic_172.16.20.12 172.16.20.12:0/3715989358

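When checking several images, the node that holds the lock can be extracted from this output mechanically. This is a minimal sketch, assuming the three-column Locker / ID / Address layout of the rbd lock list output above and the kubelet_lock_magic_ prefix visible in the lock IDs; the function name lock_holder_node is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `rbd lock list` output on stdin and print the
# node name embedded in each kubelet lock ID (kubelet_lock_magic_<node>).
lock_holder_node() {
    awk '$2 ~ /^kubelet_lock_magic_/ {
        sub(/^kubelet_lock_magic_/, "", $2)
        print $2
    }'
}

# Sample input copied from the output in this article
printf '%s\n' \
    'There is 1 exclusive lock on this image.' \
    'Locker       ID                              Address' \
    'client.24154 kubelet_lock_magic_172.16.20.12 172.16.20.12:0/3715989358' \
    | lock_holder_node
```

Comparing this value with the NODE column of kubectl get po -owide is a quick way to spot a stale lock left behind on the previous node.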
Earlier, when I tested this zk pod migration against a different Ceph cluster, it kept failing because the lock could not be released. Analysis pointed to the Ceph account lacking the required permissions, which made the lock release fail. The recorded errors:

Nov 27 10:45:55 172 kubelet: W1127 10:45:55.551768 11556 rbd_util.go:471] rbd: no watchers on kubernetes-dynamic-pvc-f35a411e-d317-11e7-90ab-000c29f99475
Nov 27 10:45:55 172 kubelet: I1127 10:45:55.694126 11556 rbd_util.go:181] remove orphaned locker kubelet_lock_magic_172.16.20.12 from client client.171490: err exit status 13, output: 2017-11-27 10:45:55.570483 7fbdbe922d40 -1 did not load config file, using default settings.
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600816 7fbdbe922d40 -1 Errors while parsing config file!
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600824 7fbdbe922d40 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600825 7fbdbe922d40 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.600825 7fbdbe922d40 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602492 7fbdbe922d40 -1 Errors while parsing config file!
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602494 7fbdbe922d40 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602495 7fbdbe922d40 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.602496 7fbdbe922d40 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.651594 7fbdbe922d40 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.k8s.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Nov 27 10:45:55 172 kubelet: rbd: releasing lock failed: (13) Permission denied
Nov 27 10:45:55 172 kubelet: 2017-11-27 10:45:55.682470 7fbdbe922d40 -1 librbd: unable to blacklist client: (13) Permission denied

The relevant part of the Kubernetes RBD volume implementation:

if lock {
    // check if lock is already held for this host by matching lock_id and rbd lock id
    if strings.Contains(output, lock_id) {
        // this host already holds the lock, exit
        glog.V(1).Infof("rbd: lock already held for %s", lock_id)
        return nil
    }
    // clean up orphaned lock if no watcher on the image
    used, statusErr := util.rbdStatus(&b)
    if statusErr == nil && !used {
        re := regexp.MustCompile("client.* " + kubeLockMagic + ".*")
        locks := re.FindAllStringSubmatch(output, -1)
        for _, v := range locks {
            if len(v) > 0 {
                lockInfo := strings.Split(v[0], " ")
                if len(lockInfo) > 2 {
                    args := []string{"lock", "remove", b.Image, lockInfo[1], lockInfo[0], "--pool", b.Pool, "--id", b.Id, "-m", mon}
                    args = append(args, secret_opt...)
                    cmd, err = b.exec.Run("rbd", args...)
                    // this is where the rbd lock remove command returned the error
                    glog.Infof("remove orphaned locker %s from client %s: err %v, output: %s", lockInfo[1], lockInfo[0], err, string(cmd))
                }
            }
        }
    }
    // hold a lock: rbd lock add
    args := []string{"lock", "add", b.Image, lock_id, "--pool", b.Pool, "--id", b.Id, "-m", mon}
    args = append(args, secret_opt...)
    cmd, err = b.exec.Run("rbd", args...)
}

As the log shows, the rbd lock remove operation was rejected for lack of permission: rbd: releasing lock failed: (13) Permission denied.
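A hedged sketch of the fix: the function below only prints a ceph auth caps command that would grant the permissions named in the pitfalls list. Reading "rbd rwx" as an osd rwx capability on the pool is my interpretation, not something the article states, and the user name k8s (taken from the keyring path in the log) and pool name k8s are assumptions about the affected cluster. Review the printed command before running it on a Ceph admin node.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the command that sets mon rw and
# osd rwx-on-pool capabilities for a Ceph client user.
caps_command() {
    user="$1"   # Ceph client name without the "client." prefix
    pool="$2"   # RBD pool used by the StorageClass
    printf "ceph auth caps client.%s mon 'allow rw' osd 'allow rwx pool=%s'\n" "$user" "$pool"
}

caps_command k8s k8s
```

Note that ceph auth caps replaces the user's existing capabilities rather than adding to them, so the full desired set has to be listed each time.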
6. Test scale-up
Scale the ZooKeeper cluster from 2 nodes to 3.
With 2 nodes, zoo.cfg defines two server entries:

zookeeper@zk-0:/opt/zookeeper/conf$ cat zoo.cfg
#This file was autogenerated DO NOT EDIT
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/data/log
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888

Run kubectl edit statefulset zk, set replicas to 3 and the start-zookeeper argument to --servers=3, then watch the pods change:

[root@172 zookeeper]# kubectl get po -owide -w
NAME      READY     STATUS              RESTARTS   AGE       IP              NODE
zk-0      1/1       Running             0          1h        192.168.5.170   172.16.20.10
zk-1      1/1       Running             0          1h        192.168.3.12    172.16.20.12
zk-2      0/1       Pending             0          0s        <none>          <none>
zk-2      0/1       Pending             0          0s        <none>          172.16.20.11
zk-2      0/1       ContainerCreating   0          0s        <none>          172.16.20.11
zk-2      0/1       Running             0          1s        192.168.2.154   172.16.20.11
zk-2      1/1       Running             0          11s       192.168.2.154   172.16.20.11
zk-1      1/1       Terminating         0          1h        192.168.3.12    172.16.20.12
zk-1      0/1       Terminating         0          1h        <none>          172.16.20.12
zk-1      0/1       Terminating         0          1h        <none>          172.16.20.12
zk-1      0/1       Terminating         0          1h        <none>          172.16.20.12
zk-1      0/1       Terminating         0          1h        <none>          172.16.20.12
zk-1      0/1       Pending             0          0s        <none>          <none>
zk-1      0/1       Pending             0          0s        <none>          172.16.20.12
zk-1      0/1       ContainerCreating   0          0s        <none>          172.16.20.12
zk-1      0/1       Running             0          2s        192.168.3.13    172.16.20.12
zk-1      1/1       Running             0          20s       192.168.3.13    172.16.20.12
zk-0      1/1       Terminating         0          1h        192.168.5.170   172.16.20.10
zk-0      0/1       Terminating         0          1h        <none>          172.16.20.10
zk-0      0/1       Terminating         0          1h        <none>          172.16.20.10
zk-0      0/1       Terminating         0          1h        <none>          172.16.20.10
zk-0      0/1       Terminating         0          1h        <none>          172.16.20.10
zk-0      0/1       Pending             0          0s        <none>          <none>
zk-0      0/1       Pending             0          0s        <none>          172.16.20.10
zk-0      0/1       ContainerCreating   0          0s        <none>          172.16.20.10
zk-0      0/1       Running             0          2s        192.168.5.171   172.16.20.10
zk-0      1/1       Running             0          12s       192.168.5.171   172.16.20.10

Both zk-0 and zk-1 were restarted, so they load the new zoo.cfg and the cluster ends up correctly configured.
The new zoo.cfg lists 3 server entries:

[root@172 ~]# kubectl exec zk-0 -- cat /opt/zookeeper/conf/zoo.cfg
#This file was autogenerated DO NOT EDIT
clientPort=2181
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/data/log
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
autopurge.snapRetainCount=3
autopurge.purgeInteval=12
server.1=zk-0.zk-hs.default.svc.cluster.local:2888:3888
server.2=zk-1.zk-hs.default.svc.cluster.local:2888:3888
server.3=zk-2.zk-hs.default.svc.cluster.local:2888:3888

7. Test scale-down
On scale-down, all zk nodes in the cluster were likewise restarted automatically. The scale-down proceeded as follows:

[root@172 ~]# kubectl get po -owide -w
NAME      READY     STATUS              RESTARTS   AGE       IP              NODE
zk-0      1/1       Running             0          5m        192.168.5.171   172.16.20.10
zk-1      1/1       Running             0          6m        192.168.3.13    172.16.20.12
zk-2      1/1       Running             0          7m        192.168.2.154   172.16.20.11
zk-2      1/1       Terminating         0          7m        192.168.2.154   172.16.20.11
zk-1      1/1       Terminating         0          7m        192.168.3.13    172.16.20.12
zk-2      0/1       Terminating         0          8m        <none>          172.16.20.11
zk-1      0/1       Terminating         0          7m        <none>          172.16.20.12
zk-2      0/1       Terminating         0          8m        <none>          172.16.20.11
zk-1      0/1       Terminating         0          7m        <none>          172.16.20.12
zk-1      0/1       Terminating         0          7m        <none>          172.16.20.12
zk-1      0/1       Terminating         0          7m        <none>          172.16.20.12
zk-1      0/1       Pending             0          0s        <none>          <none>
zk-1      0/1       Pending             0          0s        <none>          172.16.20.12
zk-1      0/1       ContainerCreating   0          0s        <none>          172.16.20.12
zk-1      0/1       Running             0          2s        192.168.3.14    172.16.20.12
zk-2      0/1       Terminating         0          8m        <none>          172.16.20.11
zk-2      0/1       Terminating         0          8m        <none>          172.16.20.11
zk-1      1/1       Running             0          19s       192.168.3.14    172.16.20.12
zk-0      1/1       Terminating         0          7m        192.168.5.171   172.16.20.10
zk-0      0/1       Terminating         0          7m        <none>          172.16.20.10
zk-0      0/1       Terminating         0          7m        <none>          172.16.20.10
zk-0      0/1       Terminating         0          7m        <none>          172.16.20.10
zk-0      0/1       Pending             0          0s        <none>          <none>
zk-0      0/1       Pending             0          0s        <none>          172.16.20.10
zk-0      0/1       ContainerCreating   0          0s        <none>          172.16.20.10
zk-0      0/1       Running             0          3s        192.168.5.172   172.16.20.10
zk-0      1/1       Running             0          13s       192.168.5.172   172.16.20.10

IV. etcd cluster deployment
1. Create the etcd cluster

# The heredoc delimiter is quoted so the local shell leaves $(...), ${...}
# and backticks untouched; they must be evaluated inside the container.
cat << 'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  name: "etcd"
  annotations:
    # Create endpoints also if the related pod isn't ready
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
spec:
  ports:
  - port: 2379
    name: client
  - port: 2380
    name: peer
  clusterIP: None
  selector:
    component: "etcd"
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: "etcd"
  labels:
    component: "etcd"
spec:
  serviceName: "etcd"
  # changing replicas value will require a manual etcdctl member remove/add
  # command (remove before decreasing and add after increasing)
  replicas: 3
  template:
    metadata:
      name: "etcd"
      labels:
        component: "etcd"
    spec:
      containers:
      - name: "etcd"
        image: "172.16.18.100:5000/quay.io/coreos/etcd:v3.2.3"
        ports:
        - containerPort: 2379
          name: client
        - containerPort: 2380
          name: peer
        env:
        - name: CLUSTER_SIZE
          value: "3"
        - name: SET_NAME
          value: "etcd"
        volumeMounts:
        - name: data
          mountPath: /var/run/etcd
        command:
          - "/bin/sh"
          - "-ecx"
          - |
            IP=$(hostname -i)
            for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
              while true; do
                echo "Waiting for ${SET_NAME}-${i}.${SET_NAME} to come up"
                ping -W 1 -c 1 ${SET_NAME}-${i}.${SET_NAME}.default.svc.cluster.local > /dev/null && break
                sleep 1s
              done
            done
            PEERS=""
            for i in $(seq 0 $((${CLUSTER_SIZE} - 1))); do
              PEERS="${PEERS}${PEERS:+,}${SET_NAME}-${i}=http://${SET_NAME}-${i}.${SET_NAME}.default.svc.cluster.local:2380"
            done
            # start etcd. If cluster is already initialized the `--initial-*` options will be ignored.
            exec etcd --name ${HOSTNAME} \
              --listen-peer-urls http://${IP}:2380 \
              --listen-client-urls http://${IP}:2379,http://127.0.0.1:2379 \
              --advertise-client-urls http://${HOSTNAME}.${SET_NAME}:2379 \
              --initial-advertise-peer-urls http://${HOSTNAME}.${SET_NAME}:2380 \
              --initial-cluster-token etcd-cluster-1 \
              --initial-cluster ${PEERS} \
              --initial-cluster-state new \
              --data-dir /var/run/etcd/default.etcd
  ## The upstream example relies on the "standard" storage class that minikube
  ## defines; here the claim uses the ceph StorageClass created earlier.
  volumeClaimTemplates:
  - metadata:
      name: data
      annotations:
        volume.beta.kubernetes.io/storage-class: ceph
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
EOF

After creation, the pods look like this:

[root@172 etcd]# kubectl get po -owide
NAME      READY     STATUS    RESTARTS   AGE       IP              NODE
etcd-0    1/1       Running   0          15m       192.168.5.174   172.16.20.10
etcd-1    1/1       Running   0          15m       192.168.3.16    172.16.20.12
etcd-2    1/1       Running   0          5s        192.168.5.176   172.16.20.10

2. Test scale-down

kubectl scale statefulset etcd --replicas=2

[root@172 ~]# kubectl get po -owide -w
NAME      READY     STATUS        RESTARTS   AGE       IP              NODE
etcd-0    1/1       Running       0          17m       192.168.5.174   172.16.20.10
etcd-1    1/1       Running       0          17m       192.168.3.16    172.16.20.12
etcd-2    1/1       Running       0          1m        192.168.5.176   172.16.20.10
etcd-2    1/1       Terminating   0          1m        192.168.5.176   172.16.20.10
etcd-2    0/1       Terminating   0          1m        <none>          172.16.20.10

Check cluster health:

kubectl exec etcd-0 -- etcdctl cluster-health
failed to check the health of member 42c8b94265b9b79a on http://etcd-2.etcd:2379: Get http://etcd-2.etcd:2379/health: dial tcp: lookup etcd-2.etcd on 10.96.0.10:53: no such host
member 42c8b94265b9b79a is unreachable: [http://etcd-2.etcd:2379] are all unreachable
member 9869f0647883a00d is healthy: got healthy result from http://etcd-1.etcd:2379
member c799a6ef06bc8c14 is healthy: got healthy result from http://etcd-0.etcd:2379
cluster is healthy

After the scale-down, etcd-2 was not removed from the etcd cluster automatically, so this etcd image does not support automatic scaling well.
We remove etcd-2 by hand:

[root@172 etcd]# kubectl exec etcd-0 -- etcdctl member remove 42c8b94265b9b79a
Removed member 42c8b94265b9b79a from cluster

[root@172 etcd]# kubectl exec etcd-0 -- etcdctl cluster-health
member 9869f0647883a00d is healthy: got healthy result from http://etcd-1.etcd:2379
member c799a6ef06bc8c14 is healthy: got healthy result from http://etcd-0.etcd:2379
cluster is healthy

3. Test scale-up
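The startup script in etcd.yaml always starts etcd with --initial-cluster-state new, including for a pod added during scale-up. A member joining a running cluster has to be registered with etcdctl member add first and then started with --initial-cluster-state existing, so this setup does not support dynamic scale-up. One option is to rewrite the startup script around DNS-based dynamic deployment of the etcd cluster, which would allow the etcd cluster to scale dynamically.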
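As a rough illustration of one piece of such a rewrite, the sketch below picks the --initial-cluster-state value from the pod's ordinal. The function name cluster_state_for and the rule itself (ordinals below the original replica count bootstrap, higher ordinals join) are assumptions for illustration, not something the image provides, and a joining pod would still need to be registered with etcdctl member add before it starts.

```shell
#!/bin/sh
# Hypothetical helper: choose --initial-cluster-state from the StatefulSet
# pod name. Pods 0..initial_size-1 bootstrap the cluster ("new"); pods with
# a higher ordinal are assumed to join a running cluster ("existing").
cluster_state_for() {
    host="$1"           # pod name, e.g. etcd-3
    initial_size="$2"   # replica count the cluster was first created with
    ordinal="${host##*-}"
    if [ "$ordinal" -lt "$initial_size" ]; then
        echo new
    else
        echo existing
    fi
}

cluster_state_for etcd-0 3   # prints: new
cluster_state_for etcd-3 3   # prints: existing
```

In the startup script this would replace the hard-coded value, for example `--initial-cluster-state "$(cluster_state_for "${HOSTNAME}" "${CLUSTER_SIZE}")"`, with CLUSTER_SIZE kept at the original size.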
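Copyright of this article belongs to the author. Do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.
When reposting, please cite this article's address: http://systransis.cn/yun/27153.html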