
KES Read-Write Cluster: How to Delete a Node

reddey | 2025-3-8 14:25 | Category: Ops Practice


First, check the current status of the cluster nodes from any node in the cluster:

bash-4.2# su -l kingbase

Last login: Fri Mar  7 03:13:05 EST 2025 on pts/1

[kingbase@node2:/home/kingbase]$ repmgr cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.9 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

In the current cluster, node1 (192.168.200.145) is the primary, and the two standby nodes are 192.168.200.15 and 192.168.200.9. In this example, we delete the standby node 192.168.200.9.
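As a cross-check, repmgr can also report a single node's own view of its role and upstream. The command below is an optional extra step, run as the kingbase user on the node you plan to remove; it assumes the bundled repmgr finds its configuration the same way "repmgr cluster show" does above.

# Optional: run on the node to be removed (192.168.200.9)
repmgr node status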

Next, switch to the standby node that is to be deleted and edit the install.conf configuration file on that node.

The configuration file content is shown below; only the [shrink] section needs to be modified.

[shrink]

shrink_type="0"                   # The node type of standby/witness node, which would be delete from cluster. 0:standby  1:witness

primary_ip="192.168.200.145"                    # The ip addr of cluster primary node, which need to shrink a standby/witness node.

shrink_ip="192.168.200.9"                     # The ip addr of standby/witness node, which would be delete from cluster.

node_id="3"                       # The node_id of standby/witness node, which would be delete from cluster. It does not the same with any one in  cluster node

                                # for example: node_id="3"

## Specific instructions ,see it under [install]

install_dir="/KingbaseES/V9/cluster"                   # the last layer of directory could not add '/'

ssh_port="22"                    # the port of ssh, default is 22

Scmd_port="8890"

The key fields are the primary node's IP address, the IP address of the standby node to be deleted, the type of the node to be deleted (shrink_type), and its node_id. Also pay attention to install_dir: the last directory level must not end with a slash, as the inline comment makes clear. The same rule applies when deploying or expanding a read-write cluster: install_dir must not end with a trailing slash.
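If you want to guard against the trailing-slash mistake mechanically, a small shell check like the following can help. This is only an illustrative sketch, not part of the vendor tooling; the variable here just mirrors the install_dir value from install.conf.

# Hypothetical helper: warn if the configured install_dir ends with '/'
install_dir="/KingbaseES/V9/cluster"
case "$install_dir" in
  */) echo "WARNING: install_dir must not end with a trailing '/'";;
  *)  echo "install_dir looks OK";;
esac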

The other sections, [install] and [expand], do not need to be modified. My personal guess as to why is: 1) we are only deleting a node, so the contents of the other sections are irrelevant; 2) we invoke the cluster_install.sh script with the shrink argument, so it only needs to read the [shrink] section. After confirming that the configuration file on the standby node to be deleted is correct, log in to that standby node as the kingbase user and start the node-removal operation.
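Before launching the script, it can also save time to confirm that passwordless SSH works from this node to both itself and the primary, since the script's own checks (visible in the output below) connect to 192.168.200.9 and 192.168.200.145 over ssh. This is an optional manual pre-check, assuming the kingbase user's SSH keys are already in place from the original deployment.

# Optional pre-check, run as kingbase on the node being removed
ssh -p 22 kingbase@192.168.200.145 hostname   # should print the primary's hostname with no password prompt
ssh -p 22 kingbase@192.168.200.9 hostname     # the node being removed itself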

[kingbase@lucifer01:/install]$ sh cluster_install.sh shrink

[CONFIG_CHECK] will deploy the cluster of

[RUNNING] success connect to the target "192.168.200.9" ..... OK

[RUNNING] success connect to "192.168.200.9" from current node by 'ssh' ..... OK

[RUNNING] success connect to the target "192.168.200.145" ..... OK

[RUNNING] success connect to "192.168.200.145" from current node by 'ssh' ..... OK

[RUNNING] Primary node ip is 192.168.200.145 ...

[RUNNING] Primary node ip is 192.168.200.145 ... OK

[CONFIG_CHECK] set install_with_root=1

[RUNNING] success connect to "" from current node by 'ssh' ..... OK

[RUNNING] success connect to the target "192.168.200.145" ..... OK

[RUNNING] success connect to "192.168.200.145" from current node by 'ssh' ..... OK

[INSTALL] load config from cluster.....

[INFO] db_user=system

[INFO] db_port=54321

[INFO] use_scmd=1

[INFO] auto_cluster_recovery_level=1

[INFO] synchronous=quorum

[INSTALL] load config from cluster.....OK

[CONFIG_CHECK] check database connection ...

[CONFIG_CHECK] check database connection ... OK

[CONFIG_CHECK] shrink_ip[192.168.200.9] is a standby node IP in the cluster ...

[CONFIG_CHECK] shrink_ip[192.168.200.9] is a standby node IP in the cluster ...ok

[CONFIG_CHECK] The localhost is shrink_ip:[192.168.200.9] or primary_ip:[192.168.200.145]...

[CONFIG_CHECK] The localhost is shrink_ip:[192.168.200.9] or primary_ip:[192.168.200.145]...ok

[RUNNING] Primary node ip is 192.168.200.145 ...

[RUNNING] Primary node ip is 192.168.200.145 ... OK

[CONFIG_CHECK] check node_id is in cluster ...

[CONFIG_CHECK] check node_id is in cluster ...OK

[RUNNING] The /KingbaseES/V9/cluster/kingbase/bin dir exist on "192.168.200.9" ...

[RUNNING] The /KingbaseES/V9/cluster/kingbase/bin dir exist on "192.168.200.9" ... OK

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.9 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

[RUNNING] Del node is standby ...

[INFO] node:192.168.200.9 can be deleted ... OK

[RUNNING] query archive command at 192.168.200.145 ...

[RUNNING] current cluster not config sys_rman,return.

[Fri Mar  7 19:57:05 CST 2025] [INFO] /KingbaseES/V9/cluster/kingbase/bin/repmgr standby unregister --node-id=3 ...

[INFO] connecting to local standby

[INFO] connecting to primary database

[NOTICE] unregistering node 3

[INFO] SET synchronous TO "quorum" on primary host

[INFO] change synchronous_standby_names from "ANY 1( node2,node3)" to "ANY 1( node2)"

[INFO] try to drop slot "repmgr_slot_3" of node 3 on primary node

[WARNING] replication slot "repmgr_slot_3" is still active on node 3

[INFO] standby unregistration complete

[Fri Mar  7 19:57:07 CST 2025] [INFO] /KingbaseES/V9/cluster/kingbase/bin/repmgr standby unregister --node-id=3 ...OK

[Fri Mar  7 19:57:07 CST 2025] [INFO] check db connection ...

[Fri Mar  7 19:57:08 CST 2025] [INFO] check db connection ...ok

2025-03-07 19:57:08 Ready to stop local kbha daemon and repmgrd daemon ...

2025-03-07 19:57:17 begin to stop repmgrd on "[localhost]".

2025-03-07 19:57:19 repmgrd on "[localhost]" stop success.

2025-03-07 19:57:19 Done.

2025-03-07 19:57:19 begin to stop DB on "[localhost]".

waiting for server to shut down.... done

server stopped

2025-03-07 19:57:20 DB on "[localhost]" stop success.

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

[Fri Mar  7 19:57:21 CST 2025] [INFO] drop replication slot:repmgr_slot_3...

pg_drop_replication_slot

--------------------------

(1 row)

[Fri Mar  7 19:57:21 CST 2025] [INFO] drop replication slot:repmgr_slot_3...OK

[Fri Mar  7 19:57:21 CST 2025] [INFO] modify synchronous parameter configuration...

[Fri Mar  7 19:57:23 CST 2025] [INFO] modify synchronous parameter configuration...ok

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
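The log above shows synchronous_standby_names being changed from "ANY 1( node2,node3)" to "ANY 1( node2)". If you want to confirm this on the primary yourself, a query along the following lines should work; the use of ksql and a local connection as the esrep user are assumptions based on the connection strings above, so adjust to your environment.

# Run on the primary (192.168.200.145)
ksql -U esrep -d esrep -p 54321 -c "show synchronous_standby_names;"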

You can log in to any remaining node as the kingbase user to check whether the standby node has been removed, as shown below:

[kingbase@node2:/home/kingbase]$ repmgr service status

ID | Name  | Role    | Status    | Upstream | repmgrd | PID    | Paused? | Upstream last seen

----+-------+---------+-----------+----------+---------+--------+---------+--------------------

1  | node1 | primary | * running |          | running | 10768  | no      | n/a

2  | node2 | standby |   running | node1    | running | 122625 | no      | 1 second(s) ago
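To double-check that the removed node's replication slot is really gone on the primary (the script reported dropping repmgr_slot_3), you can list the remaining slots. The query below assumes the PostgreSQL-compatible pg_replication_slots view is available, which is consistent with the pg_drop_replication_slot call in the script output; the ksql client and connection options are likewise assumptions.

# Run on the primary; repmgr_slot_3 should no longer be listed
ksql -U esrep -d esrep -p 54321 -c "select slot_name, active from pg_replication_slots;"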
