First, check the current status of the cluster nodes from any node in the cluster:
bash-4.2# su -l kingbase
Last login: Fri Mar 7 03:13:05 EST 2025 on pts/1
[kingbase@node2:/home/kingbase]$ repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.200.9 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
In the current cluster, node1 (192.168.200.145) is the primary node, and the two standby nodes are 192.168.200.15 and 192.168.200.9. In this example, we will remove the standby node 192.168.200.9.
Switch to the standby node to be removed and edit its install.conf configuration file. The relevant content is shown below; only the [shrink] section needs to be modified.
[shrink]
shrink_type="0"                      # Type of the node to be deleted from the cluster. 0: standby, 1: witness
primary_ip="192.168.200.145"         # IP address of the cluster's primary node
shrink_ip="192.168.200.9"            # IP address of the standby/witness node to be deleted from the cluster
node_id="3"                          # node_id of the standby/witness node to be deleted from the cluster
                                     # for example: node_id="3"
## Specific instructions: see the [install] section
install_dir="/KingbaseES/V9/cluster" # the last directory level must not end with '/'
ssh_port="22"                        # SSH port, default is 22
scmd_port="8890"
The key fields are the primary node's IP address, the IP address of the standby node to be deleted, the type of the node to be deleted (shrink_type), and its node_id. Pay particular attention to install_dir: as the comment states, the last directory level must not end with a slash. The same rule applies to install_dir when deploying a read/write cluster or expanding a cluster.
The [install] and [expand] sections do not need to be modified. Presumably this is for two reasons: 1) we are only deleting a node, so the other sections are irrelevant; 2) when cluster_install.sh is invoked with the shrink argument, it only needs to read the [shrink] section. After verifying that the configuration file on the standby node to be removed is correct, log in to that standby node as the kingbase user and run the removal:
[kingbase@lucifer01:/install]$ sh cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.168.200.9" ..... OK
[RUNNING] success connect to "192.168.200.9" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.168.200.145" ..... OK
[RUNNING] success connect to "192.168.200.145" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.168.200.145 ...
[RUNNING] Primary node ip is 192.168.200.145 ... OK
[CONFIG_CHECK] set install_with_root=1
[RUNNING] success connect to "" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.168.200.145" ..... OK
[RUNNING] success connect to "192.168.200.145" from current node by 'ssh' ..... OK
[INSTALL] load config from cluster.....
[INFO] db_user=system
[INFO] db_port=54321
[INFO] use_scmd=1
[INFO] auto_cluster_recovery_level=1
[INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] check database connection ...
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] shrink_ip[192.168.200.9] is a standby node IP in the cluster ...
[CONFIG_CHECK] shrink_ip[192.168.200.9] is a standby node IP in the cluster ...ok
[CONFIG_CHECK] The localhost is shrink_ip:[192.168.200.9] or primary_ip:[192.168.200.145]...
[CONFIG_CHECK] The localhost is shrink_ip:[192.168.200.9] or primary_ip:[192.168.200.145]...ok
[RUNNING] Primary node ip is 192.168.200.145 ...
[RUNNING] Primary node ip is 192.168.200.145 ... OK
[CONFIG_CHECK] check node_id is in cluster ...
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] The /KingbaseES/V9/cluster/kingbase/bin dir exist on "192.168.200.9" ...
[RUNNING] The /KingbaseES/V9/cluster/kingbase/bin dir exist on "192.168.200.9" ... OK
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.200.9 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] Del node is standby ...
[INFO] node:192.168.200.9 can be deleted ... OK
[RUNNING] query archive command at 192.168.200.145 ...
[RUNNING] current cluster not config sys_rman,return.
[Fri Mar 7 19:57:05 CST 2025] [INFO] /KingbaseES/V9/cluster/kingbase/bin/repmgr standby unregister --node-id=3 ...
[INFO] connecting to local standby
[INFO] connecting to primary database
[NOTICE] unregistering node 3
[INFO] SET synchronous TO "quorum" on primary host
[INFO] change synchronous_standby_names from "ANY 1( node2,node3)" to "ANY 1( node2)"
[INFO] try to drop slot "repmgr_slot_3" of node 3 on primary node
[WARNING] replication slot "repmgr_slot_3" is still active on node 3
[INFO] standby unregistration complete
[Fri Mar 7 19:57:07 CST 2025] [INFO] /KingbaseES/V9/cluster/kingbase/bin/repmgr standby unregister --node-id=3 ...OK
[Fri Mar 7 19:57:07 CST 2025] [INFO] check db connection ...
[Fri Mar 7 19:57:08 CST 2025] [INFO] check db connection ...ok
2025-03-07 19:57:08 Ready to stop local kbha daemon and repmgrd daemon ...
2025-03-07 19:57:17 begin to stop repmgrd on "[localhost]".
2025-03-07 19:57:19 repmgrd on "[localhost]" stop success.
2025-03-07 19:57:19 Done.
2025-03-07 19:57:19 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2025-03-07 19:57:20 DB on "[localhost]" stop success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
[Fri Mar 7 19:57:21 CST 2025] [INFO] drop replication slot:repmgr_slot_3...
pg_drop_replication_slot
--------------------------
(1 row)
[Fri Mar 7 19:57:21 CST 2025] [INFO] drop replication slot:repmgr_slot_3...OK
[Fri Mar 7 19:57:21 CST 2025] [INFO] modify synchronous parameter configuration...
[Fri Mar 7 19:57:23 CST 2025] [INFO] modify synchronous parameter configuration...ok
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
To confirm that the standby node has been removed, log in as the kingbase user on any remaining node and check the cluster service status, as shown below:
[kingbase@node2:/home/kingbase]$ repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+--------+---------+--------------------
1 | node1 | primary | * running | | running | 10768 | no | n/a
2 | node2 | standby | running | node1 | running | 122625 | no | 1 second(s) ago