
KES Read-Write Cluster: How to Delete a Node

reddey | 2025-3-8 14:25 | Category: Ops Practice


First, check the current status of the cluster nodes from any node in the cluster:

bash-4.2# su -l kingbase

Last login: Fri Mar  7 03:13:05 EST 2025 on pts/1

[kingbase@node2:/home/kingbase]$ repmgr cluster show

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.9 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

In the current cluster, node1 (192.168.200.145) is the primary, and the two standby nodes are 192.168.200.15 and 192.168.200.9. In this example, we delete the standby node 192.168.200.9.
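As a cross-check, repmgr can also report a single node's own view of its role and upstream. The command below is an optional extra step, run as the kingbase user on the node you plan to remove; it assumes the bundled repmgr finds its configuration the same way "repmgr cluster show" does above.

# Optional: run on the node to be removed (192.168.200.9)
repmgr node status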

Next, switch to the standby node that is to be deleted and edit the install.conf configuration file on that node.

The configuration file content is shown below; only the [shrink] section needs to be modified.

[shrink]

shrink_type="0"                   # The node type of standby/witness node, which would be delete from cluster. 0:standby  1:witness

primary_ip="192.168.200.145"                    # The ip addr of cluster primary node, which need to shrink a standby/witness node.

shrink_ip="192.168.200.9"                     # The ip addr of standby/witness node, which would be delete from cluster.

node_id="3"                       # The node_id of standby/witness node, which would be delete from cluster. It does not the same with any one in  cluster node

                                # for example: node_id="3"

## Specific instructions ,see it under [install]

install_dir="/KingbaseES/V9/cluster"                   # the last layer of directory could not add '/'

ssh_port="22"                    # the port of ssh, default is 22

Scmd_port="8890"

The key fields are the primary node's IP address, the IP address of the standby node to be deleted, the type of the node to be deleted (shrink_type), and its node_id. Also pay attention to install_dir: the last directory level must not end with a slash, as the inline comment makes clear. The same rule applies when deploying or expanding a read-write cluster: install_dir must not end with a trailing slash.
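If you want to guard against the trailing-slash mistake mechanically, a small shell check like the following can help. This is only an illustrative sketch, not part of the vendor tooling; the variable here just mirrors the install_dir value from install.conf.

# Hypothetical helper: warn if the configured install_dir ends with '/'
install_dir="/KingbaseES/V9/cluster"
case "$install_dir" in
  */) echo "WARNING: install_dir must not end with a trailing '/'";;
  *)  echo "install_dir looks OK";;
esac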

The other sections, [install] and [expand], do not need to be modified. My personal guess as to why is: 1) we are only deleting a node, so the contents of the other sections are irrelevant; 2) we invoke the cluster_install.sh script with the shrink argument, so it only needs to read the [shrink] section. After confirming that the configuration file on the standby node to be deleted is correct, log in to that standby node as the kingbase user and start the node-removal operation.
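Before launching the script, it can also save time to confirm that passwordless SSH works from this node to both itself and the primary, since the script's own checks (visible in the output below) connect to 192.168.200.9 and 192.168.200.145 over ssh. This is an optional manual pre-check, assuming the kingbase user's SSH keys are already in place from the original deployment.

# Optional pre-check, run as kingbase on the node being removed
ssh -p 22 kingbase@192.168.200.145 hostname   # should print the primary's hostname with no password prompt
ssh -p 22 kingbase@192.168.200.9 hostname     # the node being removed itself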

[kingbase@lucifer01:/install]$ sh cluster_install.sh shrink

[CONFIG_CHECK] will deploy the cluster of

[RUNNING] success connect to the target "192.168.200.9" ..... OK

[RUNNING] success connect to "192.168.200.9" from current node by 'ssh' ..... OK

[RUNNING] success connect to the target "192.168.200.145" ..... OK

[RUNNING] success connect to "192.168.200.145" from current node by 'ssh' ..... OK

[RUNNING] Primary node ip is 192.168.200.145 ...

[RUNNING] Primary node ip is 192.168.200.145 ... OK

[CONFIG_CHECK] set install_with_root=1

[RUNNING] success connect to "" from current node by 'ssh' ..... OK

[RUNNING] success connect to the target "192.168.200.145" ..... OK

[RUNNING] success connect to "192.168.200.145" from current node by 'ssh' ..... OK

[INSTALL] load config from cluster.....

[INFO] db_user=system

[INFO] db_port=54321

[INFO] use_scmd=1

[INFO] auto_cluster_recovery_level=1

[INFO] synchronous=quorum

[INSTALL] load config from cluster.....OK

[CONFIG_CHECK] check database connection ...

[CONFIG_CHECK] check database connection ... OK

[CONFIG_CHECK] shrink_ip[192.168.200.9] is a standby node IP in the cluster ...

[CONFIG_CHECK] shrink_ip[192.168.200.9] is a standby node IP in the cluster ...ok

[CONFIG_CHECK] The localhost is shrink_ip:[192.168.200.9] or primary_ip:[192.168.200.145]...

[CONFIG_CHECK] The localhost is shrink_ip:[192.168.200.9] or primary_ip:[192.168.200.145]...ok

[RUNNING] Primary node ip is 192.168.200.145 ...

[RUNNING] Primary node ip is 192.168.200.145 ... OK

[CONFIG_CHECK] check node_id is in cluster ...

[CONFIG_CHECK] check node_id is in cluster ...OK

[RUNNING] The /KingbaseES/V9/cluster/kingbase/bin dir exist on "192.168.200.9" ...

[RUNNING] The /KingbaseES/V9/cluster/kingbase/bin dir exist on "192.168.200.9" ... OK

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

3  | node3 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.9 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

[RUNNING] Del node is standby ...

[INFO] node:192.168.200.9 can be deleted ... OK

[RUNNING] query archive command at 192.168.200.145 ...

[RUNNING] current cluster not config sys_rman,return.

[Fri Mar  7 19:57:05 CST 2025] [INFO] /KingbaseES/V9/cluster/kingbase/bin/repmgr standby unregister --node-id=3 ...

[INFO] connecting to local standby

[INFO] connecting to primary database

[NOTICE] unregistering node 3

[INFO] SET synchronous TO "quorum" on primary host

[INFO] change synchronous_standby_names from "ANY 1( node2,node3)" to "ANY 1( node2)"

[INFO] try to drop slot "repmgr_slot_3" of node 3 on primary node

[WARNING] replication slot "repmgr_slot_3" is still active on node 3

[INFO] standby unregistration complete

[Fri Mar  7 19:57:07 CST 2025] [INFO] /KingbaseES/V9/cluster/kingbase/bin/repmgr standby unregister --node-id=3 ...OK

[Fri Mar  7 19:57:07 CST 2025] [INFO] check db connection ...

[Fri Mar  7 19:57:08 CST 2025] [INFO] check db connection ...ok

2025-03-07 19:57:08 Ready to stop local kbha daemon and repmgrd daemon ...

2025-03-07 19:57:17 begin to stop repmgrd on "[localhost]".

2025-03-07 19:57:19 repmgrd on "[localhost]" stop success.

2025-03-07 19:57:19 Done.

2025-03-07 19:57:19 begin to stop DB on "[localhost]".

waiting for server to shut down.... done

server stopped

2025-03-07 19:57:20 DB on "[localhost]" stop success.

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

[Fri Mar  7 19:57:21 CST 2025] [INFO] drop replication slot:repmgr_slot_3...

pg_drop_replication_slot

--------------------------

(1 row)

[Fri Mar  7 19:57:21 CST 2025] [INFO] drop replication slot:repmgr_slot_3...OK

[Fri Mar  7 19:57:21 CST 2025] [INFO] modify synchronous parameter configuration...

[Fri Mar  7 19:57:23 CST 2025] [INFO] modify synchronous parameter configuration...ok

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                              

----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.200.145 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000

2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.200.15 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000
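The log above shows synchronous_standby_names being changed from "ANY 1( node2,node3)" to "ANY 1( node2)". If you want to confirm this on the primary yourself, a query along the following lines should work; the use of ksql and a local connection as the esrep user are assumptions based on the connection strings above, so adjust to your environment.

# Run on the primary (192.168.200.145)
ksql -U esrep -d esrep -p 54321 -c "show synchronous_standby_names;"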

You can log in to any remaining node as the kingbase user to check whether the standby node has been removed, as shown below:

[kingbase@node2:/home/kingbase]$ repmgr service status

ID | Name  | Role    | Status    | Upstream | repmgrd | PID    | Paused? | Upstream last seen

----+-------+---------+-----------+----------+---------+--------+---------+--------------------

1  | node1 | primary | * running |          | running | 10768  | no      | n/a

2  | node2 | standby |   running | node1    | running | 122625 | no      | 1 second(s) ago
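To double-check that the removed node's replication slot is really gone on the primary (the script reported dropping repmgr_slot_3), you can list the remaining slots. The query below assumes the PostgreSQL-compatible pg_replication_slots view is available, which is consistent with the pg_drop_replication_slot call in the script output; the ksql client and connection options are likewise assumptions.

# Run on the primary; repmgr_slot_3 should no longer be listed
ksql -U esrep -d esrep -p 54321 -c "select slot_name, active from pg_replication_slots;"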
