GreatSQL Community

[Awaiting reply] After an MGR primary switchover, the VIP floats to the new node but cannot be reached

2152 views, 14 replies, 2024-7-15 10:34
Last edited by mabai on 2024-7-15 13:42

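Environment: production
Database version: 8.0.32-25
OS: CentOS 7.9, kernel 3.10.0
Each server has two NICs: one for internal replication traffic and one for application traffic.

Replication subnet: 10.0.20.0/24. Application subnet: 10.0.30.0/24.

The MGR and VIP configuration is as follows: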


#mgr settings
loose-plugin_load_add = 'mysql_clone.so'
loose-plugin_load_add = 'group_replication.so'
loose-group_replication_group_name = "550e8400-e29b-11d4-a716-446655440000"
loose-group_replication_local_address = "10.0.20.22:33061"
loose-group_replication_group_seeds = "10.0.20.22:33061,10.0.20.21:33061,10.0.20.20:33061"
loose-group_replication_start_on_boot = OFF
loose-group_replication_bootstrap_group = OFF
loose-group_replication_exit_state_action = READ_ONLY
loose-group_replication_flow_control_mode = "DISABLED"
loose-group_replication_single_primary_mode = ON
loose-group_replication_majority_after_mode = ON
loose-group_replication_communication_max_message_size = 10M
loose-group_replication_arbitrator = 0
loose-group_replication_single_primary_fast_mode = 1
loose-group_replication_request_time_threshold = 100
loose-group_replication_primary_election_mode = GTID_FIRST
loose-group_replication_unreachable_majority_timeout = 0
loose-group_replication_member_expel_timeout = 5
loose-group_replication_autorejoin_tries = 288
loose-group_replication_consistency="BEFORE_ON_PRIMARY_FAILOVER"



#mgr vip
loose-plugin_load_add = 'greatdb_ha.so'
loose-greatdb_ha_enable_mgr_vip = 1
loose-greatdb_ha_mgr_vip_nic = 'bond0'
loose-greatdb_ha_mgr_vip_ip = '10.0.30.240'
loose-greatdb_ha_mgr_vip_mask = '255.255.255.0'
loose-greatdb_ha_port = 33062
loose-greatdb_ha_mgr_read_vip_ips = "10.0.30.241"
loose-greatdb_ha_mgr_read_vip_floating_type = "TO_ANOTHER_SECONDARY"
loose-greatdb_ha_send_arp_packge_times = 5
report_host = "10.0.20.22"
report_port = 3306
loose-group_replication_enforce_update_everywhere_checks=0
bind_address="0.0.0.0"


Log from the VIP being unbound:

2024-07-13T00:01:39.189872+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] 1720800099.189858 pid 45628 xcom_id 65b1915b state xcom_fsm_run action x_fsm_terminate'
2024-07-13T00:01:39.189912+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set CON_NULL for fd:376 in close_connection'
2024-07-13T00:01:39.189933+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set CON_NULL for fd:329 in close_connection'
2024-07-13T00:01:39.189948+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] 1720800099.189944 pid 45628 xcom_id 114a9b0 state xcom_fsm_start action x_fsm_exit'
2024-07-13T00:01:39.189958+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Exiting xcom thread'
2024-07-13T00:01:39.189970+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set CON_NULL for fd:327 in close_connection'
2024-07-13T00:01:39.190008+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set CON_NULL for fd:326 in close_connection'
2024-07-13T00:01:39.602684+08:00 91 [Note] [MY-011457] [Repl] Plugin group_replication reported: 'Broadcast of committed transactions message failed.'
2024-07-13T00:01:39.702853+08:00 91 [Note] [MY-011725] [Repl] Plugin group_replication reported: 'Error while sending stats message'
2024-07-13T00:01:40.544181+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Installing leave view.'
2024-07-13T00:01:40.544208+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] ::install_view():: No exchanged data'
2024-07-13T00:01:40.544221+08:00 0 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'on_view_changed is called'
2024-07-13T00:01:40.544262+08:00 0 [System] [MY-011504] [Repl] Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2024-07-13T00:01:40.545004+08:00 0 [ERROR] [MY-000000] [Server] Plugin greatdb_ha reported: 'kill connections binding to vip: 10.0.30.240'
2024-07-13T00:01:40.545803+08:00 0 [Warning] [MY-000000] [Server] Plugin greatdb_ha reported: 'try to unbind vip : 10.0.30.240 success'
2024-07-13T00:01:40.545853+08:00 0 [Note] [MY-010067] [Server] Giving 31 client threads a chance to die gracefully
2024-07-13T00:01:40.545968+08:00 0 [Note] [MY-010117] [Server] Shutting down slave threads
2024-07-13T00:01:40.546010+08:00 0 [Note] [MY-010054] [Server] Event Scheduler: Killing the scheduler thread, thread id 7
2024-07-13T00:01:40.546039+08:00 0 [Note] [MY-010050] [Server] Event Scheduler: Waiting for the scheduler thread to reply
2024-07-13T00:01:40.547499+08:00 0 [Note] [MY-010048] [Server] Event Scheduler: Stopped
2024-07-13T00:01:42.547783+08:00 0 [Note] [MY-010118] [Server] Forcefully disconnecting 6 remaining clients
2024-07-13T00:01:42.547815+08:00 0 [Note] [MY-011650] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is stopping.'
2024-07-13T00:01:42.548450+08:00 90 [Note] [MY-011444] [Repl] Plugin group_replication reported: 'The group replication applier thread was killed.'
2024-07-13T00:01:42.548629+08:00 0 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'Destroy certifier broadcast thread'
2024-07-13T00:01:42.548743+08:00 0 [System] [MY-011651] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2024-07-13T00:01:42.550998+08:00 0 [Note] [MY-010043] [Server] Event Scheduler: Purging the queue. 0 events
2024-07-13T00:01:42.553116+08:00 0 [Note] [MY-012330] [InnoDB] FTS optimize thread exiting.





Log from the VIP being bound:

2024-07-13T00:01:36.819177+08:00 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 10.0.20.22:3306, 10.0.20.20:3306 on view 17177509283958545:34.'
2024-07-13T00:01:36.819420+08:00 951 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2024-07-13T00:01:36.819859+08:00 118 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" will be run.'
2024-07-13T00:01:36.819891+08:00 118 [System] [MY-011566] [Repl] Plugin group_replication reported: 'Setting super_read_only=OFF.'
2024-07-13T00:01:36.820067+08:00 118 [System] [MY-013731] [Repl] Plugin group_replication reported: 'The member action "mysql_start_failover_channels_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "10" will be run.'
2024-07-13T00:01:36.831148+08:00 951 [System] [MY-011510] [Repl] Plugin group_replication reported: 'This server is working as primary member.'
2024-07-13T00:01:36.831180+08:00 50 [Note] [MY-011485] [Repl] Plugin group_replication reported: 'Primary had applied all relay logs, disabled conflict detection.'
2024-07-13T00:01:36.884938+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set local notify true when site is different'
2024-07-13T00:01:36.884959+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] call deliver_view_msg in detector'
2024-07-13T00:01:36.884979+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] xcom_receive_local_view is called'
2024-07-13T00:01:36.884997+08:00 0 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'on_suspicions is activated'
2024-07-13T00:01:36.885013+08:00 0 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'on_suspicions is called over'
2024-07-13T00:01:36.885025+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] xcom_receive_local_view return true'
2024-07-13T00:01:37.337731+08:00 0 [Warning] [MY-000000] [Server] Plugin greatdb_ha reported: 'try to bind vip : 10.0.30.240 success'
2024-07-13T00:01:39.189221+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set CON_NULL for fd:326 in close_connection'
2024-07-13T00:01:39.189307+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Failure reading from fd=329 n=0 from 10.0.20.21:33061'
2024-07-13T00:01:39.189320+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] set CON_NULL for fd:329 in close_connection'
2024-07-13T00:01:39.189337+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] fast_skip_allowed_for_kill is set here for server:10.0.20.21,port:33061'
2024-07-13T00:03:25.160020+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Adding new node to the configuration: 10.0.20.21:33061'
2024-07-13T00:03:25.160077+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] handle_add_node calls site_install_action'
2024-07-13T00:03:25.160474+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] update_servers is called, max nodes:3'
2024-07-13T00:03:25.160489+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Updating physical connections to other servers'
2024-07-13T00:03:25.160500+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Using existing server node 0 host 10.0.20.20:33061'
2024-07-13T00:03:25.160515+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Using existing server node 1 host 10.0.20.22:33061'
2024-07-13T00:03:25.160524+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Using existing server node 2 host 10.0.20.21:33061'
2024-07-13T00:03:25.160537+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Sucessfully installed new site definition. Start synode for this configuration is {f6b80450 57136804 0}, boot key synode is {f6b80450 57136793 0}, configured event horizon=10, my node identifier is 1'
2024-07-13T00:03:25.160690+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Connecting to 10.0.20.21:33061'
2024-07-13T00:03:25.160801+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Connected to 10.0.20.21:33061'
2024-07-13T00:03:25.160830+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] sender_task sets CON_PROTO for fd:131'
2024-07-13T00:03:25.160847+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] sent negotiation request for protocol 10 fd 131'
2024-07-13T00:03:25.233188+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] read_msg sets CON_PROTO for fd:131 in mark, tag:314'
2024-07-13T00:03:25.259965+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] proto is done for fd:131'
2024-07-13T00:03:25.689791+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] before deliver_global_view_msg is called'
2024-07-13T00:03:25.689821+08:00 0 [Note] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] after deliver_global_view_msg is called'





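After a switchover, the VIP is only reachable once ARP has been refreshed manually.

I also have a question about report_host: should it be set to the IP of the replication NIC or the IP of the application NIC?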


All replies (14)
yejr 2024-7-15 10:51:12
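1. The second question first:
report_host must be set to the same IP as the one given in group_replication_local_address. In other words, the reported IP must match the IP the node uses as an MGR member.

2. The first question:
Please try binding the VIP manually and check whether the ARP table refreshes on its own, to confirm that this works at the OS level. If ARP does refresh automatically after a manual bind, there is stronger reason to suspect the greatdb_ha plugin.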
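
As an illustration of point 1, the sketch below compares report_host with the host part of group_replication_local_address in a my.cnf-style file. The helper names cnf_value and check_report_host are hypothetical and not part of GreatSQL, and the parsing assumes simple `key = value` lines like the ones in the post.

```shell
# Print the value of an option from a my.cnf-style file.
# $1 = file, $2 = option name without the optional "loose-" prefix.
cnf_value() {
    sed -n "s/^[[:space:]]*\(loose-\)\{0,1\}$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | tr -d "\"' " | tail -n 1
}

# Succeed when report_host equals the host part of
# group_replication_local_address (the port after ":" is dropped).
check_report_host() {
    report=$(cnf_value "$1" report_host)
    local_addr=$(cnf_value "$1" group_replication_local_address)
    if [ "$report" = "${local_addr%:*}" ]; then
        echo "OK: report_host=$report matches local_address host"
    else
        echo "MISMATCH: report_host=$report local_address=$local_addr"
        return 1
    fi
}

# Usage (the path is an assumption): check_report_host /etc/my.cnf
```

With the configuration posted above, both values are 10.0.20.22, so the check would pass.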
mabai 2024-7-15 10:54:11
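yejr posted on 2024-7-15 10:51
1. The second question first:
report_host must be set to the same IP as the one given in group_replication_local_address. In other words, the ...

2. After a manual ARP refresh, the VIP is reachable.

After the switchover I run: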
/usr/sbin/arping -I bond0 -c 3 -s 10.0.30.240 10.0.30.1
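
For comparison, a variant of the command above that uses arping's unsolicited mode (-U) to announce the VIP to every neighbour on the segment rather than querying the gateway. This is only a sketch: announce_vip is a hypothetical helper, it assumes the iputils arping shipped with CentOS 7, and it only prints the command unless DRY_RUN=0 is set.

```shell
# Announce a VIP with gratuitous (unsolicited) ARP.
# $1 = interface, $2 = VIP, $3 = number of packets (default 3).
announce_vip() {
    iface="$1"
    vip="$2"
    count="${3:-3}"
    cmd="/usr/sbin/arping -U -I $iface -c $count $vip"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed, change nothing.
        echo "$cmd"
    else
        # Needs root and the VIP already bound to the interface.
        $cmd
    fi
}

# Usage on the new primary: DRY_RUN=0 announce_vip bond0 10.0.30.240
```

The configuration above already sets greatdb_ha_send_arp_packge_times = 5, so a helper like this is a diagnostic aid for comparing behaviour, not a replacement for the plugin.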


mabai 2024-7-15 10:54:52
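mabai posted on 2024-7-15 10:54
2. After a manual ARP refresh, the VIP is reachable.

After the switchover I run:

Once the ARP refresh has been run manually, the VIP is reachable as normal.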
mabai 2024-7-15 11:04:02
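yejr posted on 2024-7-15 10:51
1. The second question first:
report_host must be set to the same IP as the one given in group_replication_local_address. In other words, the ...

With a manual bind, the address is reachable right away.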
yejr 2024-7-15 13:53:38
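mabai posted on 2024-7-15 11:04
With a manual bind, the address is reachable right away.

What exactly did you do for the manual bind? Please post the detailed steps and their output.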
mabai 2024-7-15 13:57:45
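Last edited by mabai on 2024-7-15 13:59
yejr posted on 2024-7-15 13:53
What exactly did you do for the manual bind? Please post the detailed steps and their output.

I ran /sbin/ifconfig eth0:1 10.0.20.3/24 directly. There was no error, and ip a shows the address bound to the eth0 NIC.

After binding, pinging 10.0.20.3 from other servers works.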
yejr 2024-7-15 14:16:52
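mabai posted on 2024-7-15 13:57
I ran /sbin/ifconfig eth0:1 10.0.20.3/24 directly. There was no error, and ip a shows the address bound to the eth0 NIC ...

The VIP you bound manually is 10.0.20.3 (eth0:1),
but the configuration you posted above binds the VIPs 10.0.30.240 and 10.0.30.241 (NIC bond0).
The two setups are not equivalent, so please try binding those two VIPs manually first.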
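
To make the difference concrete: with the 255.255.255.0 mask from the configuration, two addresses sit in the same network only when their first three octets match. A minimal sketch, where same_subnet24 is a hypothetical helper and not an existing tool:

```shell
# True when two dotted IPv4 addresses share the same /24 network,
# i.e. everything before the last dot is identical.
same_subnet24() {
    [ "${1%.*}" = "${2%.*}" ]
}

same_subnet24 10.0.30.240 10.0.30.242 && echo "10.0.30.242: same /24 as the VIP"
same_subnet24 10.0.30.240 10.0.20.3 || echo "10.0.20.3: different /24 from the VIP"
```

A test address on the replication subnet therefore exercises a different network path from the one the configured VIPs use.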
mabai 2024-7-15 14:26:26
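yejr posted on 2024-7-15 14:16
The VIP you bound manually is 10.0.20.3 (eth0:1),
but the configuration you posted above binds the VIPs 10.0.30.240 and 10.0.30.2 ...

Those two IPs are currently in use by the production database, so I cannot bind them again. I can only use another, unused IP.

I ran:
/sbin/ifconfig bond0:1 10.0.20.242/24

Pinging it from other servers then works right away.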
yejr 2024-7-15 14:35:08
mabai posted on 2024-7-15 14:26
Those two IPs are currently in use by the production database, so I cannot bind them again. I can only use another, unused IP.

I ran:

/sbin/ip a add 10.0.30.242/24 dev bond0

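Please add a new VIP with the command above (first confirm that the VIP is not already in use, and make sure it is in the 10.0.30.x subnet), then check whether it can be pinged directly.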
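
The whole test could be laid out roughly as follows. This is a sketch only: vip_test_plan is a hypothetical helper that just prints the commands, so nothing is changed on the host until they are run by hand, and the /24 prefix is taken from the mask in the posted configuration.

```shell
# Print the manual VIP test as a checklist of commands.
# $1 = unused test VIP in the application subnet, $2 = interface.
vip_test_plan() {
    vip="$1"
    iface="$2"
    cat <<EOF
# on the database host: bind the test VIP and confirm it is present
/sbin/ip addr add $vip/24 dev $iface
/sbin/ip addr show dev $iface
# on another host in the same subnet: test reachability, inspect its ARP entry
ping -c 3 $vip
ip neigh show $vip
# on the database host: remove the test VIP afterwards
/sbin/ip addr del $vip/24 dev $iface
EOF
}

vip_test_plan 10.0.30.242 bond0
```

If the ping succeeds without any manual ARP refresh, the OS-level path is working and attention can move to the plugin, as suggested earlier in the thread.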