After an MGR primary switchover, the VIP floats to the new primary but cannot be accessed
Environment: production database version 8.0.32-25
OS: CentOS 7.9, kernel 3.10.0
Each server has two NICs: one internal replication (sync) NIC and one business/application NIC.
Replication segment: 10.0.20.0/24; business/application segment: 10.0.30.0/24.
The MGR and VIP configuration is as follows:
#mgr settings
loose-plugin_load_add = 'mysql_clone.so'
loose-plugin_load_add = 'group_replication.so'
loose-group_replication_group_name = "550e8400-e29b-11d4-a716-446655440000"
loose-group_replication_local_address = "10.0.20.22:33061"
loose-group_replication_group_seeds = "10.0.20.22:33061,10.0.20.21:33061,10.0.20.20:33061"
loose-group_replication_start_on_boot = OFF
loose-group_replication_bootstrap_group = OFF
loose-group_replication_exit_state_action = READ_ONLY
loose-group_replication_flow_control_mode = "DISABLED"
loose-group_replication_single_primary_mode = ON
loose-group_replication_majority_after_mode = ON
loose-group_replication_communication_max_message_size = 10M
loose-group_replication_arbitrator = 0
loose-group_replication_single_primary_fast_mode = 1
loose-group_replication_request_time_threshold = 100
loose-group_replication_primary_election_mode = GTID_FIRST
loose-group_replication_unreachable_majority_timeout = 0
loose-group_replication_member_expel_timeout = 5
loose-group_replication_autorejoin_tries = 288
loose-group_replication_consistency="BEFORE_ON_PRIMARY_FAILOVER"
#mgr vip
loose-plugin_load_add = 'greatdb_ha.so'
loose-greatdb_ha_enable_mgr_vip = 1
loose-greatdb_ha_mgr_vip_nic = 'bond0'
loose-greatdb_ha_mgr_vip_ip = '10.0.30.240'
loose-greatdb_ha_mgr_vip_mask = '255.255.255.0'
loose-greatdb_ha_port = 33062
loose-greatdb_ha_mgr_read_vip_ips = "10.0.30.241"
loose-greatdb_ha_mgr_read_vip_floating_type = "TO_ANOTHER_SECONDARY"
loose-greatdb_ha_send_arp_packge_times = 5
report_host = "10.0.20.22"
report_port = 3306
loose-group_replication_enforce_update_everywhere_checks=0
bind_address="0.0.0.0"
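For context, a quick way to see which member currently holds the PRIMARY role (and should therefore hold the write VIP) and whether that VIP is actually bound on bond0; the bare mysql invocation is only an assumption, add credentials as needed:
# Current group membership and roles
mysql -e "SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE FROM performance_schema.replication_group_members;"
# Whether the write VIP is bound on this node
ip addr show dev bond0 | grep 10.0.30.240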
The VIP unbind log is as follows:
2024-07-13T00:01:39.189872+08:00 0 Plugin group_replication reported: ' 1720800099.189858 pid 45628 xcom_id 65b1915b state xcom_fsm_run action x_fsm_terminate'
2024-07-13T00:01:39.189912+08:00 0 Plugin group_replication reported: ' set CON_NULL for fd:376 in close_connection'
2024-07-13T00:01:39.189933+08:00 0 Plugin group_replication reported: ' set CON_NULL for fd:329 in close_connection'
2024-07-13T00:01:39.189948+08:00 0 Plugin group_replication reported: ' 1720800099.189944 pid 45628 xcom_id 114a9b0 state xcom_fsm_start action x_fsm_exit'
2024-07-13T00:01:39.189958+08:00 0 Plugin group_replication reported: ' Exiting xcom thread'
2024-07-13T00:01:39.189970+08:00 0 Plugin group_replication reported: ' set CON_NULL for fd:327 in close_connection'
2024-07-13T00:01:39.190008+08:00 0 Plugin group_replication reported: ' set CON_NULL for fd:326 in close_connection'
2024-07-13T00:01:39.602684+08:00 91 Plugin group_replication reported: 'Broadcast of committed transactions message failed.'
2024-07-13T00:01:39.702853+08:00 91 Plugin group_replication reported: 'Error while sending stats message'
2024-07-13T00:01:40.544181+08:00 0 Plugin group_replication reported: ' Installing leave view.'
2024-07-13T00:01:40.544208+08:00 0 Plugin group_replication reported: ' ::install_view():: No exchanged data'
2024-07-13T00:01:40.544221+08:00 0 Plugin group_replication reported: 'on_view_changed is called'
2024-07-13T00:01:40.544262+08:00 0 Plugin group_replication reported: 'Group membership changed: This member has left the group.'
2024-07-13T00:01:40.545004+08:00 0 Plugin greatdb_ha reported: 'kill connections binding to vip: 10.0.30.240'
2024-07-13T00:01:40.545803+08:00 0 Plugin greatdb_ha reported: 'try to unbind vip : 10.0.30.240 success'
2024-07-13T00:01:40.545853+08:00 0 Giving 31 client threads a chance to die gracefully
2024-07-13T00:01:40.545968+08:00 0 Shutting down slave threads
2024-07-13T00:01:40.546010+08:00 0 Event Scheduler: Killing the scheduler thread, thread id 7
2024-07-13T00:01:40.546039+08:00 0 Event Scheduler: Waiting for the scheduler thread to reply
2024-07-13T00:01:40.547499+08:00 0 Event Scheduler: Stopped
2024-07-13T00:01:42.547783+08:00 0 Forcefully disconnecting 6 remaining clients
2024-07-13T00:01:42.547815+08:00 0 Plugin group_replication reported: 'Plugin 'group_replication' is stopping.'
2024-07-13T00:01:42.548450+08:00 90 Plugin group_replication reported: 'The group replication applier thread was killed.'
2024-07-13T00:01:42.548629+08:00 0 Plugin group_replication reported: 'Destroy certifier broadcast thread'
2024-07-13T00:01:42.548743+08:00 0 Plugin group_replication reported: 'Plugin 'group_replication' has been stopped.'
2024-07-13T00:01:42.550998+08:00 0 Event Scheduler: Purging the queue. 0 events
2024-07-13T00:01:42.553116+08:00 0 FTS optimize thread exiting.
The VIP bind log is as follows:
2024-07-13T00:01:36.819177+08:00 0 Plugin group_replication reported: 'Group membership changed to 10.0.20.22:3306, 10.0.20.20:3306 on view 17177509283958545:34.'
2024-07-13T00:01:36.819420+08:00 951 Plugin group_replication reported: 'Setting super_read_only=ON.'
2024-07-13T00:01:36.819859+08:00 118 Plugin group_replication reported: 'The member action "mysql_disable_super_read_only_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "1" will be run.'
2024-07-13T00:01:36.819891+08:00 118 Plugin group_replication reported: 'Setting super_read_only=OFF.'
2024-07-13T00:01:36.820067+08:00 118 Plugin group_replication reported: 'The member action "mysql_start_failover_channels_if_primary" for event "AFTER_PRIMARY_ELECTION" with priority "10" will be run.'
2024-07-13T00:01:36.831148+08:00 951 Plugin group_replication reported: 'This server is working as primary member.'
2024-07-13T00:01:36.831180+08:00 50 Plugin group_replication reported: 'Primary had applied all relay logs, disabled conflict detection.'
2024-07-13T00:01:36.884938+08:00 0 Plugin group_replication reported: ' set local notify true when site is different'
2024-07-13T00:01:36.884959+08:00 0 Plugin group_replication reported: ' call deliver_view_msg in detector'
2024-07-13T00:01:36.884979+08:00 0 Plugin group_replication reported: ' xcom_receive_local_view is called'
2024-07-13T00:01:36.884997+08:00 0 Plugin group_replication reported: 'on_suspicions is activated'
2024-07-13T00:01:36.885013+08:00 0 Plugin group_replication reported: 'on_suspicions is called over'
2024-07-13T00:01:36.885025+08:00 0 Plugin group_replication reported: ' xcom_receive_local_view return true'
2024-07-13T00:01:37.337731+08:00 0 Plugin greatdb_ha reported: 'try to bind vip : 10.0.30.240 success'
2024-07-13T00:01:39.189221+08:00 0 Plugin group_replication reported: ' set CON_NULL for fd:326 in close_connection'
2024-07-13T00:01:39.189307+08:00 0 Plugin group_replication reported: ' Failure reading from fd=329 n=0 from 10.0.20.21:33061'
2024-07-13T00:01:39.189320+08:00 0 Plugin group_replication reported: ' set CON_NULL for fd:329 in close_connection'
2024-07-13T00:01:39.189337+08:00 0 Plugin group_replication reported: ' fast_skip_allowed_for_kill is set here for server:10.0.20.21,port:33061'
2024-07-13T00:03:25.160020+08:00 0 Plugin group_replication reported: ' Adding new node to the configuration: 10.0.20.21:33061'
2024-07-13T00:03:25.160077+08:00 0 Plugin group_replication reported: ' handle_add_node calls site_install_action'
2024-07-13T00:03:25.160474+08:00 0 Plugin group_replication reported: ' update_servers is called, max nodes:3'
2024-07-13T00:03:25.160489+08:00 0 Plugin group_replication reported: ' Updating physical connections to other servers'
2024-07-13T00:03:25.160500+08:00 0 Plugin group_replication reported: ' Using existing server node 0 host 10.0.20.20:33061'
2024-07-13T00:03:25.160515+08:00 0 Plugin group_replication reported: ' Using existing server node 1 host 10.0.20.22:33061'
2024-07-13T00:03:25.160524+08:00 0 Plugin group_replication reported: ' Using existing server node 2 host 10.0.20.21:33061'
2024-07-13T00:03:25.160537+08:00 0 Plugin group_replication reported: ' Sucessfully installed new site definition. Start synode for this configuration is {f6b80450 57136804 0}, boot key synode is {f6b80450 57136793 0}, configured event horizon=10, my node identifier is 1'
2024-07-13T00:03:25.160690+08:00 0 Plugin group_replication reported: ' Connecting to 10.0.20.21:33061'
2024-07-13T00:03:25.160801+08:00 0 Plugin group_replication reported: ' Connected to 10.0.20.21:33061'
2024-07-13T00:03:25.160830+08:00 0 Plugin group_replication reported: ' sender_task sets CON_PROTO for fd:131'
2024-07-13T00:03:25.160847+08:00 0 Plugin group_replication reported: ' sent negotiation request for protocol 10 fd 131'
2024-07-13T00:03:25.233188+08:00 0 Plugin group_replication reported: ' read_msg sets CON_PROTO for fd:131 in mark, tag:314'
2024-07-13T00:03:25.259965+08:00 0 Plugin group_replication reported: ' proto is done for fd:131'
2024-07-13T00:03:25.689791+08:00 0 Plugin group_replication reported: ' before deliver_global_view_msg is called'
2024-07-13T00:03:25.689821+08:00 0 Plugin group_replication reported: ' after deliver_global_view_msg is called'
After a switchover, the VIP can only be accessed again once the ARP cache has been refreshed manually.
There is also one question: should report_host be set to the IP of the replication (sync) NIC or to the IP of the business/application NIC?
Reply:
1. To answer the second question first:
report_host should be set to the same IP as the one specified in group_replication_local_address, i.e. the reported IP must match the IP used for this MGR node.
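For example, a quick check on each node that the two settings agree (the bare mysql client invocation here is only an assumption; add credentials as needed):
# report_host should show the same IP as the host part of group_replication_local_address
mysql -e "SELECT @@report_host, @@report_port, @@group_replication_local_address;"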
2. On the first question:
Please try binding the VIP manually and check whether the ARP table is refreshed automatically, to first confirm that this part works at the OS level. If ARP is refreshed automatically after a manual VIP bind, then there is a much stronger reason to suspect the greatdb_ha plugin.
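As a rough sketch of that OS-level check, run from a client host (or the gateway) in the 10.0.30.0/24 segment; the grep-based inspection is only illustrative:
# Before the switchover: note which MAC is cached for the write VIP
ip neigh show | grep 10.0.30.240
# Trigger the switchover (or bind the VIP manually on another node), then check again;
# if ARP was refreshed correctly, the entry should now show the new primary's MAC
ip neigh show | grep 10.0.30.240
ping -c 3 10.0.30.240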
In reply to yejr (2024-7-15 10:51):
2. Yes, after a manual ARP refresh the VIP is reachable again.
After each switchover we run:
/usr/sbin/arping -I bond0 -c 3 -s 10.0.30.240 10.0.30.1
(This sends ARP requests toward 10.0.30.1 with the VIP 10.0.30.240 as the source address, so that host's ARP entry for the VIP is updated to the new primary's MAC.)
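As a stop-gap until the root cause is found, that refresh could be wrapped in a small script and run on the new primary after every switchover. This is only a sketch: the VIP list, the bond0 NIC and the target 10.0.30.1 are taken from this thread and should be adjusted to the real environment.
#!/bin/bash
# Re-announce any of the configured VIPs that are currently bound on this node
for vip in 10.0.30.240 10.0.30.241; do
    if ip addr show dev bond0 | grep -qw "$vip"; then
        /usr/sbin/arping -I bond0 -c 3 -s "$vip" 10.0.30.1
    fi
done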
In reply to mabai (2024-7-15 10:54):
After the ARP refresh is executed manually, the VIP can be reached normally.
In reply to yejr (2024-7-15 10:51):
A manual bind works directly: the address is reachable right away.

In reply to mabai (2024-7-15 11:04):
What exactly were the steps for the manual bind? Please share the detailed procedure and its output.
In reply to yejr (2024-7-15 13:53):
I simply ran /sbin/ifconfig eth0:1 10.0.20.3/24; no error was reported, and ip a shows the address bound to eth0.
After binding, 10.0.20.3 can be pinged from the other servers.
In reply to mabai (2024-7-15 13:57):
The VIP you bound manually is 10.0.20.3 (on eth0:1),
but according to the configuration you posted above, the VIPs to be bound are 10.0.30.240 and 10.0.30.241 (on NIC bond0).
The two cases are not comparable, so please first try manually binding these two VIPs and see what happens.
In reply to yejr (2024-7-15 14:16):
Those two IPs are currently in use by the production databases, so they cannot be bound again; only other unused IPs can be used for the test.
I ran:
/sbin/ifconfig bond0:1 10.0.20.242/24
and afterwards the address could be pinged directly from the other servers.
In reply to mabai (2024-7-15 14:26):
/sbin/ip a add 10.0.30.242/24 dev bond0
Please add a new VIP with the command above (first confirm that this VIP is not already in use, and it must be in the 10.0.30.x segment), then check whether it can be pinged directly.
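The full test sequence would then look roughly like this (10.0.30.242 as the candidate address is only an example; run the add/del commands on the database server and the ping from a second host):
# 1. Duplicate-address check: with -D, arping exits 0 only if nobody answered, i.e. the address is free
arping -D -I bond0 -c 2 10.0.30.242 && echo "10.0.30.242 looks unused"
# 2. Add it as a test VIP on the business NIC and confirm it is present
ip addr add 10.0.30.242/24 dev bond0
ip addr show dev bond0 | grep 10.0.30.242
# 3. From another host in 10.0.30.0/24, verify it answers without any manual ARP refresh:
#    ping -c 3 10.0.30.242
# 4. Remove the test address when done
ip addr del 10.0.30.242/24 dev bond0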