[Solved] Database crashes and restarts when group_replication is started inside a Huawei openEuler 2203sp1 container

whx 2024-6-6 19:52
GreatSQL is installed inside a Docker container on a Huawei openEuler 2203sp1.x86_64 virtual machine; the container's OS is also Huawei openEuler 2203sp1.x86_64:


The deployed GreatSQL version is 8.0.32-25, currently the latest release:


Three virtual machines are deployed in total, and /etc/hosts inside the openEuler container on each VM already has the name-resolution entries added:

172.16.6.178 172-16-6-178
172.16.6.140 172-16-6-140
172.16.6.169 172-16-6-169


The contents of the database's my.cnf configuration file are as follows:

datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
slow_query_log = ON
long_query_time = 1
log_slow_verbosity = FULL
log_error_verbosity = 3

max_connections = 12000
innodb_buffer_pool_size = 3G
innodb_log_file_size = 1G
innodb_file_per_table = 1
innodb_flush_method = O_DIRECT
tmp_table_size = 32M
max_heap_table_size = 32M
thread_cache_size = 200
table_open_cache = 20
open_files_limit = 65535
sql-mode = NO_ENGINE_SUBSTITUTION

# add
binlog-format=row
binlog_checksum=CRC32
binlog_transaction_dependency_tracking=writeset
enforce-gtid-consistency=true
gtid-mode=on
log-bin=/var/lib/mysql/mysql-bin
log_slave_updates=ON
loose-greatdb_ha_enable_mgr_vip=1
loose-greatdb_ha_mgr_vip_ip=172.16.6.1
loose-greatdb_ha_mgr_vip_mask='255.255.0.0'
loose-greatdb_ha_mgr_vip_nic='ens3'
loose-greatdb_ha_send_arp_package_times=5
loose-group-replication-ip-whitelist="172.16.6.178,172.16.6.140,172.16.6.169"
loose-group_replication_bootstrap_group=off
loose-group_replication_exit_state_action=READ_ONLY
loose-group_replication_flow_control_mode="DISABLED"
loose-group_replication_group_name="1b6241bb-77b7-40e2-9157-fa33a4b2df89"
loose-group_replication_group_seeds="172.16.6.178:33061,172.16.6.140:33061,172.16.6.169:33061"
loose-group_replication_local_address="172.16.6.178:33061"
loose-group_replication_single_primary_mode=ON
loose-group_replication_start_on_boot=off
loose-plugin_load_add='greatdb_ha.so'
loose-plugin_load_add='group_replication.so'
loose-plugin_load_add='mysql_clone.so'
master-info-repository=TABLE
relay-log-info-repository=TABLE
relay_log_recovery=on
# server_id must be unique on each node
server_id=1
slave_checkpoint_period=2
slave_parallel_type=LOGICAL_CLOCK
slave_parallel_workers=128
slave_preserve_commit_order=1
sql_require_primary_key=1
transaction_write_set_extraction=XXHASH64
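
For clarity, here is a minimal sketch of what would differ on the other two nodes (values are inferred from the /etc/hosts entries above, not taken from the post; everything else stays identical):

# on 172.16.6.140 (assumed)
server_id=2
loose-group_replication_local_address="172.16.6.140:33061"

# on 172.16.6.169 (assumed)
server_id=3
loose-group_replication_local_address="172.16.6.169:33061"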


The replication account has also been configured on each node:
create user repl@'%' identified with mysql_native_password by 'Ch345@123';
grant replication slave, backup_admin on *.* to 'repl'@'%';
change master to master_user='repl', master_password='Ch345' for channel 'group_replication_recovery';


When enabling the cluster on the primary (bootstrap) node, the error occurs as soon as the second statement is executed:
set global group_replication_bootstrap_group=ON;
start group_replication;
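
For reference, after bootstrapping succeeds the sequence would normally continue as in this sketch (standard MGR practice, not taken from the post); here the server crashes as soon as START GROUP_REPLICATION runs:

set global group_replication_bootstrap_group=ON;
start group_replication;
set global group_replication_bootstrap_group=OFF;
-- verify membership once the node is up
select member_host, member_port, member_state, member_role from performance_schema.replication_group_members;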

Checking the database log shows that the server hit a fatal error and crashed:

2024-06-06T08:46:34.281353Z 154 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'build_donor_list is called'
2024-06-06T08:46:34.281380Z 154 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'build_donor_list is called over, size:0'
2024-06-06T08:46:34.281413Z 0 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'handle_leader_election_if_needed is activated,suggested_primary:'
2024-06-06T08:46:34.281445Z 0 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2024-06-06T08:46:34.281458Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to 172-16-6-178:3306 on view 17176635942809729:1.'
2024-06-06T08:46:34.281475Z 154 [Note] [MY-011623] [Repl] Plugin group_replication reported: 'Only one server alive. Declaring this server as online within the replication group'
2024-06-06T08:46:34.282537Z 21 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'continue to process queue after suspended'
2024-06-06T08:46:34.282569Z 21 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'before getting certification info in log_view_change_event_in_order'
2024-06-06T08:46:34.282581Z 21 [Note] [MY-011071] [Repl] Plugin group_replication reported: 'after setting certification info in log_view_change_event_in_order'
2024-06-06T08:46:34.306474Z 0 [System] [MY-011490] [Repl] Plugin group_replication reported: 'This server was declared online within the replication group.'
2024-06-06T08:46:34.306595Z 0 [Note] [MY-013519] [Repl] Plugin group_replication reported: 'Elected primary member gtid_executed: 1b6241bb-77b7-40e2-9157-fa33a4b2df89:1'
2024-06-06T08:46:34.306612Z 0 [Note] [MY-013519] [Repl] Plugin group_replication reported: 'Elected primary member applier channel received_transaction_set: 1b6241bb-77b7-40e2-9157-fa33a4b2df89:1'
2024-06-06T08:46:34.306620Z 0 [System] [MY-011507] [Repl] Plugin group_replication reported: 'A new primary with address 172-16-6-178:3306 was elected. The new primary will execute all previous group transactions before allowing writes.'
2024-06-06T08:46:34Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=fa0a5c40a97c29e73be6d445cdc39dbd25c6e45c
Build ID: fa0a5c40a97c29e73be6d445cdc39dbd25c6e45c
Server Version: 8.0.32-25 GreatSQL (GPL), Release 25, Revision db07cc5cb73
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x100000
2024-06-06T08:46:34.308180Z 155 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x235bfcd]
/usr/sbin/mysqld(print_fatal_signal(int)+0x3e3) [0x136ca03]
/usr/sbin/mysqld(handle_fatal_signal+0xc5) [0x136cad5]
/usr/lib64/libc.so.6(+0x40de0) [0x7fb59b7cfde0]
/usr/lib64/libc.so.6(+0x8ceef) [0x7fb59b81beef]
/usr/lib64/libc.so.6(raise+0x16) [0x7fb59b7cfd36]
/usr/lib64/libc.so.6(abort+0xd7) [0x7fb59b7bb177]
/usr/sbin/mysqld() [0xe905f2]
/usr/lib64/mysql/plugin/greatdb_ha.so(+0xe7e5) [0x7fb58c2617e5]
/usr/lib64/mysql/plugin/greatdb_ha.so(+0x141eb) [0x7fb58c2671eb]
/usr/lib64/mysql/plugin/greatdb_ha.so(+0x16269) [0x7fb58c269269]
/usr/lib64/mysql/plugin/greatdb_ha.so(+0x17d41) [0x7fb58c26ad41]
/usr/lib64/libc.so.6(+0x8b31a) [0x7fb59b81a31a]
/usr/lib64/libc.so.6(+0x10da80) [0x7fb59b89ca80]
Please help us make Percona Server better by reporting any
bugs at https://bugs.percona.com/
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
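
To pinpoint where the crash happens inside greatdb_ha.so, the raw offsets from the stack trace could be resolved with addr2line, assuming the matching debuginfo package is installed (a sketch; the offsets are copied from the trace above):

addr2line -fCe /usr/lib64/mysql/plugin/greatdb_ha.so 0xe7e5 0x141eb 0x16269 0x17d41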
Checking the plugin status shows everything is normal:
show plugins;   


Could it be that the latest GreatSQL version does not yet support enabling MGR on openEuler?

[Latest progress on this issue]
The GreatSQL maintainers replied that the crash is caused by the VIP feature. After I disabled the VIP (loose-greatdb_ha_enable_mgr_vip=0), the cluster could be started.


Granting the NET_ADMIN capability with setcap inside the container is enough to set the VIP; in a CentOS 7.9 container we can set the VIP and start the cluster normally, but in this openEuler container it does not work.
However, because of domestic-platform replacement we need the cluster VIP inside the openEuler container. Does anyone know how to enable the cluster VIP while avoiding this problem?
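
For context, a sketch of the usual ways to make the capability available (the image name is a placeholder, not an official GreatSQL artifact); note that a file capability set with setcap only takes effect if NET_ADMIN is also present in the container's bounding set:

docker run --cap-add=NET_ADMIN ... <greatsql-image>    # grant the capability to the whole container at start
setcap cap_net_admin+ep /usr/sbin/mysqld               # or grant it to the mysqld binary inside the container
getcap /usr/sbin/mysqld                                # verify that the capability was applied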


Issue links:
GreatSQL issue: https://gitee.com/GreatSQL/GreatSQL/issues/I9VTF8
openEuler kernel issue: https://gitee.com/openeuler/kernel/issues/I9VXDB

All replies (14)
yejr 2024-6-7 09:03:30
It is most likely caused by the VIP feature not being supported inside the container.

Generally speaking, binding a VIP from inside a container requires authorization from the host; people usually don't do it this way.

Also, since you are already using containers, it would make more sense to adopt something like Kubernetes rather than running a VIP inside the container.
yejr 2024-6-7 09:11:50
whx 2024-6-7 09:21:15
yejr posted on 2024-6-7 09:03
It is most likely caused by the VIP feature not being supported inside the container.

Generally speaking, binding a VIP from inside a container requires authorization from the host; people usually don't ...

Thanks for the support. We use containers to reduce adaptation costs. Granting the NET_ADMIN capability with setcap inside the container is enough to set the VIP; in a CentOS 7.9 container we can set the VIP and start the cluster normally, but it does not work in this openEuler container. However, because of domestic-platform replacement we must run openEuler inside the container. Do you have any ideas on how to handle this?
yejr 2024-6-7 09:29:15
whx posted on 2024-6-7 09:21
Thanks for the support. We use containers to reduce adaptation costs. Granting the NET_ADMIN capability with setcap inside the container is enough to set the VIP; in a Cent ...

Could you give CentOS 8.x a try and see whether it also supports doing this?

Also, your approach feels a bit like fooling yourselves: the host layer still runs CentOS while openEuler runs inside the container. A rather curious setup.
whx 2024-6-7 09:37:49
yejr posted on 2024-6-7 09:29
Could you give CentOS 8.x a try and see whether it also supports doing this?

Also, your approach feels a bit like fooling yourselves: the host layer still runs CentOS while open ...

Both the VM's OS and the container's OS are openEuler 2203sp1; this was already stated in the post.
yejr 2024-6-7 10:11:47
whx posted on 2024-6-7 09:37
Both the VM's OS and the container's OS are openEuler 2203sp1; this was already stated in the post.

Sorry, sorry, I misunderstood.
If the openEuler community comes up with a solution, please share it back here as well.
reddey 2024-6-7 11:20:26
yejr posted on 2024-6-7 09:03
It is most likely caused by the VIP feature not being supported inside the container.

Generally speaking, binding a VIP from inside a container requires authorization from the host; people usually don't ...

In a K8s cluster, since the platform itself provides clustering, nodes can be scaled out freely, giving the system elasticity. The VIP feature is simply not needed there.
A still-learning enthusiast of domestic databases
yejr 2024-6-7 12:12:24
whx posted on 2024-6-7 09:21
Thanks for the support. We use containers to reduce adaptation costs. Granting the NET_ADMIN capability with setcap inside the container is enough to set the VIP; in a Cent ...

https://gitee.com/src-openeuler/greatsql/ is also a project we maintain.

https://gitee.com/src-openeuler/ ... ?from=project-issue  Is this the one that openEuler community technical support suggested you file? It feels like it should go to the openEuler kernel maintainers instead, i.e., the container needs to be able to obtain the elevated privilege to bind the VIP.
whx 2024-6-7 13:09:28
yejr posted on 2024-6-7 12:12
https://gitee.com/src-openeuler/greatsql/ is also a project we maintain.

https://gitee.com/src-openeule ...

Oh, I hadn't noticed. You can delete that issue then; I've already filed the same question under the openEuler kernel project.