beckhann posted on 2022-8-25 14:18:18

GreatSQL MGR: two nodes UNREACHABLE at the same time

Version: Server version: 8.0.25-16 GreatSQL, 3-node MGR
Operating system: Red Hat 7.9



+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
| group_replication_applier | f7331775-fb83-11ec-863d-005056bbad6b | 10.13.1.51  |        3306 | ONLINE       | PRIMARY     | 8.0.25         |
| group_replication_applier | fb900aa8-fb83-11ec-ab6d-005056bbbae4 | 10.13.1.52  |        3306 | UNREACHABLE  | SECONDARY   | 8.0.25         |
| group_replication_applier | fc8f8547-fb83-11ec-9b2f-005056bbde88 | 10.13.1.53  |        3306 | UNREACHABLE  | SECONDARY   | 8.0.25         |
+---------------------------+--------------------------------------+-------------+-------------+--------------+-------------+----------------+
3 rows in set (0.00 sec)
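
One way to read the output above: Group Replication needs a majority of members to be reachable in order to keep making progress, and here only one of three is. A minimal Python sketch of that check follows. `parse_members` and `has_quorum` are hypothetical helper names, and `SAMPLE` is a trimmed copy of the table (three columns only) used purely for illustration.

```python
# Trimmed copy of the member table above (illustrative subset of columns).
SAMPLE = """
+-------------+--------------+-------------+
| MEMBER_HOST | MEMBER_STATE | MEMBER_ROLE |
+-------------+--------------+-------------+
| 10.13.1.51  | ONLINE       | PRIMARY     |
| 10.13.1.52  | UNREACHABLE  | SECONDARY   |
| 10.13.1.53  | UNREACHABLE  | SECONDARY   |
+-------------+--------------+-------------+
"""


def parse_members(table_text):
    """Turn mysql client table output into a list of dicts keyed by column name."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table_text.splitlines()
        if line.strip().startswith("|")  # skip the +---+ border lines
    ]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def has_quorum(members):
    """True when strictly more than half of the listed members are reachable."""
    reachable = [m for m in members if m["MEMBER_STATE"] != "UNREACHABLE"]
    return len(reachable) > len(members) // 2


members = parse_members(SAMPLE)
print(has_quorum(members))  # prints False: 1 reachable member out of 3
```

With one reachable member out of three the check is false, which is consistent with a group that can no longer accept commits.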


The error log on the two nodes is as follows:

2022-08-25T11:36:53.788561+08:00 14 Cannot allocate 1073741856 bytes of memory after 60 retries over 60 seconds. OS error: Cannot allocate memory (12). Check if you should increase the swap file or ulimits of your operating system. Note that on most 32-bit computers the process memory space is limited to 2 GB or 4 GB.
2022-08-25T11:36:53.788701+08:00 14 Assertion failure: ut0ut.cc:565 thread 139985073673984
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
03:36:53 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.


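yejr posted on 2022-8-25 14:24:43

1. What is the system configuration, in particular how much memory does each node have?
2. Under what circumstances did the crash happen, i.e. what workload was running before the crash, and how many concurrent connections were there at the time?
3. Can the crash scenario above be reproduced reliably?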

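beckhann posted on 2022-8-25 14:30:58

yejr posted on 2022-8-25 14:24
1. What is the system configuration, in particular how much memory does each node have?
2. Under what circumstances did the crash happen, i.e. what workload was running before the crash, and ...

Each node is 4C8G (4 cores, 8 GB of memory).
At the time I was only testing an insert of about 80,000 rows followed by a commit. The commit would not go through, so I suspected the nodes had gone down. Screenshot attached.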
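
To put the failed allocation from the error log next to the node size, here is a quick arithmetic sketch in Python. It assumes that "8G" means 8 GiB of RAM; the variable names are illustrative, and the numbers on their own say nothing about which component requested the memory.

```python
FAILED_ALLOC_BYTES = 1073741856  # "Cannot allocate 1073741856 bytes" in the error log
GIB = 1024 ** 3
NODE_RAM_BYTES = 8 * GIB         # assumption: "4C8G" read as 8 GiB per node

# How far the request is above exactly 1 GiB.
bytes_over_one_gib = FAILED_ALLOC_BYTES - GIB

# Share of the node's total RAM that this single request represents.
share_of_ram = FAILED_ALLOC_BYTES / NODE_RAM_BYTES

print(f"request = 1 GiB + {bytes_over_one_gib} bytes")
print(f"share of node RAM = {share_of_ram:.1%}")
```

So the request that failed was a single contiguous allocation of roughly 1 GiB, about one eighth of the node's memory under the stated assumption.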

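yejr posted on 2022-8-25 14:40:43

beckhann posted on 2022-8-25 14:30
Each node is 4C8G (4 cores, 8 GB of memory).
At the time I was only testing an insert of about 80,000 rows followed by a commit. The commit would not go through, so I suspected the nodes had gone down. Screenshot attached. ...

1. On the crashed nodes, watch how the memory usage of the mysqld process grows.
2. Use the monitoring method described in https://gitee.com/GreatSQL/GreatSQL-Doc/blob/master/deep-dive-mgr/deep-dive-mgr-06.md to observe how transactions queue up on the secondary nodes of the MGR group.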
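
The two checks above could be sketched in Python roughly as follows. This is not taken from the linked document. It assumes a Linux host where `/proc/<pid>/status` exposes a `VmRSS` line, that the caller supplies the mysqld pid, and that GreatSQL 8.0.25 exposes the standard MySQL 8.0 table `performance_schema.replication_group_member_stats`. The helper names are hypothetical, and the SQL is only held as a string here, not executed.

```python
import re

# Queue-depth query, assuming the standard MySQL 8.0 performance_schema table.
QUEUE_SQL = """
SELECT MEMBER_ID,
       COUNT_TRANSACTIONS_IN_QUEUE,
       COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE
FROM performance_schema.replication_group_member_stats;
"""


def parse_vm_rss_kib(status_text):
    """Extract the resident set size in KiB from /proc/<pid>/status content."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kib(status_file.read())


def growth_kib(samples):
    """Difference between the last and the first sample of a series."""
    return samples[-1] - samples[0] if len(samples) >= 2 else 0
```

Calling `read_rss_kib(pid)` at a fixed interval while the insert test runs, and passing the collected values to `growth_kib`, gives a rough view of how fast mysqld grows before the allocation failure.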

yejr posted on 2022-8-25 14:44:13

For how to recover the MGR cluster in this state, see https://gitee.com/GreatSQL/GreatSQL-Doc/blob/master/deep-dive-mgr/deep-dive-mgr-15.md#3-%E5%A4%9A%E6%95%B0%E6%B4%BE%E6%88%90%E5%91%98%E5%A4%B1%E8%81%94%E6%97%B6
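
For orientation only, and not as a quote from the linked document: in MySQL 8.0 Group Replication, a group that has lost its majority is usually unblocked by setting `group_replication_force_members` on a surviving member. The Python sketch below only builds that statement. `build_force_members_sql` is a hypothetical helper, and the address `10.13.1.51:33061` assumes the surviving node's `group_replication_local_address` uses port 33061, which the thread does not state.

```python
def build_force_members_sql(local_addresses):
    """Build the SET GLOBAL statement that forces a new group membership.

    Each entry must be the group_replication_local_address (host:port) of a
    member that is still alive, not its regular client port.
    """
    if not local_addresses:
        raise ValueError("at least one surviving member is required")
    for addr in local_addresses:
        host, sep, port = addr.rpartition(":")
        if not (host and sep and port.isdigit()):
            raise ValueError(f"expected host:port, got {addr!r}")
    joined = ",".join(local_addresses)
    return f"SET GLOBAL group_replication_force_members = '{joined}';"


# Assumed address of the only ONLINE member; the port is a guess.
print(build_force_members_sql(["10.13.1.51:33061"]))
```

Before running such a statement, the unreachable members should be confirmed to be really stopped, and the variable should be reset to an empty string afterwards so that Group Replication can be started normally later.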