GreatSQL社区

搜索

chongzh

GreatSQL MGR 集群故障转移和恢复场景

chongzh 已有 385 次阅读2023-8-31 08:46 |个人分类:Mysql MGR|系统分类:运维实战

这篇博客文章将关注使用shell,完成GreatSQL MGR 集群的故障转移和恢复场景。

一、GreatSQL MGR 集群故障转移

通过MySQLShell连接到第一个集群(“GreatSQLMGR”)的任何节点,并获取详细信息。


c=dba.get_cluster();
c.status();

现在执行从实例(“172.26.170.14:3306”)到(“172.26.170.13:3306”)的主切换。


c=dba.get_cluster();
c.set_primary_instance('172.26.170.13:3306');

实例(“172.26.170.13:3306”)将显示状态为 primary

c.status();

二、如何重新加入丢失的实例

如果由于某种原因,一个实例离开集群或失去连接,并且不能自动重新加入集群,我们可能需要通过发出“Cluster.rejoinInstance(实例)“命令。这里,我们将尝试创建一个小示例来演示该命令的用法。

  • 通过停止实例上的组复制来创建一些阻止程序(“172.26.170.15:3306”)。

MySQL  172.26.170.15:3306 ssl  Py > \sql
Switching to SQL mode... Commands end with ;
 MySQL  172.26.170.15:3306 ssl  SQL >  stop group_replication;

查看下面的信息,我们可以看到实例(“172.26.170.15:3306”)显示“丢失”状态。

现在,用rejoin_instance()命令再次添加实例(“172.26.170.15:3306”)。

MySQL  172.26.170.13:3306 ssl  Py > c=dba.get_cluster();
 MySQL  172.26.170.13:3306 ssl  Py > c.rejoin_instance('172.26.170.15:3306');
Rejoining instance '172.26.170.15:3306' to cluster 'GreatSQLMGR'...
The instance '172.26.170.15:3306' was successfully rejoined to the cluster.

三、如何从重大故障中恢复整个集群

有时,即使所有节点都已启动,并且发生了一些内部问题,如组复制停滞或一些网络问题,您也可能会遇到完全中断,无法在群集上执行任何写入/活动。

在这种情况下,您可以使用任何一个节点并使用其元数据来恢复群集。您需要连接到最新的实例,否则,您可能会丢失数据或在集群节点中出现不一致。

让我们看看如何手动引入这种情况,然后尝试修复它。

MySQL  172.26.170.13:3306 ssl  andy  SQL > stop group_replication;
 MySQL  172.26.170.14:3306 ssl  andy  SQL > stop group_replication;
 MySQL  172.26.170.15:3306 ssl  andy  SQL > stop group_replication;

此时,MGR集群的所有节点都处于 OFFLINE ,不可写状态

随机连接一个MGR节点,执行 reboot_cluster_from_complete_outage(),

每个输入项为 y,可以看到当执行命令的节点GTID非最多时,会明确提示你在哪个正确节点(GTID最多的节点)执行启动MGR集群命令。提示信息如下RuntimeError: Dba.reboot_cluster_from_complete_outage: The active session instance (172.26.170.14:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: '172.26.170.13:3306'

MySQL  172.26.170.14:3306 ssl  Py > dba.reboot_cluster_from_complete_outage()
Restoring the default cluster from complete outage...

The instance '172.26.170.13:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance '172.26.170.15:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

Traceback (most recent call last):
  File "", line 1, in 
RuntimeError: Dba.reboot_cluster_from_complete_outage: The active session instance (172.26.170.14:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: '172.26.170.13:3306'.

GTID最多的节点 执行启动MGR集群命令

MySQL  172.26.170.13:3306 ssl  Py > dba.reboot_cluster_from_complete_outage()
Restoring the default cluster from complete outage...

The instance '172.26.170.14:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

The instance '172.26.170.15:3306' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y

172.26.170.13:3306 was restored.
Rejoining '172.26.170.14:3306' to the cluster.
Rejoining instance '172.26.170.14:3306' to cluster 'GreatSQLMGR'...
The instance '172.26.170.14:3306' was successfully rejoined to the cluster.

Rejoining '172.26.170.15:3306' to the cluster.
Rejoining instance '172.26.170.15:3306' to cluster 'GreatSQLMGR'...
The instance '172.26.170.15:3306' was successfully rejoined to the cluster.

The cluster was successfully rebooted.

查看MGR集群状态,ok!

MySQL  172.26.170.13:3306 ssl  SQL > SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, IF(global_status.VARIABLE_NAME IS NOT NULL, 'PRIMARY', 'SECONDARY') AS MEMBER_ROLE FROM performance_schema.replication_group_members LEFT JOIN performance_schema.global_status ON global_status.VARIABLE_NAME = 'group_replication_primary_member' AND global_status.VARIABLE_VALUE = replication_group_members.MEMBER_ID; 
+--------------------------------------+---------------+-------------+--------------+-------------+
| MEMBER_ID                            | MEMBER_HOST   | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE |
+--------------------------------------+---------------+-------------+--------------+-------------+
| 3ad980b6-3e47-11ee-84b3-005056b261a4 | 172.26.170.13 |        3306 | ONLINE       | PRIMARY     |
| 63b7cca8-3e47-11ee-a9a0-005056b23879 | 172.26.170.14 |        3306 | ONLINE       | SECONDARY   |
| 63ce874f-3e47-11ee-832d-005056b22d45 | 172.26.170.15 |        3306 | ONLINE       | SECONDARY   |
+--------------------------------------+---------------+-------------+--------------+-------------+
3 rows in set (0.0012 sec)



评论 (0 个评论)

facelist

您需要登录后才可以评论 登录 | 立即注册

合作电话:010-64087828

社区邮箱:greatsql@greatdb.com

社区公众号
社区小助手
QQ群
GMT+8, 2024-5-4 11:02 , Processed in 0.014336 second(s), 8 queries , Redis On.