GreatSQL社区

搜索

MGR整个集群挂掉后,如何才能自动选主?

2450 1 2022-8-18 17:01
MGR集群所有节点都宕机后,集群重启,能否自动选主和启动MGR服务?
全部回复(1)
GreatSQL万答 2022-8-18 17:07:40
首先,MySQL服务利用 systemd 即可实现故障后自启动,注意下面这个配置即可:
  1. [root@GreatSQL ~]# cat /usr/lib/systemd/system/greatsql.service
  2. ...
  3. Restart=on-failure
复制代码
其次,mysqld进程启动后,想要实现MGR的自动选主及自启动也是可以的,利用MySQL Shell即可,例如:
  1. [root@GreatSQL ~]# mysqlsh --uri greatsql@yejr-mgr3:3306
  2. ...
  3. -- 不管干啥,都先看 help,这是玩转Linux的必备素养,啥事不清楚都先找男人(man)
  4. -- 注意到有这样的一个方法 rebootClusterFromCompleteOutage(),看起来没跑了

  5. MySQL  yejr-mgr3:3306 ssl  JS > \help dba      rebootClusterFromCompleteOutage([clusterName][, options])
  6.             Brings a cluster back ONLINE when all members are OFFLINE.

  7. -- 跑一个试试看
  8. MySQL  yejr-mgr3:3306 ssl  JS > dba.rebootClusterFromCompleteOutage()
  9. Restoring the default cluster from complete outage...

  10. The instance 'yejr-mgr4:3306' was part of the cluster configuration.
  11. Would you like to rejoin it to the cluster? [y/N]: y

  12. The instance 'yejr-mgr2:3306' was part of the cluster configuration.
  13. Would you like to rejoin it to the cluster? [y/N]: y

  14. Dba.rebootClusterFromCompleteOutage: The active session instance (yejr-mgr3:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: 'yejr-mgr4:3306'. (RuntimeError)
复制代码
可以看到错误信息提示我们当前节点上没有最新的数据,不能直接启动MGR,错误信息中还提供了该去哪个节点启动的建议,所以我们改成在 yejr-mgr4 节点上执行拉起MGR:
  1. [root@GreatSQL ~]# mysqlsh --uri greatsql@yejr-mgr3:3306
  2. ...
  3. MySQL  yejr-mgr4:3306 ssl  JS > dba.rebootClusterFromCompleteOutage()
  4. Restoring the default cluster from complete outage...

  5. The instance 'yejr-mgr3:3306' was part of the cluster configuration.
  6. Would you like to rejoin it to the cluster? [y/N]: y

  7. The instance 'yejr-mgr2:3306' was part of the cluster configuration.
  8. Would you like to rejoin it to the cluster? [y/N]: y

  9. yejr-mgr4:3306 was restored.
  10. Rejoining 'yejr-mgr3:3306' to the cluster.
  11. Rejoining instance 'yejr-mgr3:3306' to cluster 'GreatSQLMGR'...
  12. The instance 'yejr-mgr3:3306' was successfully rejoined to the cluster.

  13. Rejoining 'yejr-mgr2:3306' to the cluster.
  14. Rejoining instance 'yejr-mgr2:3306' to cluster 'GreatSQLMGR'...
  15. The instance 'yejr-mgr2:3306' was successfully rejoined to the cluster.

  16. The cluster was successfully rebooted.
复制代码
可以看到,MGR集群已经被正常启动了。
上面是利用MySQL Shell启动一个发生过故障的MGR集群,如果是手动的话该怎么办呢?
首先,在各个节点执行下面的SQL,确认各节点当前的事务执行情况:
  1. -- yejr-mgr2节点
  2. root@GreatSQL [none]> select RECEIVED_TRANSACTION_SET from performance_schema.replication_connection_status where
  3. channel_name = 'group_replication_applier' union all
  4. select variable_value from performance_schema.global_variables where
  5. variable_name = 'gtid_executed'\G
  6. *************************** 1. row ***************************
  7. RECEIVED_TRANSACTION_SET:
  8. *************************** 2. row ***************************
  9. RECEIVED_TRANSACTION_SET: 1c293e90-3bdc-11ec-bca1-525400e2078a:1-4537605,
  10. 4b7b3b88-3b13-11ec-86e9-525400e2078a:1

  11. -- yejr-mgr3节点
  12. ...
  13. *************************** 1. row ***************************
  14. RECEIVED_TRANSACTION_SET:
  15. *************************** 2. row ***************************
  16. RECEIVED_TRANSACTION_SET: 1c293e90-3bdc-11ec-bca1-525400e2078a:1-4542304,
  17. 4b7b3b88-3b13-11ec-86e9-525400e2078a:1

  18. -- yejr-mgr4节点
  19. ...
  20. *************************** 1. row ***************************
  21. RECEIVED_TRANSACTION_SET:
  22. *************************** 2. row ***************************
  23. RECEIVED_TRANSACTION_SET: 1c293e90-3bdc-11ec-bca1-525400e2078a:1-4652391,
  24. 4b7b3b88-3b13-11ec-86e9-525400e2078a:1
复制代码
从上面的结果可以看到,yejr-mgr4 节点上已执行完的事务GTID值最大:4652391 > 4542304 > 4537605,因此应该选择 yejr-mgr4 节点作为 Primary 节点。
将该节点设置为引导模式,然后启动MGR服务:
  1. [root@GreatSQL ~]# mysql -hyejr-mgr4 -P3306 -ugreatsql -p
  2. ...
  3. greatsql@mgr4:3306 [(none)]>set global group_replication_bootstrap_group=ON;

  4. greatsql@mgr4:3306 [(none)]>start group_replication;

  5. -- 启动完MGR后,记得立即将其设置为OFF
  6. greatsql@mgr4:3306 [(none)]>set global group_replication_bootstrap_group=OFF;
复制代码
在其他节点上,则直接启动MGR服务即可,切记无需再次设置引导模式,否则它就会变成一个全新的MGR集群的Primary节点了。
好了,自动、手动两种方式拉起一个故障MGR集群方法都介绍完毕了。

GreatSQL万答

22

主题

0

博客

73

贡献

版主

Rank: 7Rank: 7Rank: 7

积分
117

合作电话:010-64087828

社区邮箱:greatsql@greatdb.com

社区公众号
社区小助手
QQ群
GMT+8, 2024-9-8 08:55 , Processed in 0.020215 second(s), 15 queries , Redis On.
快速回复 返回顶部 返回列表