|
首先,MySQL服务利用 systemd 即可实现故障后自启动,注意下面这个配置即可:
- [root@GreatSQL ~]# cat /usr/lib/systemd/system/greatsql.service
- ...
- Restart=on-failure
复制代码 其次,mysqld进程启动后,想要实现MGR的自动选主及自启动也是可以的,利用MySQL Shell即可,例如:
- [root@GreatSQL ~]# mysqlsh --uri greatsql@yejr-mgr3:3306
- ...
- -- 不管干啥,都先看 help,这是玩转Linux的必备素养,啥事不清楚都先找男人(man)
- -- 注意到有这样的一个方法 rebootClusterFromCompleteOutage(),看起来没跑了
- MySQL yejr-mgr3:3306 ssl JS > \help dba rebootClusterFromCompleteOutage([clusterName][, options])
- Brings a cluster back ONLINE when all members are OFFLINE.
- -- 跑一个试试看
- MySQL yejr-mgr3:3306 ssl JS > dba.rebootClusterFromCompleteOutage()
- Restoring the default cluster from complete outage...
- The instance 'yejr-mgr4:3306' was part of the cluster configuration.
- Would you like to rejoin it to the cluster? [y/N]: y
- The instance 'yejr-mgr2:3306' was part of the cluster configuration.
- Would you like to rejoin it to the cluster? [y/N]: y
- Dba.rebootClusterFromCompleteOutage: The active session instance (yejr-mgr3:3306) isn't the most updated in comparison with the ONLINE instances of the Cluster's metadata. Please use the most up to date instance: 'yejr-mgr4:3306'. (RuntimeError)
复制代码 可以看到错误信息提示我们当前节点上没有最新的数据,不能直接启动MGR,错误信息中还提供了该去哪个节点启动的建议,所以我们改成在 yejr-mgr4 节点上执行拉起MGR:
- [root@GreatSQL ~]# mysqlsh --uri greatsql@yejr-mgr3:3306
- ...
- MySQL yejr-mgr4:3306 ssl JS > dba.rebootClusterFromCompleteOutage()
- Restoring the default cluster from complete outage...
- The instance 'yejr-mgr3:3306' was part of the cluster configuration.
- Would you like to rejoin it to the cluster? [y/N]: y
- The instance 'yejr-mgr2:3306' was part of the cluster configuration.
- Would you like to rejoin it to the cluster? [y/N]: y
- yejr-mgr4:3306 was restored.
- Rejoining 'yejr-mgr3:3306' to the cluster.
- Rejoining instance 'yejr-mgr3:3306' to cluster 'GreatSQLMGR'...
- The instance 'yejr-mgr3:3306' was successfully rejoined to the cluster.
- Rejoining 'yejr-mgr2:3306' to the cluster.
- Rejoining instance 'yejr-mgr2:3306' to cluster 'GreatSQLMGR'...
- The instance 'yejr-mgr2:3306' was successfully rejoined to the cluster.
- The cluster was successfully rebooted.
复制代码 可以看到,MGR集群已经被正常启动了。
上面是利用MySQL Shell启动一个发生过故障的MGR集群,如果是手动的话该怎么办呢?
首先,在各个节点执行下面的SQL,确认各节点当前的事务执行情况:
- -- yejr-mgr2节点
- root@GreatSQL [none]> select RECEIVED_TRANSACTION_SET from performance_schema.replication_connection_status where
- channel_name = 'group_replication_applier' union all
- select variable_value from performance_schema.global_variables where
- variable_name = 'gtid_executed'\G
- *************************** 1. row ***************************
- RECEIVED_TRANSACTION_SET:
- *************************** 2. row ***************************
- RECEIVED_TRANSACTION_SET: 1c293e90-3bdc-11ec-bca1-525400e2078a:1-4537605,
- 4b7b3b88-3b13-11ec-86e9-525400e2078a:1
- -- yejr-mgr3节点
- ...
- *************************** 1. row ***************************
- RECEIVED_TRANSACTION_SET:
- *************************** 2. row ***************************
- RECEIVED_TRANSACTION_SET: 1c293e90-3bdc-11ec-bca1-525400e2078a:1-4542304,
- 4b7b3b88-3b13-11ec-86e9-525400e2078a:1
- -- yejr-mgr4节点
- ...
- *************************** 1. row ***************************
- RECEIVED_TRANSACTION_SET:
- *************************** 2. row ***************************
- RECEIVED_TRANSACTION_SET: 1c293e90-3bdc-11ec-bca1-525400e2078a:1-4652391,
- 4b7b3b88-3b13-11ec-86e9-525400e2078a:1
复制代码 从上面的结果可以看到,yejr-mgr4 节点上已执行完的事务GTID值最大:4652391 > 4542304 > 4537605,因此应该选择 yejr-mgr4 节点作为 Primary 节点。
将该节点设置为引导模式,然后启动MGR服务:
- [root@GreatSQL ~]# mysql -hyejr-mgr4 -P3306 -ugreatsql -p
- ...
- greatsql@mgr4:3306 [(none)]>set global group_replication_bootstrap_group=ON;
- greatsql@mgr4:3306 [(none)]>start group_replication;
- -- 启动完MGR后,记得立即将其设置为OFF
- greatsql@mgr4:3306 [(none)]>set global group_replication_bootstrap_group=OFF;
复制代码 在其他节点上,则直接启动MGR服务即可,切记无需再次设置引导模式,否则它就会变成一个全新的MGR集群的Primary节点了。
好了,自动、手动两种方式拉起一个故障MGR集群方法都介绍完毕了。
|
|