|
世界千奇百怪,每个人都有自己独立的思想,有些事情即使你附耳告知,也可能如风般吹过,进而消逝,为了性能为了不延迟,表要加索引嘛,然而在某业务场景,业务表数千张,无索引的表几百张,这些表都是上百万的数据。
在 GreatSQL 主从架构中,某天在系统资源充足的情况下,主从突然延迟,而且持续增长,我们通过SHOW PROCESSLIST
和 SHOW SLAVE STATUS
观察是由于回放 DELETE
事务造成的,但是 GTID``
在不断地增长,不过增长的非常缓慢,但是平时的时候是没有如此缓慢的,我们该如何快速的定位这些回放缓慢的 GTID
的涉及表呢,接下来就在测试环境演示下如何定位。
greatsql> CREATE DATABASE fcmark;
Query OK, 1 row affected (0.03 sec)
$ sysbench --db-driver=mysql --mysql-host=192.168.139.230 --mysql-port=3307 --mysql-user=greatsql --mysql-password=greatsql@2024 --mysql-db=fcmark --table_size=10000000 --tables=5 oltp_write_only prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Creating table 'sbtest1'...
Inserting 10000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
Creating table 'sbtest2'...
Inserting 10000000 records into 'sbtest2'
Creating a secondary index on 'sbtest2'...
Creating table 'sbtest3'...
Inserting 10000000 records into 'sbtest3'
Creating a secondary index on 'sbtest3'...
Creating table 'sbtest4'...
Inserting 10000000 records into 'sbtest4'
Creating a secondary index on 'sbtest4'...
Creating table 'sbtest5'...
Inserting 10000000 records into 'sbtest5'
Creating a secondary index on 'sbtest5'...
greatsql> ALTER TABLE sbtest1 MODIFY id int not null;
Query OK, 10000000 rows affected (3 min 2.62 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest1 DROP key `k_1`;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest1 DROP PRIMARY KEY;
Query OK, 10000000 rows affected (1 min 50.19 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest2 MODIFY id int not null;
Query OK, 10000000 rows affected (3 min 2.62 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest2 DROP key `k_2`;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest2 DROP PRIMARY KEY;
Query OK, 10000000 rows affected (1 min 50.19 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest3 MODIFY id int not null;
Query OK, 10000000 rows affected (3 min 2.72 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest3 DROP key `k_3`;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest3 DROP PRIMARY KEY;
Query OK, 10000000 rows affected (1 min 52.19 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest4 MODIFY id int not null;
Query OK, 10000000 rows affected (3 min 3.64 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest4 DROP key `k_4`;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest4 DROP PRIMARY KEY;
Query OK, 10000000 rows affected (1 min 53.19 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest5 MODIFY id int not null;
Query OK, 10000000 rows affected (3 min 5.62 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest5 DROP key `k_5`;
Query OK, 0 rows affected (0.15 sec)
Records: 0 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE sbtest5 DROP PRIMARY KEY;
Query OK, 10000000 rows affected (1 min 51 sec)
Records: 10000000 Duplicates: 0 Warnings: 0
greatsql> DELETE FROM sbtest1 WHERE k<200;
Query OK, 214 rows affected (41.14 sec)
greatsql> DELETE FROM sbtest2 WHERE k<200;
Query OK, 193 rows affected (41.58 sec)
greatsql> DELETE FROM sbtest3 WHERE k<200;
Query OK, 177 rows affected (40.00 sec)
greatsql> DELETE FROM sbtest4 WHERE k<200;
Query OK, 184 rows affected (39.38 sec)
greatsql> DELETE FROM sbtest5 WHERE k<200;
Query OK, 212 rows affected (37.56 sec)
greatsql> DELETE FROM sbtest1 WHERE k<400;
Query OK, 182 rows affected (39.09 sec)
greatsql> DELETE FROM sbtest2 WHERE k<400;
Query OK, 193 rows affected (38.21 sec)
greatsql> DELETE FROM sbtest3 WHERE k<400;
Query OK, 215 rows affected (34.45 sec)
greatsql> DELETE FROM sbtest4 WHERE k<400;
Query OK, 219 rows affected (37.45 sec)
greatsql> DELETE FROM sbtest5 WHERE k<400;
Query OK, 224 rows affected (34.63 sec)
greatsql> DELETE FROM sbtest1 WHERE k<600;
Query OK, 185 rows affected (34.11 sec)
greatsql> DELETE FROM sbtest2 WHERE k<600;
Query OK, 245 rows affected (35.99 sec)
greatsql> DELETE FROM sbtest3 WHERE k<600;
Query OK, 173 rows affected (36.08 sec)
greatsql> DELETE FROM sbtest4 WHERE k<600;
Query OK, 230 rows affected (36.18 sec)
greatsql> DELETE FROM sbtest5 WHERE k<600;
Query OK, 186 rows affected (38.00 sec)
等待执行完
通过 SHOW PROCESSLIST
观察到回放数据的时候是Applying batch of row changes (delete)
通过SHOW SLAVE STATUS
观察到:数据库一直在应用133171 GTID
,主库的binlog位置对应的位点是binlog.000064
和714314051
注意:
greatsql> SHOW PROCESSLIST;
+--------+-----------------+----------------------+--------+---------+---------+---------------------------------------------+------------------+------------+-----------+---------------+
| Id | User | Host | db | Command | Time | State | Info | Time_ms | Rows_sent | Rows_examined |
+--------+-----------------+----------------------+--------+---------+---------+---------------------------------------------+------------------+------------+-----------+---------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 2356467 | Waiting on empty queue | NULL | 2356467469 | 0 | 0 |
| 17 | system user | connecting host | NULL | Connect | 2356460 | Waiting for source to send event | NULL | 2356459985 | 0 | 0 |
| 18 | system user | | NULL | Query | 99 | Waiting for dependent transaction to commit | NULL | 99081 | 0 | 0 |
| 19 | system user | | fcmark | Query | 697 | Applying batch of row changes (delete) | NULL | 32977 | 0 | 0 |
| 20 | system user | | NULL | Query | 3244 | Waiting for an event from Coordinator | NULL | 3242579 | 0 | 0 |
| 21 | system user | | NULL | Connect | 2356460 | Waiting for an event from Coordinator | NULL | 2356459981 | 0 | 0 |
| 22 | system user | | NULL | Connect | 2356460 | Waiting for an event from Coordinator | NULL | 2356459980 | 0 | 0 |
| 540493 | greatsql | 172.17.136.93:34298 | NULL | Sleep | 83 | | NULL | 83365 | 0 | 0 |
| 540593 | greatsql | 172.17.139.230:51514 | NULL | Query | 0 | init | show processlist | 0 | 0 | 0 |
| 540660 | greatsql | 172.17.136.93:46212 | NULL | Sleep | 383 | | NULL | 383304 | 0 | 0 |
| 540730 | greatsql | 172.17.136.93:51128 | NULL | Sleep | 143 | | NULL | 143319 | 0 | 0 |
| 540758 | greatsql | 172.17.136.93:53112 | NULL | Sleep | 23 | | NULL | 23205 | 0 | 0 |
| 540763 | greatsql | 172.17.139.230:52204 | NULL | Sleep | 1 | | NULL | 1246 | 0 | 0 |
+--------+-----------------+----------------------+--------+---------+---------+---------------------------------------------+------------------+------------+-----------+---------------+
13 rows in set (0.00 sec)
greatsql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.139.230
Master_User: greatsql
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: binlog.000064
Read_Master_Log_Pos: 714784459
Relay_Log_File: relaylog.000189
Relay_Log_Pos: 714314261
Relay_Master_Log_File: binlog.000064
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 714314051
Relay_Log_Space: 714784956
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 700
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1000403307
Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133182
Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133170
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
greatsql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.139.230
Master_User: greatsql
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: binlog.000064
Read_Master_Log_Pos: 714784459
Relay_Log_File: relaylog.000189
Relay_Log_Pos: 714314261
Relay_Master_Log_File: binlog.000064
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 714314051
Relay_Log_Space: 714784956
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 704
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1000403307
Master_UUID: 1d0963e9-85d9-11ef-80e1-00163e28e06a
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for dependent transaction to commit
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133182
Executed_Gtid_Set: 1d0963e9-85d9-11ef-80e1-00163e28e06a:1-133170
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
主节点操作,根据从节点观察的信息,在主节点上观察 binlog 信息,回放内容是关于 fcmark.sbtest4``
表的多个DELETE
事件,观察该表结构是无索引,并且数据量接近 1千万
greatsql> SHOW BINLOG EVENTS IN 'binlog.000064' FROM 714314051 LIMIT 50;
+---------------+-----------+-------------+------------+-------------+------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+---------------+-----------+-------------+------------+-------------+------------------------------------------------------------------------+
| binlog.000064 | 714314051 | Gtid | 1000403307 | 714314130 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133171' |
| binlog.000064 | 714314130 | Query | 1000403307 | 714314207 | BEGIN |
| binlog.000064 | 714314207 | Table_map | 1000403307 | 714314274 | table_id: 207 (fcmark.sbtest4) |
| binlog.000064 | 714314274 | Delete_rows | 1000403307 | 714322479 | table_id: 207 |
| binlog.000064 | 714322479 | Delete_rows | 1000403307 | 714330684 | table_id: 207 |
| binlog.000064 | 714330684 | Delete_rows | 1000403307 | 714338889 | table_id: 207 |
| binlog.000064 | 714338889 | Delete_rows | 1000403307 | 714347094 | table_id: 207 |
| binlog.000064 | 714347094 | Delete_rows | 1000403307 | 714349409 | table_id: 207 flags: STMT_END_F |
| binlog.000064 | 714349409 | Xid | 1000403307 | 714349440 | COMMIT /* xid=6241839 */ |
| binlog.000064 | 714349440 | Gtid | 1000403307 | 714349519 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133172' |
| binlog.000064 | 714349519 | Query | 1000403307 | 714349596 | BEGIN |
| binlog.000064 | 714349596 | Table_map | 1000403307 | 714349663 | table_id: 210 (fcmark.sbtest5) |
| binlog.000064 | 714349663 | Delete_rows | 1000403307 | 714357868 | table_id: 210 |
| binlog.000064 | 714357868 | Delete_rows | 1000403307 | 714366073 | table_id: 210 |
| binlog.000064 | 714366073 | Delete_rows | 1000403307 | 714374278 | table_id: 210 |
| binlog.000064 | 714374278 | Delete_rows | 1000403307 | 714382483 | table_id: 210 |
| binlog.000064 | 714382483 | Delete_rows | 1000403307 | 714390118 | table_id: 210 flags: STMT_END_F |
| binlog.000064 | 714390118 | Xid | 1000403307 | 714390149 | COMMIT /* xid=6241935 */ |
| binlog.000064 | 714390149 | Gtid | 1000403307 | 714390228 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133173' |
| binlog.000064 | 714390228 | Query | 1000403307 | 714390305 | BEGIN |
| binlog.000064 | 714390305 | Table_map | 1000403307 | 714390372 | table_id: 198 (fcmark.sbtest1) |
| binlog.000064 | 714390372 | Delete_rows | 1000403307 | 714398577 | table_id: 198 |
| binlog.000064 | 714398577 | Delete_rows | 1000403307 | 714406782 | table_id: 198 |
| binlog.000064 | 714406782 | Delete_rows | 1000403307 | 714414987 | table_id: 198 |
| binlog.000064 | 714414987 | Delete_rows | 1000403307 | 714423192 | table_id: 198 |
| binlog.000064 | 714423192 | Delete_rows | 1000403307 | 714425127 | table_id: 198 flags: STMT_END_F |
| binlog.000064 | 714425127 | Xid | 1000403307 | 714425158 | COMMIT /* xid=6242038 */ |
| binlog.000064 | 714425158 | Gtid | 1000403307 | 714425237 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133174' |
| binlog.000064 | 714425237 | Query | 1000403307 | 714425314 | BEGIN |
| binlog.000064 | 714425314 | Table_map | 1000403307 | 714425381 | table_id: 201 (fcmark.sbtest2) |
| binlog.000064 | 714425381 | Delete_rows | 1000403307 | 714433586 | table_id: 201 |
| binlog.000064 | 714433586 | Delete_rows | 1000403307 | 714441791 | table_id: 201 |
| binlog.000064 | 714441791 | Delete_rows | 1000403307 | 714449996 | table_id: 201 |
| binlog.000064 | 714449996 | Delete_rows | 1000403307 | 714458201 | table_id: 201 |
| binlog.000064 | 714458201 | Delete_rows | 1000403307 | 714462226 | table_id: 201 flags: STMT_END_F |
| binlog.000064 | 714462226 | Xid | 1000403307 | 714462257 | COMMIT /* xid=6242139 */ |
| binlog.000064 | 714462257 | Gtid | 1000403307 | 714462336 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133175' |
| binlog.000064 | 714462336 | Query | 1000403307 | 714462413 | BEGIN |
| binlog.000064 | 714462413 | Table_map | 1000403307 | 714462480 | table_id: 204 (fcmark.sbtest3) |
| binlog.000064 | 714462480 | Delete_rows | 1000403307 | 714470685 | table_id: 204 |
| binlog.000064 | 714470685 | Delete_rows | 1000403307 | 714478890 | table_id: 204 |
| binlog.000064 | 714478890 | Delete_rows | 1000403307 | 714487095 | table_id: 204 |
| binlog.000064 | 714487095 | Delete_rows | 1000403307 | 714495300 | table_id: 204 |
| binlog.000064 | 714495300 | Delete_rows | 1000403307 | 714503505 | table_id: 204 flags: STMT_END_F |
| binlog.000064 | 714503505 | Xid | 1000403307 | 714503536 | COMMIT /* xid=6242232 */ |
| binlog.000064 | 714503536 | Gtid | 1000403307 | 714503615 | SET @@SESSION.GTID_NEXT= '1d0963e9-85d9-11ef-80e1-00163e28e06a:133176' |
| binlog.000064 | 714503615 | Query | 1000403307 | 714503692 | BEGIN |
| binlog.000064 | 714503692 | Table_map | 1000403307 | 714503759 | table_id: 207 (fcmark.sbtest4) |
| binlog.000064 | 714503759 | Delete_rows | 1000403307 | 714511964 | table_id: 207 |
| binlog.000064 | 714511964 | Delete_rows | 1000403307 | 714520169 | table_id: 207 |
+---------------+-----------+-------------+------------+-------------+------------------------------------------------------------------------+
50 rows in set (0.00 sec)
greatsql> SHOW CREATE TABLE fcmark.sbtest4\G
*************************** 1. row ***************************
Table: sbtest4
Create Table: CREATE TABLE `sbtest4` (
`id` int NOT NULL,
`k` int NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
greatsql>SELECT count(1) FROM fcmark.sbtest4;
+----------+
| count(1) |
+----------+
| 9999367 |
+----------+
1 row in set (11.35 sec)
使用SHOW PROCESSLIST
和SHOW SLAVE STATUS
观察是数据库正在回放数据,找到正在回放的主库的binlog位点以及正在回放的GTID,在主库上使用SHOW BINLOG EVENTS IN binlog文件 FROM 位点 LIMIT 步长
,来找到对应的表结构和事件类型,进而观察表结构,该表无索引且数据量大,所以导致从库回放的时候比较慢,至于处理办法,可以临时KILL 掉回放线程,然后加上索引,在重启回放线程,具体细节就不详细展开了。
合作电话:010-64087828
社区邮箱:greatsql@greatdb.com