§ MDL锁等待分析
本文介绍在GreatSQL数据库中,如何查看MDL锁以及发生MDL锁等待时如何排查分析。
§ 关于MDL锁
在UPDATE执行慢排查分析一文中提到,当执行SHOW PROCESSLIST
时,可能会看到一种状态是Waiting for XX metadata lock
,这就意味着当前发生了MDL锁等待。
MDL锁全称为Metadata Lock(元数据锁)。在MySQL/GreatSQL中,DDL是不支持事务特性的,当事务和DDL同时操作同一个表,可能会出现各种意想不到问题,如事务特性被破坏、binlog顺序错乱等。为了解决类似这些问题,MySQL在5.5开始引入了MDL锁(Metadata Locking)。也就是说,MDL锁的作用是保证表元数据的一致性,避免DDL和DML并行导致元数据不一致。
MDL锁的范围主要包括以下几种:
- GLOBAL,即全局读锁,例如执行
FLUSH TABLES WITH READ LOCK
。 - TABLE/TABLESPACE/SCHEMA,保护元数据。
- FUNCTION/PROCEDURE/TRIGGER/EVENT,保护元数据。
- COMMIT,用于阻塞事务提交,例如在事务提交前,MDL锁还没释放,此时提交会被阻塞。
- BACKUP,全局备份锁以及单表备份锁,8.0以后新增备份锁。
- USER_LEVEL_LOCK,用户级自定义锁。
- FOREIGN_KEY/CHECK_CONSTRAINT,约束校验锁。
还有其他MDL锁范围,这里未能全部列出,MySQL仍在持续优化MDL锁。
§ 查看MDL锁状态
MDL锁是Server层的锁,对象级锁。
发起DML请求时,会对表同时申请MDL共享锁(只读锁);发起DDL请求时,会对表同时申请MDL排他锁(写锁)。申请MDL加锁的操作会形成一个队列,队列中写锁获取优先级高于读锁。
通过查询 performance_schema.metadata_locks
可以看到当前的MDL加锁及锁等待信息,不过要先进行相应的设置:
-- 1. 打开PFS中MDL观测开关
UPDATE setup_consumers SET ENABLED = 'YES' WHERE NAME ='global_instrumentation';
UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME =‘wait/lock/metadata/sql/mdl';
2
3
4
下面模拟几种场景,分别观察MDL锁的不同加锁表现。
1. 发起一个事务,提交DML请求
在会话1中发起下面的请求:
BEGIN;
UPDATE t1 SET k=RAND()*102400 WHERE id = 3;
2
在另一个会话中,观察MDL加锁情况:
greatsql> SELECT * FROM performance_schema.metadata_locks\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE #<-- 表级锁
OBJECT_SCHEMA: greatsql
OBJECT_NAME: t1
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139835142929568
LOCK_TYPE: SHARED_WRITE #<-- 共享写锁,因为这是一个UPDATE事务,如果发起SELECT...FOR SHARE请求,则加锁类型是SHARED_READ
LOCK_DURATION: TRANSACTION #<-- 事务里发起的MDL锁
LOCK_STATUS: GRANTED #<-- 加锁状态:已获得
SOURCE: sql_parse.cc:6516
OWNER_THREAD_ID: 11648
OWNER_EVENT_ID: 18
2
3
4
5
6
7
8
9
10
11
12
13
2. 发起一个显式LOCK WRITE请求
会话1:
LOCK TABLE t1 WRITE;
会话2:
greatsql> SELECT * FROM performance_schema.metadata_locks;
+---------------+--------------------+----------------+-------------+-----------------------+----------------------+---------------+-------------+-------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+---------------+--------------------+----------------+-------------+-----------------------+----------------------+---------------+-------------+-------------------+-----------------+----------------+
| GLOBAL | NULL | NULL | NULL | 139835077968576 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | sql_base.cc:5588 | 11649 | 18 |
| SCHEMA | greatsql | NULL | NULL | 139835167440320 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | sql_base.cc:5575 | 11649 | 18 |
| TABLE | greatsql | t1 | NULL | 139835167441840 | SHARED_NO_READ_WRITE | TRANSACTION | GRANTED | sql_parse.cc:6516 | 11649 | 18 |
| BACKUP TABLES | NULL | NULL | NULL | 139835167437280 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | lock.cc:1269 | 11649 | 18 |
| TABLESPACE | NULL | greatsql/t1 | NULL | 139835176538640 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | lock.cc:816 | 11649 | 18 |
+---------------+--------------------+----------------+-------------+-----------------------+----------------------+---------------+-------------+-------------------+-----------------+----------------+
2
3
4
5
6
7
8
9
10
可以看到,LOCK WRITE 请求实际上要申请好几个排他锁。
3. 发起一个显式LOCK READ请求
会话1:
LOCK TABLE t1 READ;
会话2:
greatsql> SELECT * FROM performance_schema.metadata_locks\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: greatsql
OBJECT_NAME: t1
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139835167440320
LOCK_TYPE: SHARED_READ_ONLY
LOCK_DURATION: TRANSACTION
LOCK_STATUS: GRANTED
SOURCE: sql_parse.cc:6516
OWNER_THREAD_ID: 11649
OWNER_EVENT_ID: 23
2
3
4
5
6
7
8
9
10
11
12
13
如果是 LOCK READ 请求则简单了很多,只有一个表级对象共享锁。
4. 发起一个DDL请求
会话1:
ALTER TABLE t1 ADD c2 INT UNSIGNED NOT NULL;
会话2:
greatsql> SELECT * FROM performance_schema.metadata_locks\G
+---------------+--------------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+---------------+--------------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| GLOBAL | NULL | NULL | NULL | 139835077968576 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | sql_base.cc:5588 | 11649 | 32 |
| BACKUP LOCK | NULL | NULL | NULL | 139835176542080 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | sql_base.cc:5600 | 11649 | 32 |
| SCHEMA | greatsql | NULL | NULL | 139835167443440 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | sql_base.cc:5575 | 11649 | 32 |
| TABLE | greatsql | t1 | NULL | 139835167442560 | SHARED_UPGRADABLE | TRANSACTION | GRANTED | sql_parse.cc:6516 | 11649 | 32 |
| BACKUP TABLES | NULL | NULL | NULL | 139835090555168 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | lock.cc:1269 | 11649 | 32 |
| TABLESPACE | NULL | greatsql/t1 | NULL | 139835176538640 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | lock.cc:816 | 11649 | 32 |
| TABLE | greatsql | #sql-1cbfa7_3436 | NULL | 139833831927776 | EXCLUSIVE | STATEMENT | GRANTED | sql_table.cc:17750 | 11649 | 32 |
| TABLE | performance_schema | metadata_locks | NULL | 139835329552848 | SHARED_READ | TRANSACTION | GRANTED | sql_parse.cc:6516 | 12409 | 3 |
+---------------+--------------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
2
3
4
5
6
7
8
9
10
11
12
13
也是要申请好几个锁。
5. 发起一个备份锁
会话1:
LOCK INSTANCE FOR BACKUP;
会话2:
greatsql> SELECT * FROM performance_schema.metadata_locks\G
*************************** 1. row ***************************
OBJECT_TYPE: BACKUP LOCK
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139835167443600
LOCK_TYPE: SHARED
LOCK_DURATION: EXPLICIT
LOCK_STATUS: GRANTED
SOURCE: sql_backup_lock.cc:100
OWNER_THREAD_ID: 11649
OWNER_EVENT_ID: 48
2
3
4
5
6
7
8
9
10
11
12
13
6. 发起FTWRL锁
会话1:
FLUSH TABLES WITH READ LOCK;
会话2:
greatsql> SELECT * FROM performance_schema.metadata_locks\G
*************************** 1. row ***************************
OBJECT_TYPE: GLOBAL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139835167443600
LOCK_TYPE: SHARED
LOCK_DURATION: EXPLICIT
LOCK_STATUS: GRANTED
SOURCE: lock.cc:1076
OWNER_THREAD_ID: 11649
OWNER_EVENT_ID: 62
*************************** 2. row ***************************
OBJECT_TYPE: COMMIT
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
COLUMN_NAME: NULL
OBJECT_INSTANCE_BEGIN: 139835061536608
LOCK_TYPE: SHARED
LOCK_DURATION: EXPLICIT
LOCK_STATUS: GRANTED
SOURCE: lock.cc:1151
OWNER_THREAD_ID: 11649
OWNER_EVENT_ID: 62
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
看到除了GLOBAL锁,还有COMMIT锁。
§ 查看分析MDL锁等待
MDL锁是比较粗粒度的锁,一旦出现写锁等待,不但当前操作会被阻塞,同时还会阻塞后续该表的所有操作,如下例所示:
会话1 | 会话2 | 会话3 |
---|---|---|
BEGIN; SELECT * FROM t1 WHERE id = 3 FOR UPDATE; 发起事务,申请加行锁,以及MDL锁 | ||
ALTER TABLE t1 ADD c2 INT UNSIGNED NOT NULL; 被会话1阻塞 | ||
SELECT * FROM t1 WHERE id=5; 被MDL阻塞,进入等待 |
这时,如果执行 SHOW PROCESSLIST
就可以看到会话2 & 3的状态都是等待获得MDL锁:
| 13365 | root | localhost | greatsql | Sleep | 41 | | NULL | 41867 | 1 | 1 |
| 13366 | root | localhost | greatsql | Query | 14 | Waiting for table metadata lock | ALTER TABLE t1 ADD c2 INT UNSIGNED NOT NULL | 13943 | 0 | 0 |
| 13368 | root | localhost | greatsql | Query | 5 | Waiting for table metadata lock | SELECT * FROM t1 WHERE id=5 | 5701 | 0 | 0 |
2
3
查看 performance_schema.metadata_locks
可以看到当前MDL锁的状态是这样的:
greatsql> SELECT * FROM performance_schema.metadata_locks;
+---------------+---------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| OBJECT_TYPE | OBJECT_SCHEMA | OBJECT_NAME | COLUMN_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | SOURCE | OWNER_THREAD_ID | OWNER_EVENT_ID |
+---------------+---------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
| TABLE | greatsql | t1 | NULL | 139835061536448 | SHARED_WRITE | TRANSACTION | GRANTED | sql_parse.cc:6516 | 11648 | 46 |
| GLOBAL | NULL | NULL | NULL | 139836552110240 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | sql_base.cc:5588 | 11649 | 45 |
| BACKUP LOCK | NULL | NULL | NULL | 139835167443600 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | sql_base.cc:5600 | 11649 | 45 |
| SCHEMA | greatsql | NULL | NULL | 139835167442560 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | sql_base.cc:5575 | 11649 | 45 |
| TABLE | greatsql | t1 | NULL | 139835167442400 | SHARED_UPGRADABLE | TRANSACTION | GRANTED | sql_parse.cc:6516 | 11649 | 45 |
| BACKUP TABLES | NULL | NULL | NULL | 139833831927776 | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | lock.cc:1269 | 11649 | 45 |
| TABLESPACE | NULL | greatsql/t1 | NULL | 139835167444000 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | lock.cc:816 | 11649 | 45 |
| TABLE | greatsql | #sql-1cbfa7_3436 | NULL | 139835167444960 | EXCLUSIVE | STATEMENT | GRANTED | sql_table.cc:17750 | 11649 | 45 |
| TABLE | greatsql | t1 | NULL | 139835167443280 | EXCLUSIVE | TRANSACTION | PENDING | mdl.cc:3784 | 11649 | 45 |
| TABLE | greatsql | t1 | NULL | 139835080065408 | SHARED_READ | TRANSACTION | PENDING | sql_parse.cc:6516 | 11651 | 41 |
+---------------+---------------+------------------+-------------+-----------------------+---------------------+---------------+-------------+--------------------+-----------------+----------------+
2
3
4
5
6
7
8
9
10
11
12
13
14
15
这个视角可读性太差了,改成查看 sys.schema_table_lock_waits
更清晰:
greatsql> SELECT * FROM sys.schema_table_lock_waits\G
*************************** 1. row ***************************
object_schema: greatsql
object_name: t1
waiting_thread_id: 11649
waiting_pid: 13366
waiting_account: root@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: ALTER TABLE t1 ADD c2 INT UNSIGNED NOT NULL
waiting_query_secs: 39
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 11648
blocking_pid: 13365
blocking_account: root@localhost
blocking_lock_type: SHARED_WRITE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 13365
sql_kill_blocking_connection: KILL 13365
*************************** 2. row ***************************
object_schema: greatsql
object_name: t1
waiting_thread_id: 11651
waiting_pid: 13368
waiting_account: root@localhost
waiting_lock_type: SHARED_READ
waiting_lock_duration: TRANSACTION
waiting_query: SELECT * FROM t1 WHERE id=5
waiting_query_secs: 36
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 11648
blocking_pid: 13365
blocking_account: root@localhost
blocking_lock_type: SHARED_WRITE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 13365
sql_kill_blocking_connection: KILL 13365
*************************** 3. row ***************************
object_schema: greatsql
object_name: t1
waiting_thread_id: 11649
waiting_pid: 13366
waiting_account: root@localhost
waiting_lock_type: EXCLUSIVE
waiting_lock_duration: TRANSACTION
waiting_query: ALTER TABLE t1 ADD c2 INT UNSIGNED NOT NULL
waiting_query_secs: 39
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 11649
blocking_pid: 13366
blocking_account: root@localhost
blocking_lock_type: SHARED_UPGRADABLE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 13366
sql_kill_blocking_connection: KILL 13366
*************************** 4. row ***************************
object_schema: greatsql
object_name: t1
waiting_thread_id: 11651
waiting_pid: 13368
waiting_account: root@localhost
waiting_lock_type: SHARED_READ
waiting_lock_duration: TRANSACTION
waiting_query: SELECT * FROM t1 WHERE id=5
waiting_query_secs: 36
waiting_query_rows_affected: 0
waiting_query_rows_examined: 0
blocking_thread_id: 11649
blocking_pid: 13366
blocking_account: root@localhost
blocking_lock_type: SHARED_UPGRADABLE
blocking_lock_duration: TRANSACTION
sql_kill_blocking_query: KILL QUERY 13366
sql_kill_blocking_connection: KILL 13366
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
在上述输出结果中,甚至还提供了解除MDL锁等待的方法,通过KILL持有MDL锁的连接或正在执行的SQL以释放MDL锁。不过这种是比较粗暴的做法,最好是找到持有MDL锁的那个事务,主动发起COMMIT/ROLLBACK结束这个事务,或执行 UNLOCK TABLES
,就可以释放相应的MDL锁了。
§ MDL锁等待优化建议
MDL锁等待超时阈值由选项 lock_wait_timeout
定义,默认值是 31536000 秒(即:一年),这个值太大了,建议调低,在 GreatSQL my.cnf模板 (opens new window) 中的建议参考值是 3600。
除此外,还应该定期监控MDL锁及MDL锁等待状态,一旦发现有超过设定阈值时长的MDL锁等待发生,应立即发出告警,通知DBA及时检查处理。
以下是几个容易造成较大范围MDL锁等待的操作,尽量放在业务低谷期间执行:
- 备份实例,或备份单表。
- 表DDL操作。
- 长时间未结束的事务。
- 较早的某些版本可能存在bug,导致频繁执行
SHOW TABLE STATUS
时也会造成MDL锁等待。 - 个别图形化数据库管理工具可能在鼠标点中某个数据对象时,也会主动请求MDL锁。
参考资料
扫码关注微信公众号
← 死锁排查分析 sysbench 测试 →