|
在文章 该开始关注 MySQL 8.4 了 中提到MySQL 8.4支持直方图自动更新,不需要手动定期更新,这是个非常好的新特性,今天就体验下这个新特性。
首先要说的是,直方图自动更新功能是 默认关闭的,需要在创建直方图时指定为 自动更新 才可以,否则默认的做法是像以前那样需要 手动更新。
1、创建测试表
greatsql> CREATE TABLE `t1` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`c1` int unsigned NOT NULL,
`c2` int unsigned NOT NULL,
`c4` varchar(20) NOT NULL,
`c5` varchar(50) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
2、填充测试数据
利用 mysql_random_data_load
填充2万条数据
$ mysql_random_data_load test t1 20000
表中的 c2 列值设计为不超过50的整型,因此再手动更新
UPDATE t1 SET c2 = RAND()*50;
查看确认数据
greatsql> SELECT MIN(c2), MAX(c2) FROM t1;
+---------+---------+
| MIN(c2) | MAX(c2) |
+---------+---------+
| 0 | 50 |
+---------+---------+
在创建直方图之前,先查看查询的执行计划,方便后面做对比
greatsql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 20130
filtered: 10.00
Extra: Using where
greatsql> EXPLAIN ANALYZE SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (cost=2255 rows=1) (actual time=2.93..2.93 rows=1 loops=1)
-> Filter: (t1.c2 = 24) (cost=2053 rows=2013) (actual time=0.149..2.91 rows=378 loops=1)
-> Table scan on t1 (cost=2053 rows=20130) (actual time=0.148..2.28 rows=20000 loops=1)
3、创建直方图,并指定为自动更新
greatsql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 51 BUCKETS AUTO UPDATE;
+---------+-----------+----------+-----------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+---------+-----------+----------+-----------------------------------------------+
| test.t1 | histogram | status | Histogram statistics created for column 'c2'. |
+---------+-----------+----------+-----------------------------------------------+
表中c2 列值分布为 [0, 50],共51个值,因此我创建直方图时干脆指定为51个桶,并设置自动更新,其关键字是在末尾的 AUTO UPDATE
。
4、查看直方图元数据
greatsql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS\G
*************************** 1. row ***************************
SCHEMA_NAME: test -- 库
TABLE_NAME: t1 -- 表
COLUMN_NAME: c2 -- 列
JSON_PRETTY(HISTOGRAM): {
"buckets": [
[
0, -- 桶1存储的数值
0.01035 -- 该数值频率
],
[
1,
0.0313
],
[
2,
0.05085
],
...
[
48,
0.9712
],
[
49,
0.9903
],
[
50,
1.0
]
],
"data-type": "int",
"auto-update": true, -- 自动更新,如果不是自动更新则显示为false
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2025-08-20 08:04:52.407763",
"sampling-rate": 1.0, -- 数据采样比例,100%
"histogram-type": "singleton", -- 等宽直方图
"number-of-buckets-specified": 51 -- 共51个桶
}
对比创建直方图后的执行计划
greatsql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 20130
filtered: 1.89 -- 从原来的10.00下降到了1.89
Extra: Using where
greatsql> EXPLAIN ANALYZE SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0) (cost=2091 rows=1) (actual time=2.9..2.9 rows=1 loops=1) -- cost更小一些(之前是2255)
-> Filter: (t1.c2 = 24) (cost=2053 rows=380) (actual time=0.142..2.89 rows=378 loops=1) -- 在Filter阶段的rows明显下降(之前是2013)
-> Table scan on t1 (cost=2053 rows=20130) (actual time=0.141..2.3 rows=20000 loops=1)
5、写入更多测试数据,观察直方图统计信息变化
复制一遍原表数据,id 和 c2 列发生变化,其余照旧
greatsql> INSERT INTO t1 SELECT 0, c1, c2+RAND()*50, c4, c5 FROM t1;
Query OK, 20000 rows affected (0.52 sec)
Records: 20000 Duplicates: 0 Warnings: 0
greatsql> SELECT MIN(c2), MAX(c2) FROM t1;
+---------+---------+
| MIN(c2) | MAX(c2) |
+---------+---------+
| 0 | 99 |
+---------+---------+
看到 c2 列值分布从原来的 [0, 50] 变成了 [0,99]。
再次查询直方图统计信息
greatsql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS\G
*************************** 1. row ***************************
SCHEMA_NAME: test
TABLE_NAME: t1
COLUMN_NAME: c2
JSON_PRETTY(HISTOGRAM): {
"buckets": [
[
0,
2,
0.02595,
3
],
[
3,
4,
0.046825,
2
],
...
[
50,
51,
0.765925,
2
],
[
52,
53,
0.785175,
2
],
...
[
82,
91,
0.99315,
10
],
[
92,
99,
1.0,
8
]
],
"data-type": "int",
"auto-update": true,
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2025-08-20 08:25:56.049093",
"sampling-rate": 1.0,
"histogram-type": "equi-height",
"number-of-buckets-specified": 51
}
看到直方图已完成自动更新,统计信息中已包含新写入的数据。
如果想要让直方图保持 默认需要手动更新 的行为,在创建直方图时,不增加 AUTO UPDATE
关键字即可,或显式加上 MANUAL UPDATE
关键字
greatsql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 51 BUCKETS;
-- 或显式加上 `MANUAL UPDATE` 关键字
greatsql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 51 BUCKETS MANUAL UPDATE;
-- 再次查看直方图元信息,确认 auto-update 属性为 false
greatsql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS\G
...
"data-type": "int",
"auto-update": false, -- 确认 auto-update 属性为 false
"null-values": 0.0,
"collation-id": 8,
"last-updated": "2025-08-20 08:32:00.153552",
"sampling-rate": 1.0,
"histogram-type": "equi-height",
"number-of-buckets-specified": 51
以上,全文完。
扩展阅读
Enjoy MySQL 8.4 :)
合作电话:010-64087828
社区邮箱:greatsql@greatdb.com