GreatSQL社区

搜索

yejr

MySQL 8.4新特性之直方图自动更新

yejr 已有 30 次阅读2025-10-9 17:11 |系统分类:原理&产品解读

在文章 该开始关注 MySQL 8.4 了 中提到MySQL 8.4支持直方图自动更新,不需要手动定期更新,这是个非常好的新特性,今天就体验下这个新特性。

首先要说的是,直方图自动更新功能是 默认关闭的,需要在创建直方图时指定为 自动更新 才可以,否则默认的做法是像以前那样需要 手动更新

1、创建测试表

greatsql> CREATE TABLE `t1` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `c1` int unsigned NOT NULL,
  `c2` int unsigned NOT NULL,
  `c4` varchar(20) NOT NULL,
  `c5` varchar(50) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

2、填充测试数据

利用 mysql_random_data_load 填充2万条数据

$ mysql_random_data_load test t1 20000

表中的 c2 列值设计为不超过50的整型,因此再手动更新

UPDATE t1 SET c2 = RAND()*50;

查看确认数据

greatsql> SELECT MIN(c2), MAX(c2) FROM t1;
+---------+---------+
| MIN(c2) | MAX(c2) |
+---------+---------+
|       0 |      50 |
+---------+---------+

在创建直方图之前,先查看查询的执行计划,方便后面做对比

greatsql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20130
     filtered: 10.00
        Extra: Using where
        
greatsql> EXPLAIN ANALYZE SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)  (cost=2255 rows=1) (actual time=2.93..2.93 rows=1 loops=1)
    -> Filter: (t1.c2 = 24)  (cost=2053 rows=2013) (actual time=0.149..2.91 rows=378 loops=1)
        -> Table scan on t1  (cost=2053 rows=20130) (actual time=0.148..2.28 rows=20000 loops=1)

3、创建直方图,并指定为自动更新

greatsql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 51 BUCKETS AUTO UPDATE;
+---------+-----------+----------+-----------------------------------------------+
| Table   | Op        | Msg_type | Msg_text                                      |
+---------+-----------+----------+-----------------------------------------------+
| test.t1 | histogram | status   | Histogram statistics created for column 'c2'. |
+---------+-----------+----------+-----------------------------------------------+

表中c2 列值分布为 [0, 50],共51个值,因此我创建直方图时干脆指定为51个桶,并设置自动更新,其关键字是在末尾的 AUTO UPDATE

4、查看直方图元数据

greatsql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS\G
*************************** 1. row ***************************
           SCHEMA_NAME: test  -- 库
            TABLE_NAME: t1    -- 表
           COLUMN_NAME: c2    -- 列
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      0,  -- 桶1存储的数值
      0.01035  -- 该数值频率
    ],
    [
      1,
      0.0313
    ],
    [
      2,
      0.05085
    ],
...
    [
      48,
      0.9712
    ],
    [
      49,
      0.9903
    ],
    [
      50,
      1.0
    ]
  ],
  "data-type": "int",
  "auto-update": true,  -- 自动更新,如果不是自动更新则显示为false
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2025-08-20 08:04:52.407763",
  "sampling-rate": 1.0,  -- 数据采样比例,100%
  "histogram-type": "singleton",  -- 等宽直方图
  "number-of-buckets-specified": 51  -- 共51个桶
}

对比创建直方图后的执行计划

greatsql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 20130
     filtered: 1.89  -- 从原来的10.00下降到了1.89
        Extra: Using where

greatsql> EXPLAIN ANALYZE SELECT COUNT(*) FROM t1 WHERE c2=24\G
*************************** 1. row ***************************
EXPLAIN: -> Aggregate: count(0)  (cost=2091 rows=1) (actual time=2.9..2.9 rows=1 loops=1)  -- cost更小一些(之前是2255)
    -> Filter: (t1.c2 = 24)  (cost=2053 rows=380) (actual time=0.142..2.89 rows=378 loops=1)  -- 在Filter阶段的rows明显下降(之前是2013)
        -> Table scan on t1  (cost=2053 rows=20130) (actual time=0.141..2.3 rows=20000 loops=1)

5、写入更多测试数据,观察直方图统计信息变化

复制一遍原表数据,idc2 列发生变化,其余照旧

greatsql> INSERT INTO t1 SELECT 0, c1, c2+RAND()*50, c4, c5 FROM t1;
Query OK, 20000 rows affected (0.52 sec)
Records: 20000  Duplicates: 0  Warnings: 0

greatsql> SELECT MIN(c2), MAX(c2) FROM t1;
+---------+---------+
| MIN(c2) | MAX(c2) |
+---------+---------+
|       0 |      99 |
+---------+---------+

看到 c2 列值分布从原来的 [0, 50] 变成了 [0,99]

再次查询直方图统计信息

greatsql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS\G
*************************** 1. row ***************************
           SCHEMA_NAME: test
            TABLE_NAME: t1
           COLUMN_NAME: c2
JSON_PRETTY(HISTOGRAM): {
  "buckets": [
    [
      0,
      2,
      0.02595,
      3
    ],
    [
      3,
      4,
      0.046825,
      2
    ],
...
    [
      50,
      51,
      0.765925,
      2
    ],
    [
      52,
      53,
      0.785175,
      2
    ],
...
    [
      82,
      91,
      0.99315,
      10
    ],
    [
      92,
      99,
      1.0,
      8
    ]
  ],
  "data-type": "int",
  "auto-update": true,
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2025-08-20 08:25:56.049093",
  "sampling-rate": 1.0,
  "histogram-type": "equi-height",
  "number-of-buckets-specified": 51
}        

看到直方图已完成自动更新,统计信息中已包含新写入的数据。

如果想要让直方图保持 默认需要手动更新 的行为,在创建直方图时,不增加 AUTO UPDATE 关键字即可,或显式加上 MANUAL UPDATE 关键字

greatsql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 51 BUCKETS;

-- 或显式加上 `MANUAL UPDATE` 关键字
greatsql> ANALYZE TABLE t1 UPDATE HISTOGRAM ON c2 WITH 51 BUCKETS MANUAL UPDATE;

-- 再次查看直方图元信息,确认 auto-update 属性为 false
greatsql> SELECT SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, JSON_PRETTY(HISTOGRAM) FROM INFORMATION_SCHEMA.COLUMN_STATISTICS\G
...
  "data-type": "int",
  "auto-update": false,  -- 确认 auto-update 属性为 false
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2025-08-20 08:32:00.153552",
  "sampling-rate": 1.0,
  "histogram-type": "equi-height",
  "number-of-buckets-specified": 51

以上,全文完。

扩展阅读

Enjoy MySQL 8.4 :)


评论 (0 个评论)

facelist

您需要登录后才可以评论 登录 | 立即注册

合作电话:010-64087828

社区邮箱:greatsql@greatdb.com

社区公众号
社区小助手
QQ群
GMT+8, 2025-10-13 04:13 , Processed in 0.014401 second(s), 9 queries , Redis On.
返回顶部