|
GreatSQL 优化器的索引跳跃扫描(Index Skip Scan) 是一种优化查询的技术,尤其在联合索引中用于减少扫描的无效行数。它通过"跳跃"式的扫描方式,避免了对索引中无用部分的扫描,从而提升查询效率。这种技术适合特定场景,并有一定的优缺点。
索引跳跃扫描利用的是联合索引中非首列(非最左前缀)的索引列,来提高查询效率。例如,如果你有一个复合索引 (A, B),在传统的 B-Tree 索引中,只有当查询条件包含 A 列时,索引才会生效。但在跳跃扫描中,即使 A 没有出现在查询条件中,仍然可以通过扫描 B 列来有效查询。跳跃扫描会逐步扫描 A 列的每一个可能值,然后在每个 A 值下查找 B 列中符合条件的记录。这样避免了扫描大量无关记录,提升了查询性能。
下面用一个简单的例子来说明INDEX_SKIP_SCAN
怎么应用在优化器。
CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT);
CREATE UNIQUE INDEX i1_t1 ON t1(c1,c2,c3);
INSERT INTO t1 VALUES (1,1,1,1), (1,1,2,2), (1,3,3,3), (1,4,4,4), (1,5,5,5),
(2,1,1,1), (2,2,2,2), (2,3,3,3), (2,4,4,4), (2,5,5,5);
INSERT INTO t1 SELECT c1, c2, c3+5, c4+10 FROM t1;
INSERT INTO t1 SELECT c1, c2, c3+10, c4+20 FROM t1;
INSERT INTO t1 SELECT c1, c2, c3+20, c4+40 FROM t1;
INSERT INTO t1 SELECT c1, c2, c3+40, c4+80 FROM t1;
ANALYZE TABLE t1;
-- 下面的结果用到了skip scan
greatsql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2 > 40;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 10 | NULL | 53 | 100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
/*运行过程:
1、先找到第一列的不同值,得到结果为1和2
select distinct c1 from t1;
2、根据第一列的分组结果在分组内执行范围扫描
select c1, c2 from t1 where c1 = 1 and c2 > 40;
select c1, c2 from t1 where c1 = 2 and c2 > 40;*/
skip scan 是根据条件生成 mm tree 以后,根据 mm tree 结果和索引进行判定能不能用 skip scan,其中联合索引中范围条件列之前的列为prefix列,即用来作为分组的依据的列,如果有多个除了第一列之外的范围列,那么选取遇到的第一个列之前的列为前缀进行分组。
int test_quick_select() {
// 先判断skip_scan是否满足条件,不满足的话执行索引合并。
AccessPath *skip_scan_path = get_best_skip_scan();
// 不满足skip_scan执行索引合并,相关知识看上一节介绍
}
AccessPath *get_best_skip_scan() {
// 先判断sql是否支持skip_scan,见表一
// 接着开始执行skip_scan,遍历所有索引
for (uint cur_param_idx = 0; cur_param_idx < param->keys; ++cur_param_idx) {
// covering_keys代表可以使用的联合索引,sql涉及的所有列必须都在联合索引上才会有值
if (!table->covering_keys.is_set(cur_index)) {
cause = "query_references_nonkey_column";
goto next_index;
}
// 从mm tree抽出当前联合索引相关的SEL_ROOT,如果没有那么就不能执行skip scan
cur_index_range_tree = get_index_range_tree(cur_index, tree, param);
// 遍历当前索引涉及的所有列
for (cur_part = cur_index_info->key_part; cur_part != end_part;
cur_part++, part++) {
// 从mm tree获取当前列相关的SEL_ROOT,就是查询条件涉及的列
get_sel_root_for_keypart();
1、如果找到联合索引前缀列相关条件并且不是等号条件,那么就判断为"prefix_not_const_equality"并且跳到下一个索引
2、如果找到联合索引前缀列以外的列相关条件,以遇到的第一个条件作为skip scan的条件。
}
// 针对联合索引的前缀列估计范围内包含的行数,用在条件包含前缀列有等号条件的情况
quick_prefix_records = check_quick_select();
// 计算skip_scan的行数和cost
cost_skip_scan();
// 最后生成AccessPath,包含IndexSkipScanParameters
}
}
// 获取skip scan的行数和cost结果
void cost_skip_scan() {
table_records = table->file->stats.records;
// 如果可以根据前缀列估算出行数,那么就用估算的值,如果不能就用全表的行数。
if (quick_prefix_records == HA_POS_ERROR)
skip_scan_records = table_records;
else
skip_scan_records = quick_prefix_records;
// 首先根据联合索引前缀列数值获取每个唯一数组包含的相同数的数量
/* Compute the number of keys in a group. */
if (index_info->has_records_per_key(distinct_key_parts - 1)) {
keys_per_group = index_info->records_per_key(distinct_key_parts - 1);
} else
keys_per_group = guess_rec_per_key(table, index_info, distinct_key_parts);
num_groups = (uint)(skip_scan_records / keys_per_group) + 1;
num_groups = std::max(num_groups, 1U);
// 接着根据可以用于skip scan的列,获取这个列的不同唯一数值包含的相同数的数量
if (index_info->has_records_per_key(distinct_key_parts))
keys_per_range = index_info->records_per_key(distinct_key_parts);
else
keys_per_range =
guess_rec_per_key(table, index_info, distinct_key_parts + 1);
//
double max_distinct_values = max(
1.0, static_cast(uint(keys_per_group) / uint(keys_per_range)));
// 根据条件估算过滤系数
float filtering_effect = where_cond->get_filtering_effect();
// 计算得出的总行数
*records = max(ha_rows(1), ha_rows(skip_scan_records * filtering_effect));
// 最后估算IO cost和CPU cost
Cost_estimate cost_skip_scan =
table->file->index_scan_cost(key, num_groups, *records);
}
最后具体实现过程见代码IndexSkipScanIterator::Read(),这里不再展开赘述。
表一:不能使用 skip_scan 的场合
场合 | 说明 |
---|---|
sql语句带有ORDER DESC | 倒序排序不支持skip_scan |
sql语句带有group by | |
没有生成mm tree | 条件必须带有列相关的范围 |
sql语句带有distinct | |
选择项包含聚合函数和DISTINCT | |
查询语句的Item包含不在联合索引的列 | |
联合索引前缀列涉及的条件不为等号条件 | 比如where c1<10 and c2>10 |
联合索引非前缀列涉及的条件生成的mm tree元素超过1个 | 比如where c2>10 or c2<1 |
表二:索引判断状态迁移
索引状态 | 说明 |
---|---|
EQUALITY_KEYPART | 当前列是联合索引的前缀列,并且条件是等于条件 |
SKIPPED_KEYPART | 当前列是联合索引的非条件列,即当前列没有相关条件 |
RANGE_KEYPART | 当前列是联合索引的最终条件列,并且条件是单个范围,不是多个范围。 |
TRAILING_KEYPART | 状态结束 |
表三:skip index 使用场景
场景 | 说明 |
---|---|
联合索引查询 | 当查询条件不包括索引的最左前缀列,而仅包括后面的列时,可以使用跳跃扫描。 |
低基数列查询 | 对于列值种类少、重复率高的列,跳跃扫描可以减少扫描无效记录的时间。 |
避免额外索引 | 当现有的联合索引足够支持查询,而不想为特定列额外创建索引时,跳跃扫描是一种权衡。 |
表四:skip index 优点
优点 | 说明 |
---|---|
提高查询效率 | 对于联合索引,如果查询条件只涉及非最左前缀列,跳跃扫描能够提高查询效率,减少全表扫描的次数。 |
减少 I/O 操作 | 通过避免扫描无效的索引行,跳跃扫描减少了对数据页的访问,从而节省了 I/O 操作。 |
降低索引空间要求 | 在某些场景下,可以减少为查询额外建立索引的需求,因为即使只使用了非首列,跳跃扫描也能利用现有的复合索引。 |
表五:skip index 缺点
缺点 | 说明 |
---|---|
不适合高基数列 | 跳跃扫描对低基数列(值不多但重复率高的列)有较好的效果。但如果参与跳跃扫描的列基数高,可能需要大量跳跃,反而影响效率。 |
无法替代覆盖索引 | 对于那些经常查询的列,跳跃扫描并不能代替为每个列创建单独的索引。对于常用列,覆盖索引的效果会更好。 |
不适用于所有查询类型 | 跳跃扫描仅在某些查询模式下有效,特别是当查询条件中不包含索引的最左前缀列时。如果最左列经常被查询,跳跃扫描无法发挥作用。 |
表六:相关关键词
关键词 | 值 |
---|---|
OPTIMIZER_SKIP_SCAN | ON/OFF |
SKIP_SCAN | 参数为表名,必须满足使用条件才生效,使用见下面例子 |
NO_SKIP_SCAN | 参数为表名和索引名,必须满足使用条件才生效,使用见下面例子 |
接下来看几个例子来说明上面的代码:
greatsql> EXPLAIN SELECT c1, c2 FROM t1 WHERE c2 > 40;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 10 | NULL | 53 | 100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
"skip_scan_range": {
"potential_skip_scan_indexes": [
{
"index": "i1_t1",
"tree_travel_cost": 0.4,
"num_groups": 3, -- 计算方式见下面
"rows": 53, -- 计算方式见下面
"cost": 11.483
}
]
},
"best_skip_scan_summary": {
"type": "skip_scan",
"index": "i1_t1",
"key_parts_used_for_access": [
"c1",
"c2"
],
"range": [
"40 < c2"
],
"chosen": true
},
/* num_groups和rows计算过程:
skip_scan_records = table->file->stats.records = 160
keys_per_group = 条件涉及的第一个列的每个key包含的值 = 80
keys_per_range = 条件涉及列的每个key包含的值 = 17.7777786
num_groups = (skip_scan_records / keys_per_group) + 1 = 160 / 80 + 1 = 3 --->num_groups结果
max_distinct_values = keys_per_group / keys_per_range = 80 / 17.7777786 = 4
filtering_effect = max(1/4, COND_FILTER_INEQUALITY) = 0.333
records = skip_scan_records * filtering_effect = 160 * 0.333 = 53 --->rows结果 */
下面例子有的没用 skip scan 有的用了,区别仅在于包含的条件,根据条件最后生成的 mm tree 不同导致结果不同。
-- 查询语句的Item包含不在联合索引的列c4,因此被判定为"query_references_nonkey_column",不使用skip scan。
greatsql> EXPLAIN SELECT c1, c2,c3,c4 FROM t1 WHERE c3 >= 40 and c1<5;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | i1_t1 | NULL | NULL | NULL | 160 | 33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
-- 这个例子mm tree的c1<5因为不是等号条件因此被判断为"prefix_not_const_equality",因此不执行skip scan。
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1<5;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | index | i1_t1 | i1_t1 | 15 | NULL | 160 | 33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
-- 这个例子mm tree只有一个40 <= c3,没有c1相关的tree,因此不需要判断索引前缀。
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and 0<c1<3;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 53 | 100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
/* ※0<c1<3不是标准条件,不支持生成mm tree,详情见https://bugs.mysql.com/bug.php?id=116515&thanks=4。这里可以改成c1 between 0 and 3 */
/*这个例子mm tree有两个40 <= c3和0<c1<3,0<c1<3被判断为"prefix_not_const_equality",因此不能用skip scan方式。*/
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and 0<c1 and c1<3;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | index | i1_t1 | i1_t1 | 15 | NULL | 160 | 33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
-- 这个例子mm tree有两个:c1 = 5和40 <= c3,c1条件满足等号条件的要求,可以考虑用skip scan。
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1=5;
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | ref | i1_t1 | i1_t1 | 5 | const | 1 | 33.33 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
"skip_scan_range": {
"potential_skip_scan_indexes": [
{
"index": "i1_t1",
"tree_travel_cost": 0.4,
"num_groups": 1, -- 这里分组为1,因为c1=5前缀列算出来的skip_scan_records=1,因为不足1所以最后取1
"rows": 1,
"cost": 1.2
}
]
},
"best_skip_scan_summary": {
"type": "skip_scan",
"index": "i1_t1",
"key_parts_used_for_access": [
"c1",
"c2",
"c3"
],
"prefix ranges": [ -- 这里的prefix指的是条件出现的联合索引的第一列
"c1 = 5"
],
"range": [ -- 这里的范围指的是可以用skip scan的列
"40 <= c3"
],
"chosen": true
/* 这个例子mm tree有两个:c2=2和40 <= c3,skip scan只选除了联合索引第一个列以外遇到的第一个条件,因此最后用的是c2=2作为skip scan的条件。*/
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c2=2;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 10 | NULL | 40 | 33.33 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
"skip_scan_range": {
"potential_skip_scan_indexes": [
{
"index": "i1_t1",
"tree_travel_cost": 0.4,
"num_groups": 3, -- 这里分组为3,因为用c1来分组的,c1只有1和2两个不同值,因此结果为2+1=3
"rows": 40,
"cost": 8.67494
}
]
},
"best_skip_scan_summary": {
"type": "skip_scan",
"index": "i1_t1",
"key_parts_used_for_access": [
"c1",
"c2"
],
"range": [ -- 这里的范围指的是可以用skip scan的列。由于条件没有涉及第一列c1,因此没有出现"prefix ranges"选项。
"c2 = 2"
],
"chosen": true
-- 这个例子mm tree有1个:40 <= c3,最后用的是c1和c2作为前缀。
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 53 | 100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
"skip_scan_range": {
"potential_skip_scan_indexes": [
{
"index": "i1_t1",
"tree_travel_cost": 0.4,
"num_groups": 10, -- 这里公式=160 / 17.777 + 1 = 9+1 = 10,因为没有前缀条件可以过滤条件
"rows": 53,
"cost": 16.2332
}
]
},
"best_skip_scan_summary": {
"type": "skip_scan",
"index": "i1_t1",
"key_parts_used_for_access": [
"c1",
"c2",
"c3"
],
"range": [
"40 <= c3"
],
"chosen": true
},
-- 下面的例子因为出现了联合索引的所有列c1和c2和c3,因此不符合skip scan要求,不能用skip scan。
greatsql> EXPLAIN SELECT c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1=1 and c2=2;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
下面看一下用到 SKIP_SCAN 提示关键词的情况。下面这个例子没加关键字的时候因为 cost 没选择index skip scan
,加上关键字之后就会强制走index skip scan
greatsql> EXPLAIN SELECT /*+ SKIP_SCAN(t1) */ c1, c2,c3 FROM t1 WHERE c3 >= 40 and c1=5;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
| 1 | SIMPLE | t1 | NULL | range | i1_t1 | i1_t1 | 15 | NULL | 1 | 100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+----------------------------------------+
下面看一下用到 NO_SKIP_SCAN 提示关键词的情况。
greatsql> EXPLAIN SELECT /*+ NO_SKIP_SCAN(t1 i1_t1) */ c1, c2,c3 FROM t1 WHERE c3 >= 40;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | t1 | NULL | index | NULL | i1_t1 | 15 | NULL | 160 | 33.33 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
从上面 skip index 的代码我们了解了跳跃扫描的使用条件和使用场景,以及使用规则,这个功能让一些以前用不到联合索引的查询也可以用到索引,提高了效率,但也有一定的局限和缺点,要看具体表的情况来决定。
合作电话:010-64087828
社区邮箱:greatsql@greatdb.com