GreatSQL社区

搜索

GreatSQL社区

函数索引触发的一个有趣的问题

GreatSQL社区 已有 39 次阅读2025-2-26 10:55 |系统分类:运维实战

函数索引触发的一个有趣的问题

导引

听同事提到一个有意思的事情,说在使用GreatSQL时,在navicat客户端和GreatSQL命令行客户端创建的函数索引不能共用,navicat客户端创建的函数索引,在navicat上执行SQL时可以使用,在GreatSQL命令行执行相同的SQL却用不上,反之,在GreatSQL命令行创建的函数索引,在navicat客户端无法使用,这究竟是怎么回事呢?

问题回放

创建测试表与测试数据

CREATE TABLE t1(id INT PRIMARY KEY AUTO_INCREMENT,c1 INT,c2 VARCHAR(100));

INSERT INTO t1(c1,c2) VALUES(20241209120000,'20241209120000');

查询表结构信息

greatsql> SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int NOT NULL AUTO_INCREMENT,
  `c1` bigint DEFAULT NULL,
  `c2` varchar(100) COLLATE utf8mb4_0900_bin DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin
1 row in set (0.00 sec)

navicat客户端增加c1列的函数索引idx_c1。

greatsql> ALTER TABLE t1 ADD INDEX idx_c1((SUBSTR(c1,1,8)));

navicat客户端执行如下SQL,发现可以使用idx_c1索引。

greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8)  ='20241209';
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key    | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_c1        | idx_c1 | 35      | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------+
1 row in set (0.03 sec)

GreatSQL命令行执行此SQL,发现索引idx_c1没有用上。

greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8)  ='20241209';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

设置此索引不可见,在命令行创建索引idx_c1_1。

greatsql> ALTER TABLE t1 ALTER INDEX idx_c1 INVISIBLE;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

greatsql> ALTER TABLE t1 ADD INDEX idx_c1_1 ((SUBSTR(c1,1,8)));
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

navicat执行SQL,发现不可以使用索引idx_c1_1;

greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8)  ='20241209';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |   100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set (0.26 sec)

GreatSQL命令行执行SQL,发现可以使用索引idx_c1_1;

greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8)  ='20241209';
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key      | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_c1_1      | idx_c1_1 | 35      | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

问题分析

什么情况下会导致索引失效呢?

我们很容易会想到以下几个可能,索引列上施加了函数或表达式计算,索引列上发生隐式转换,索引列跟比较值的character set不一致或collation不一致。

那么哪一种可能会引发上面的问题呢?

不知小伙伴们是否注意到一个细节,例子中t1表的c1列原本为bigint类型,取前8位数字创建函数索引,substr(c1,1,8),substr函数的返回值是varchar类型,上面例子中创建的函数索引其实是对字符数据substr(c1,1,8) 进行排序创建的,既然是varchar类型,就会涉及character set与collation的问题。一般来讲,对列施加字符函数,返回值应该与列的字符集和校验规则一致(本例中的c2列原本为varchar类型,对其增加函数索引,两种客户端都可以使用,读者可自行测试验证),但是问题在于本例中的c1列为bigint类型,不涉及字符集和校验规则。

那么对整型列上施加字符函数substr而创建的函数索引,它的character set和collation是依据什么呢?

针对上面的问题,很容易联想到很可能是因为两种客户端上character set和collation设置不同。

为了验证这个问题,首先查询一下两个客户端的character set与collation

navicat:

greatsql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8mb4                           |
| character_set_connection | utf8mb4                           |
| character_set_database   | utf8mb4                           |
| character_set_filesystem | binary                            |
| character_set_results    | utf8mb4                           |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8mb3                           |
| character_sets_dir       | /usr/bin/share/greatsql/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.03 sec)

greatsql> SHOW VARIABLES LIKE '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci |
| collation_database            | utf8mb4_0900_bin   |
| collation_server              | utf8mb4_0900_bin   |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.03 sec)

GreatSQL命令行:

greatsql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-----------------------------------+
| Variable_name            | Value                             |
+--------------------------+-----------------------------------+
| character_set_client     | utf8mb4                           |
| character_set_connection | utf8mb4                           |
| character_set_database   | utf8mb4                           |
| character_set_filesystem | binary                            |
| character_set_results    | utf8mb4                           |
| character_set_server     | utf8mb4                           |
| character_set_system     | utf8mb3                           |
| character_sets_dir       | /usr/bin/share/greatsql/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.01 sec)

greatsql> SHOW VARIABLES LIKE '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_bin   |
| collation_database            | utf8mb4_0900_bin   |
| collation_server              | utf8mb4_0900_bin   |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

对比发现,字符集设置是相同的,只有collation_connection参数存在不同。

那么是否是因为collation_connection的设置影响了函数索引的使用呢?我们继续前面的实验,设置navicat客户端的collation_connection参数 ,保持与命令行相同。

greatsql> SET collation_connection=utf8mb4_0900_bin;
Query OK, 0 rows affected (0.00 sec)

greatsql> EXPLAIN SELECT * FROM t1 WHERE substr(c1,1,8)  ='20241209';
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key      | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | ref  | idx_c1_1      | idx_c1_1 | 35      | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
1 row in set (0.03 sec)

发现通过改变navicat客户端collation_connection的设置,可以使用GreatSQL命令行创建的索引idx_c1_1了。这说明这个函数索引的collation跟collation_connection有关。

MySQL手册中对collation_connection参数说明如下:

The collation of the connection character set. collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence .

此参数对字符串常量的比较影响比较大,而对两个列比较影响不大,因为列有自己的collation规则,本例中恰好是函数索引列跟字符常量的比较。

为什么 navicat客户端的collation_connection跟GreatSQL命令行设置会不同呢

我们打开general log,看一下navicat在连接数据库时发生了什么?

2024-12-10T09:08:29.460226+08:00           10 Connect   greatsql@172.16.64.235 on  using TCP/IP
2024-12-10T09:08:29.463713+08:00           10 Query     SET NAMES utf8mb4
2024-12-10T09:08:29.468336+08:00           10 Query     SHOW VARIABLES LIKE 'lower_case_%';
2024-12-10T09:08:29.472745+08:00           10 Query     SHOW VARIABLES LIKE 'sql_mode';
2024-12-10T09:08:29.475088+08:00           10 Query     SELECT COUNT(*) AS support_ndb FROM information_schema.ENGINES WHERE Engine = 'ndbcluster'
2024-12-10T09:08:29.490589+08:00           10 Query     SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA

使用navicat打开数据库的连接,从general log可以看到数据库接收到“SET NAMES utf8mb4”。

set names语句的语法如下:

SET NAMES {'charset_name'
    [COLLATE 'collation_name'] | DEFAULT}

此语句会改变三个session级会话参数,character_set_client, character_set_connection, and character_set_results。

未指定COLLATE子句时,会将会话级的[collation_connection]参数设置为指定charset_name的默认的collation。从上面查询数据库的character set和collation相关参数设置时可以看到,参数default_collation_for_utf8mb4的值为utf8mb4_0900_ai_ci,所以字符集utf8mb4的默认collation是utf8mb4_0900_ai_ci。

而数据库在global级设置collation_connection的值为utf8mb4_0900_bin。GreatSQL命令行连接数据库并没有发出set names语句,继承了数据库global级的collation_connection参数设置,这才造成了navicat客户端与GreatSQL命令行客户端的session级collation_connection参数设置不同。

问题解决

1.在navicat客户端上涉及此问题时,使用set names语句修改会话级参数设置,与数据库global参数设置保持一致。

SET names utf8mb4 collate utf8mb4_0900_bin;

然后再创建函数索引,执行相关SQL。

2.从根本上解决,还是table设计要规范,本案例中之所以涉及这个问题,是因为时间数据用整型来存储,查询时又按照字符类型来查询一天的数据,所以要建substr函数索引来查询,兜兜转转的转换,还是挺累的。最好时间数据就用时间类型的字段来存储。即便用整型来存储了时间数据,也不一定非要靠建函数索引来解决此问题。直接在列上建索引,语句条件调整为 c1>=20241209000000 and c1<20241210000000, 这样才具备通用性,虽然数据库提供了建函数索引的功能,但其实通过建函数索引来解决问题实在是下下策,因为函数索引的使用具备局限性,换一种查法,可能就用不上了。


评论 (0 个评论)

facelist

您需要登录后才可以评论 登录 | 立即注册

合作电话:010-64087828

社区邮箱:greatsql@greatdb.com

社区公众号
社区小助手
QQ群
GMT+8, 2025-3-1 00:16 , Processed in 0.020133 second(s), 9 queries , Redis On.
返回顶部