|
听同事提到一个有意思的事情,说在使用GreatSQL时,在navicat客户端和GreatSQL命令行客户端创建的函数索引不能共用,navicat客户端创建的函数索引,在navicat上执行SQL时可以使用,在GreatSQL命令行执行相同的SQL却用不上,反之,在GreatSQL命令行创建的函数索引,在navicat客户端无法使用,这究竟是怎么回事呢?
创建测试表与测试数据
CREATE TABLE t1(id INT PRIMARY KEY AUTO_INCREMENT,c1 INT,c2 VARCHAR(100));
INSERT INTO t1(c1,c2) VALUES(20241209120000,'20241209120000');
查询表结构信息
greatsql> SHOW CREATE TABLE t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int NOT NULL AUTO_INCREMENT,
`c1` bigint DEFAULT NULL,
`c2` varchar(100) COLLATE utf8mb4_0900_bin DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin
1 row in set (0.00 sec)
navicat客户端增加c1列的函数索引idx_c1。
greatsql> ALTER TABLE t1 ADD INDEX idx_c1((SUBSTR(c1,1,8)));
navicat客户端执行如下SQL,发现可以使用idx_c1索引。
greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8) ='20241209';
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t1 | NULL | ref | idx_c1 | idx_c1 | 35 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+--------+---------+-------+------+----------+-------+
1 row in set (0.03 sec)
GreatSQL命令行执行此SQL,发现索引idx_c1没有用上。
greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8) ='20241209';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
设置此索引不可见,在命令行创建索引idx_c1_1。
greatsql> ALTER TABLE t1 ALTER INDEX idx_c1 INVISIBLE;
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
greatsql> ALTER TABLE t1 ADD INDEX idx_c1_1 ((SUBSTR(c1,1,8)));
Query OK, 0 rows affected (0.04 sec)
Records: 0 Duplicates: 0 Warnings: 0
navicat执行SQL,发现不可以使用索引idx_c1_1;
greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8) ='20241209';
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------------+
1 row in set (0.26 sec)
GreatSQL命令行执行SQL,发现可以使用索引idx_c1_1;
greatsql> EXPLAIN SELECT * FROM t1 WHERE SUBSTR(c1,1,8) ='20241209';
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t1 | NULL | ref | idx_c1_1 | idx_c1_1 | 35 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)
什么情况下会导致索引失效呢?
我们很容易会想到以下几个可能,索引列上施加了函数或表达式计算,索引列上发生隐式转换,索引列跟比较值的character set不一致或collation不一致。
那么哪一种可能会引发上面的问题呢?
不知小伙伴们是否注意到一个细节,例子中t1表的c1列原本为bigint类型,取前8位数字创建函数索引,substr(c1,1,8),substr函数的返回值是varchar类型,上面例子中创建的函数索引其实是对字符数据substr(c1,1,8) 进行排序创建的,既然是varchar类型,就会涉及character set与collation的问题。一般来讲,对列施加字符函数,返回值应该与列的字符集和校验规则一致(本例中的c2列原本为varchar类型,对其增加函数索引,两种客户端都可以使用,读者可自行测试验证),但是问题在于本例中的c1列为bigint类型,不涉及字符集和校验规则。
那么对整型列上施加字符函数substr而创建的函数索引,它的character set和collation是依据什么呢?
针对上面的问题,很容易联想到很可能是因为两种客户端上character set和collation设置不同。
为了验证这个问题,首先查询一下两个客户端的character set与collation
navicat:
greatsql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-----------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/bin/share/greatsql/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.03 sec)
greatsql> SHOW VARIABLES LIKE '%collation%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_bin |
| collation_server | utf8mb4_0900_bin |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.03 sec)
GreatSQL命令行:
greatsql> SHOW VARIABLES LIKE 'character%';
+--------------------------+-----------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8mb3 |
| character_sets_dir | /usr/bin/share/greatsql/charsets/ |
+--------------------------+-----------------------------------+
8 rows in set (0.01 sec)
greatsql> SHOW VARIABLES LIKE '%collation%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_0900_bin |
| collation_database | utf8mb4_0900_bin |
| collation_server | utf8mb4_0900_bin |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)
对比发现,字符集设置是相同的,只有collation_connection
参数存在不同。
那么是否是因为collation_connection
的设置影响了函数索引的使用呢?我们继续前面的实验,设置navicat客户端的collation_connection
参数 ,保持与命令行相同。
greatsql> SET collation_connection=utf8mb4_0900_bin;
Query OK, 0 rows affected (0.00 sec)
greatsql> EXPLAIN SELECT * FROM t1 WHERE substr(c1,1,8) ='20241209';
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
| 1 | SIMPLE | t1 | NULL | ref | idx_c1_1 | idx_c1_1 | 35 | const | 1 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+----------+---------+-------+------+----------+-------+
1 row in set (0.03 sec)
发现通过改变navicat客户端collation_connection的设置,可以使用GreatSQL命令行创建的索引idx_c1_1了。这说明这个函数索引的collation跟collation_connection有关。
MySQL手册中对collation_connection参数说明如下:
The collation of the connection character set. collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence .
此参数对字符串常量的比较影响比较大,而对两个列比较影响不大,因为列有自己的collation规则,本例中恰好是函数索引列跟字符常量的比较。
为什么 navicat客户端的collation_connection跟GreatSQL命令行设置会不同呢?
我们打开general log,看一下navicat在连接数据库时发生了什么?
2024-12-10T09:08:29.460226+08:00 10 Connect greatsql@172.16.64.235 on using TCP/IP
2024-12-10T09:08:29.463713+08:00 10 Query SET NAMES utf8mb4
2024-12-10T09:08:29.468336+08:00 10 Query SHOW VARIABLES LIKE 'lower_case_%';
2024-12-10T09:08:29.472745+08:00 10 Query SHOW VARIABLES LIKE 'sql_mode';
2024-12-10T09:08:29.475088+08:00 10 Query SELECT COUNT(*) AS support_ndb FROM information_schema.ENGINES WHERE Engine = 'ndbcluster'
2024-12-10T09:08:29.490589+08:00 10 Query SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA
使用navicat打开数据库的连接,从general log可以看到数据库接收到“SET NAMES utf8mb4”。
set names语句的语法如下:
SET NAMES {'charset_name'
[COLLATE 'collation_name'] | DEFAULT}
此语句会改变三个session级会话参数,character_set_client, character_set_connection, and character_set_results。
未指定COLLATE子句时,会将会话级的[collation_connection
]参数设置为指定charset_name的默认的collation。从上面查询数据库的character set和collation相关参数设置时可以看到,参数default_collation_for_utf8mb4的值为utf8mb4_0900_ai_ci,所以字符集utf8mb4的默认collation是utf8mb4_0900_ai_ci。
而数据库在global级设置collation_connection
的值为utf8mb4_0900_bin。GreatSQL命令行连接数据库并没有发出set names语句,继承了数据库global级的collation_connection
参数设置,这才造成了navicat客户端与GreatSQL命令行客户端的session级collation_connection
参数设置不同。
1.在navicat客户端上涉及此问题时,使用set names语句修改会话级参数设置,与数据库global参数设置保持一致。
SET names utf8mb4 collate utf8mb4_0900_bin;
然后再创建函数索引,执行相关SQL。
2.从根本上解决,还是table设计要规范,本案例中之所以涉及这个问题,是因为时间数据用整型来存储,查询时又按照字符类型来查询一天的数据,所以要建substr函数索引来查询,兜兜转转的转换,还是挺累的。最好时间数据就用时间类型的字段来存储。即便用整型来存储了时间数据,也不一定非要靠建函数索引来解决此问题。直接在列上建索引,语句条件调整为 c1>=20241209000000 and c1<20241210000000, 这样才具备通用性,虽然数据库提供了建函数索引的功能,但其实通过建函数索引来解决问题实在是下下策,因为函数索引的使用具备局限性,换一种查法,可能就用不上了。
合作电话:010-64087828
社区邮箱:greatsql@greatdb.com