mysql 8.0 left join时主表每次循环相关联子查询,每次都循环3次
本帖最后由 leoxu8703 于 2024-5-10 17:00 编辑最近5.7遇到A left join B on a.id=(子查询),发现性能低效。通过慢日志里面row_examined指标发现扫描行数异常。为了验证具体扫描次数与行数,把数据导入到8.0数据库中,通过explain analyze模式.A与B关系是1:N关系,只要能关联到记录,子查询都被循环三次。所以导致性能下降。
for example:CREATE TABLE a (
id int NOT NULL DEFAULT ‘0’,
claim_status int NOT NULL,
claim_time datetime DEFAULT NULL,
type_id int DEFAULT NULL ,
create_time datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
CREATE TABLE b (
id varchar(40) CHARACTER SET utf8 NOT NULL,
headless_package_id int DEFAULT NULL,
claim_time datetime DEFAULT NULL,
confirm_time datetime DEFAULT NULL,
PRIMARY KEY (id),
KEY index01 (headless_package_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
root@localhost : test 03:32:02> select * from A;
+----------+--------------+---------------------+---------+---------------------+
| id | claim_status | claim_time | type_id | create_time |
+----------+--------------+---------------------+---------+---------------------+
| 99358404 | 2 | 2022-01-08 10:05:43 | 0 | 2022-01-01 00:10:33 |
| 99358412 | 2 | 2022-01-10 14:56:02 | 0 | 2022-01-01 00:13:26 |
| 99358422 | 2 | 2022-01-19 10:50:43 | 0 | 2022-01-01 00:19:03 |
| 99358442 | 2 | 2022-01-06 19:27:27 | 0 | 2022-01-01 00:30:23 |
| 99358446 | 2 | 2022-01-07 18:22:02 | 0 | 2022-01-01 00:32:31 |
| 99358448 | 2 | 2022-01-07 10:51:45 | 0 | 2022-01-01 00:33:07 |
| 99358450 | 2 | 2022-01-04 11:03:58 | 0 | 2022-01-01 00:34:20 |
| 99358452 | 2 | 2022-01-07 09:43:12 | 0 | 2022-01-01 00:36:49 |
+----------+--------------+---------------------+---------+---------------------+
8 rows in set (0.00 sec)
root@localhost : test 03:32:42> select * from B;
+----------------------------------+---------------------+---------------------+---------------------+
| id | headless_package_id | claim_time | confirm_time |
+----------------------------------+---------------------+---------------------+---------------------+
| 443ec574f6c14539a549a5729be3e7f7 | 99358412 | 2022-01-10 14:56:02 | 2022-01-10 15:23:02 |
| 442f842b75e94d8bb62251900b3296e3 | 99358442 | 2022-01-06 19:27:28 | 2022-01-07 10:20:10 |
| 44f0ba8c7cdb4e2fa728bd2b7eb18244 | 99358448 | 2022-01-07 10:51:46 | 2022-01-07 11:11:06 |
| 44415ed44cd1480fb03cf3da68f87b68 | 99358450 | 2022-01-03 16:02:41 | 2022-01-03 17:47:38 |
| 44b867e336554c139bf30346c141f98d | 99358446 | 2022-01-07 18:22:03 | 2022-01-08 09:01:49 |
| 444d00e4cf324e42a3ef6e9f34c0599f | 99358404 | 2022-01-08 10:05:43 | 2022-01-08 13:36:53 |
| 4492fb67f3774c97a359fdbe11aa2bb2 | 99358452 | 2022-01-07 09:43:13 | 2022-01-07 10:36:26 |
| 44838565c02c4daf98cf9b3d333d2d2d | 99358452 | 2022-01-06 12:13:25 | 2022-01-06 13:24:58 |
| 4434a8c0141243adbff26406b5b484d6 | 99358422 | 2022-01-19 10:50:43 | 2022-01-19 11:01:39 |
| 44f57bc8ac4c4fbe9ec4289bb0d6c99e | 99358452 | 2022-01-03 09:52:19 | 2022-01-03 10:37:11 |
| 445f647b1075462f8ef7189ab75ea6cc | 99358450 | 2022-01-04 11:03:58 | 2022-01-04 18:13:29 |
+----------------------------------+---------------------+---------------------+---------------------+
11 rows in set (0.00 sec)
<div class="blockcode"><blockquote>example A : A has 8 records,B loop 24.
explain analyze select h.id as hid
from A h
LEFT JOIN B claim
ON claim.id =
(SELECT c_in.id
FROM B c_in
WHERE c_in.headless_package_id = h.id
ORDER BY c_in.claim_time DESC LIMIT 1);
| EXPLAIN |
| → Nested loop left join (cost=3.85 rows=8) (actual time=0.313…0.800 rows=8 loops=1)
→ Index scan on h using PRIMARY (cost=1.05 rows=8) (actual time=0.053…0.061 rows=8 loops=1)
→ Filter: (claim.id = (select #2)) (cost=0.26 rows=1) (actual time=0.091…0.091 rows=1 loops=8)
→ Single-row index lookup on claim using PRIMARY (id=(select #2)) (cost=0.26 rows=1) (actual time=0.074…0.074 rows=1 loops=8)
→ Select #2 (subquery in condition; dependent)
→ Limit: 1 row(s) (cost=0.48 rows=1) (actual time=0.023…0.023 rows=1 loops=24)
→ Sort: c_in.claim_time DESC, limit input to 1 row(s) per chunk (cost=0.48 rows=1) (actual time=0.023…0.023 rows=1 loops=24)
→ Index lookup on c_in using index01 (headless_package_id=h.id) (actual time=0.015…0.017 rows=1 loops=24)
例子二:当主表只有一条满足时,B循环三次(可能需要构造记录多点)
root@localhost : test 04:37:54> explain analyze SELECT COUNT(1)
-> FROM (select h.id as hid,claim.id,claim.headless_package_id, claim.claim_time
-> from A h
-> LEFT JOIN B claim
->ON claim.id =
->(SELECT c_in.id
->FROM B c_in
-> WHERE c_in.headless_package_id = h.id
->ORDER BY c_in.claim_time DESC LIMIT 1)
-> where h.id in (
-> 99358412
-> ))
-> TOTAL;
| EXPLAIN | -> Aggregate: count(1)(actual time=0.547..0.547 rows=1 loops=1)
-> Nested loop left join(cost=39126.25 rows=10) (actual time=0.537..0.540 rows=1 loops=1)
-> Index lookup on h using <auto_key0> (id=99358412 )(actual time=0.010..0.012 rows=1 loops=1)
-> Materialize(cost=39125.00..39125.00 rows=10) (actual time=0.252..0.254 rows=1 loops=1)
-> Limit: 10 row(s)(cost=39124.00 rows=10) (actual time=0.177..0.186 rows=10 loops=1)
-> Table scan on hn(cost=39124.00 rows=388360) (actual time=0.176..0.182 rows=10 loops=1)
-> Filter: (claim.id = (select #4))(cost=0.35 rows=1) (actual time=0.281..0.282 rows=1 loops=1)
-> Single-row index lookup on claim using PRIMARY (id=(select #4))(cost=0.35 rows=1) (actual time=0.258..0.258 rows=1 loops=1)
-> Select #4 (subquery in condition; dependent)
-> Limit: 1 row(s)(cost=0.37 rows=1) (actual time=0.077..0.077 rows=1 loops=3)
-> Sort: c_in.claim_time DESC, limit input to 1 row(s) per chunk(cost=0.37 rows=1) (actual time=0.076..0.076 rows=1 loops=3)
-> Index lookup on c_in using idex_headless_id (headless_package_id=h.id)(actual time=0.065..0.067 rows=1 loops=3)
麻烦重新整理下文中的代码段以提高可读性,手动用 "\代码段\" (实际使用时去掉 "\",我这里演示不这么加会被转义)这种方式包含起来,并且删掉里面的一些非数据库代码,例如 "" 这种 yejr 发表于 2024-5-10 16:47
麻烦重新整理下文中的代码段以提高可读性,手动用 "\代码段\" (实际使用时去掉 "\",我这 ...
已处理 leoxu8703 发表于 2024-5-10 17:01
已处理
b表的id列是varchar,而a表的id列是int,先把二者统一下再看看呢
请补充提供a\b表用于INSERT的SQL吧,方便复现测例
另外,你用的8.0具体什么版本? yejr 发表于 2024-5-11 08:22
b表的id列是varchar,而a表的id列是int,先把二者统一下再看看呢
请补充提供a\b表用于INSERT的SQL吧,方 ...
2个表不是通过id关联的。
页:
[1]