mysql 8.0 left join时主表每次循环相关联子查询，每次都循环3次

leoxu8703 发表于 2024-5-10 16:40:41

本帖最后由 leoxu8703 于 2024-5-10 17:00 编辑

最近5.7遇到A left join B on a.id=(子查询)，发现性能低效。通过慢日志里面row_examined指标发现扫描行数异常。为了验证具体扫描次数与行数，把数据导入到8.0数据库中，通过explain analyze模式.A与B关系是1:N关系，只要能关联到记录，子查询都被循环三次。所以导致性能下降。

for example:CREATE TABLE a (
id int NOT NULL DEFAULT ‘0’,
claim_status int NOT NULL,
claim_time datetime DEFAULT NULL,
type_id int DEFAULT NULL ,
create_time datetime NOT NULL,
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

CREATE TABLE b (
id varchar(40) CHARACTER SET utf8 NOT NULL,
headless_package_id int DEFAULT NULL,
claim_time datetime DEFAULT NULL,
confirm_time datetime DEFAULT NULL,
PRIMARY KEY (id),
KEY index01 (headless_package_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

root@localhost : test 03:32:02> select * from A;
+----------+--------------+---------------------+---------+---------------------+
| id    | claim_status | claim_time       | type_id | create_time       |
+----------+--------------+---------------------+---------+---------------------+
| 99358404 |          2 | 2022-01-08 10:05:43 |    0 | 2022-01-01 00:10:33 |
| 99358412 |          2 | 2022-01-10 14:56:02 |    0 | 2022-01-01 00:13:26 |
| 99358422 |          2 | 2022-01-19 10:50:43 |    0 | 2022-01-01 00:19:03 |
| 99358442 |          2 | 2022-01-06 19:27:27 |    0 | 2022-01-01 00:30:23 |
| 99358446 |          2 | 2022-01-07 18:22:02 |    0 | 2022-01-01 00:32:31 |
| 99358448 |          2 | 2022-01-07 10:51:45 |    0 | 2022-01-01 00:33:07 |
| 99358450 |          2 | 2022-01-04 11:03:58 |    0 | 2022-01-01 00:34:20 |
| 99358452 |          2 | 2022-01-07 09:43:12 |    0 | 2022-01-01 00:36:49 |
+----------+--------------+---------------------+---------+---------------------+
8 rows in set (0.00 sec)

root@localhost : test 03:32:42> select * from B;
+----------------------------------+---------------------+---------------------+---------------------+
| id                            | headless_package_id | claim_time       | confirm_time    |
+----------------------------------+---------------------+---------------------+---------------------+
| 443ec574f6c14539a549a5729be3e7f7 |          99358412 | 2022-01-10 14:56:02 | 2022-01-10 15:23:02 |
| 442f842b75e94d8bb62251900b3296e3 |          99358442 | 2022-01-06 19:27:28 | 2022-01-07 10:20:10 |
| 44f0ba8c7cdb4e2fa728bd2b7eb18244 |          99358448 | 2022-01-07 10:51:46 | 2022-01-07 11:11:06 |
| 44415ed44cd1480fb03cf3da68f87b68 |          99358450 | 2022-01-03 16:02:41 | 2022-01-03 17:47:38 |
| 44b867e336554c139bf30346c141f98d |          99358446 | 2022-01-07 18:22:03 | 2022-01-08 09:01:49 |
| 444d00e4cf324e42a3ef6e9f34c0599f |          99358404 | 2022-01-08 10:05:43 | 2022-01-08 13:36:53 |
| 4492fb67f3774c97a359fdbe11aa2bb2 |          99358452 | 2022-01-07 09:43:13 | 2022-01-07 10:36:26 |
| 44838565c02c4daf98cf9b3d333d2d2d |          99358452 | 2022-01-06 12:13:25 | 2022-01-06 13:24:58 |
| 4434a8c0141243adbff26406b5b484d6 |          99358422 | 2022-01-19 10:50:43 | 2022-01-19 11:01:39 |
| 44f57bc8ac4c4fbe9ec4289bb0d6c99e |          99358452 | 2022-01-03 09:52:19 | 2022-01-03 10:37:11 |
| 445f647b1075462f8ef7189ab75ea6cc |          99358450 | 2022-01-04 11:03:58 | 2022-01-04 18:13:29 |
+----------------------------------+---------------------+---------------------+---------------------+
11 rows in set (0.00 sec)

<div class="blockcode"><blockquote>example A : A has 8 records,B loop 24.
explain analyze select h.id as hid
from A h
LEFT JOIN B claim
ON claim.id =
(SELECT c_in.id
FROM B c_in
WHERE c_in.headless_package_id = h.id
ORDER BY c_in.claim_time DESC LIMIT 1);
| EXPLAIN |
| → Nested loop left join (cost=3.85 rows=8) (actual time=0.313…0.800 rows=8 loops=1)
→ Index scan on h using PRIMARY (cost=1.05 rows=8) (actual time=0.053…0.061 rows=8 loops=1)
→ Filter: (claim.id = (select #2)) (cost=0.26 rows=1) (actual time=0.091…0.091 rows=1 loops=8)
→ Single-row index lookup on claim using PRIMARY (id=(select #2)) (cost=0.26 rows=1) (actual time=0.074…0.074 rows=1 loops=8)
→ Select #2 (subquery in condition; dependent)
→ Limit: 1 row(s) (cost=0.48 rows=1) (actual time=0.023…0.023 rows=1 loops=24)
→ Sort: c_in.claim_time DESC, limit input to 1 row(s) per chunk (cost=0.48 rows=1) (actual time=0.023…0.023 rows=1 loops=24)
→ Index lookup on c_in using index01 (headless_package_id=h.id) (actual time=0.015…0.017 rows=1 loops=24)

例子二:当主表只有一条满足时，B循环三次(可能需要构造记录多点)
root@localhost : test 04:37:54> explain analyze SELECT COUNT(1)
-> FROM (select h.id as hid,claim.id,claim.headless_package_id, claim.claim_time
-> from A h
-> LEFT JOIN B claim
->ON claim.id =
->(SELECT c_in.id
->FROM B c_in
-> WHERE c_in.headless_package_id = h.id
->ORDER BY c_in.claim_time DESC LIMIT 1)
-> where h.id in (
-> 99358412
-> ))
-> TOTAL;
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | -> Aggregate: count(1)(actual time=0.547..0.547 rows=1 loops=1)
-> Nested loop left join(cost=39126.25 rows=10) (actual time=0.537..0.540 rows=1 loops=1)
   -> Index lookup on h using <auto_key0> (id=99358412 )(actual time=0.010..0.012 rows=1 loops=1)
         -> Materialize(cost=39125.00..39125.00 rows=10) (actual time=0.252..0.254 rows=1 loops=1)
            -> Limit: 10 row(s)(cost=39124.00 rows=10) (actual time=0.177..0.186 rows=10 loops=1)
               -> Table scan on hn(cost=39124.00 rows=388360) (actual time=0.176..0.182 rows=10 loops=1)
   -> Filter: (claim.id = (select #4))(cost=0.35 rows=1) (actual time=0.281..0.282 rows=1 loops=1)
         -> Single-row index lookup on claim using PRIMARY (id=(select #4))(cost=0.35 rows=1) (actual time=0.258..0.258 rows=1 loops=1)
         -> Select #4 (subquery in condition; dependent)
            -> Limit: 1 row(s)(cost=0.37 rows=1) (actual time=0.077..0.077 rows=1 loops=3)
               -> Sort: c_in.claim_time DESC, limit input to 1 row(s) per chunk(cost=0.37 rows=1) (actual time=0.076..0.076 rows=1 loops=3)
                     -> Index lookup on c_in using idex_headless_id (headless_package_id=h.id)(actual time=0.065..0.067 rows=1 loops=3)

yejr 发表于 2024-5-10 16:47:15

麻烦重新整理下文中的代码段以提高可读性，手动用 "\代码段\" （实际使用时去掉 "\"，我这里演示不这么加会被转义）这种方式包含起来，并且删掉里面的一些非数据库代码，例如 "" 这种

leoxu8703 发表于 2024-5-10 17:01:11

yejr 发表于 2024-5-10 16:47
麻烦重新整理下文中的代码段以提高可读性，手动用 "\代码段\" （实际使用时去掉 "\"，我这 ...

已处理

yejr 发表于 2024-5-11 08:22:37

leoxu8703 发表于 2024-5-10 17:01
已处理

b表的id列是varchar，而a表的id列是int，先把二者统一下再看看呢

请补充提供a\b表用于INSERT的SQL吧，方便复现测例

另外，你用的8.0具体什么版本？

leoxu8703 发表于 2024-5-11 09:16:06

yejr 发表于 2024-5-11 08:22
b表的id列是varchar，而a表的id列是int，先把二者统一下再看看呢

请补充提供a\b表用于INSERT的SQL吧，方 ...

2个表不是通过id关联的。

页: [1]

GreatSQL社区's Archiver

mysql 8.0 left join时主表每次循环相关联子查询，每次都循环3次