
LEFT JOINs are what I want, but they are very slow?

join optimization sql

jon skulski · Tech Community · 16 years ago

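Overview:

I have three tables (subscribers, bios, and shirtsizes), and I need to find the subscribers who have no bio or no shirtsize.

The tables are laid out as follows:

subscribers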

| season_id |  user_id |

bio

| bio_id | user_id |

shirtsizes

| bio_id | shirtsize |

I need to find all users who have no bio or no shirtsize for any given season (if there is no bio, then there is no shirtsize, because of the relation).

I originally wrote a query like:

SELECT *
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id 
WHERE s.season_id = 185181 AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

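But it now takes 10 seconds to complete.

I would like to know how to restructure the query (or possibly the problem) so that it performs reasonably.

Here is the MySQL EXPLAIN (ogu = subscribers, b = bio, tn = shirtsize):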

| id | select_type | table | type  | possible_keys | key     | key_len | ref         | rows   | Extra       |   
+----+-------------+-------+-------+---------------+---------+---------+-------------+--------+-------------+    
|  1 | SIMPLE      | ogu   | ref   | PRIMARY       | PRIMARY | 4       | const       |    133 | Using where |
|  1 | SIMPLE      | b     | index | NULL          | PRIMARY | 8       | NULL        | 187644 | Using index |
|  1 | SIMPLE      | tn    | ref   | nid           | nid     | 4       | waka2.b.nid |      1 | Using where |

The above is heavily sanitized; here is the real info:

mysql> DESCRIBE subscribers
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| subscribers  | int(11) | NO   | PRI |         |       | 
| uid       | int(11) | NO   | PRI |         |       | 


mysql> DESCRIBE bio;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| bio_id   | int(10) unsigned | NO   | PRI | 0       |       | 
| uid   | int(10) unsigned | NO   | PRI | 0       |       | 


mysql> DESCRIBE shirtsize;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| bio_id   | int(10) unsigned | NO   | PRI | 0       |       | 
| shirtsize   | int(10) unsigned | NO   | PRI | 0       |       |

The real query looks like this:

SELECT ogu.nid, ogu.is_active, ogu.uid, b.nid AS bio_node, tn.nid AS size
                  FROM og_uid ogu
                  LEFT JOIN bio b ON b.uid = ogu.uid
                  LEFT JOIN term_node tn ON tn.nid = b.nid
                  WHERE ogu.nid = 185033 AND ogu.is_admin = 0
                  AND (b.nid IS NULL OR tn.tid IS NULL)

nid is either the season or the bio (nodes have a type); term_node is the shirt size.

9 replies | latest 16 years ago

Tor Haugen 16 years ago

The query should be fine. I would run it through a query analyzer and refine the indexes on the tables.
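One way to act on this advice is to compare the query plan before and after adding an index. The sketch below is only an illustration: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (the output format differs), the tables follow the sanitized layout from the question, and the index name idx_bio_user is a hypothetical choice.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers (season_id INTEGER, user_id INTEGER,
                              PRIMARY KEY (season_id, user_id));
    CREATE TABLE bio (bio_id INTEGER, user_id INTEGER,
                      PRIMARY KEY (bio_id, user_id));
    CREATE TABLE shirtsizes (bio_id INTEGER, shirtsize INTEGER,
                             PRIMARY KEY (bio_id, shirtsize));
""")

QUERY = """
    SELECT s.user_id
      FROM subscribers s
      LEFT JOIN bio b ON b.user_id = s.user_id
      LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id
     WHERE s.season_id = 185181
       AND (b.bio_id IS NULL OR sh.shirtsize IS NULL)
"""

def plan(connection):
    """Return the human-readable detail column of each plan step."""
    cursor = connection.execute("EXPLAIN QUERY PLAN " + QUERY)
    return [row[-1] for row in cursor]

plan_before = plan(conn)

# Hypothetical single-column index on the join column of bio.
conn.execute("CREATE INDEX idx_bio_user ON bio (user_id)")
plan_after = plan(conn)

for step in plan_before + ["--- after adding idx_bio_user ---"] + plan_after:
    print(step)
```

If the new index is picked up, its name appears in the plan for table b; on MySQL the equivalent check is whether the key column of EXPLAIN stops showing a full index scan for b.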

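Brian 16 years ago

Joins are one of the most expensive operations you can perform in an SQL query. Although the database should be able to optimize your query automatically to some extent, you could try restructuring it. First, instead of SELECT *, I would specify which columns you need from which relations. That will speed things up quite a bit.

If you only need the user ID, for example:

SELECT s.user_id
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id 
WHERE s.season_id = 185181 AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

This will allow the SQL database to restructure the query more efficiently on its own.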

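tvanfosson 16 years ago

Obviously I haven't tested this, but it seems that what you want is to select any subscriber for whom there is no matching bio, or for whom the join between bios and shirtsizes fails. I would consider using NOT EXISTS for this. You will probably want indexes on bio.user_id and shirtsizes.bio_id.

select *
from subscribers s
where s.season_id = 185181
      and not exists (select *
                      from bio join shirtsizes on bio.bio_id = shirtsizes.bio_id
                      where bio.user_id = s.user_id)

Edit:

Based on your update, you may want to create separate keys on each column instead of, or in addition to, the compound primary keys. The joins may not be able to take full advantage of the compound primary indexes, and an index on the join column by itself may speed things up.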
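A minimal sketch of the NOT EXISTS approach, assuming the sanitized table layout from the question and run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The sample rows and the index names idx_bio_user and idx_shirtsizes_bio are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers (season_id INTEGER, user_id INTEGER,
                              PRIMARY KEY (season_id, user_id));
    CREATE TABLE bio (bio_id INTEGER, user_id INTEGER,
                      PRIMARY KEY (bio_id, user_id));
    CREATE TABLE shirtsizes (bio_id INTEGER, shirtsize INTEGER,
                             PRIMARY KEY (bio_id, shirtsize));

    -- Single-column indexes on the join columns (hypothetical names).
    CREATE INDEX idx_bio_user ON bio (user_id);
    CREATE INDEX idx_shirtsizes_bio ON shirtsizes (bio_id);

    -- Made-up sample data: user 1 has a bio and a shirt size,
    -- user 2 has a bio only, user 3 has neither, user 4 is in another season.
    INSERT INTO subscribers VALUES (185181, 1), (185181, 2), (185181, 3), (999, 4);
    INSERT INTO bio VALUES (10, 1), (20, 2);
    INSERT INTO shirtsizes VALUES (10, 3);
""")

rows = conn.execute("""
    SELECT s.user_id
      FROM subscribers s
     WHERE s.season_id = ?
       AND NOT EXISTS (SELECT 1
                         FROM bio b
                         JOIN shirtsizes sh ON sh.bio_id = b.bio_id
                        WHERE b.user_id = s.user_id)
     ORDER BY s.user_id
""", (185181,)).fetchall()

# Subscribers of the season with no bio, or with a bio but no shirt size.
missing = [user_id for (user_id,) in rows]
print(missing)
```

The subquery only has to find one matching bio/shirtsize pair per subscriber, so the engine is free to stop at the first hit instead of materializing every joined row.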

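John Saunders 16 years ago

Is bio_id the primary key of bios? Is it really possible for there to be a bios row with b.user_id = subscribers.user_id but with b.bio_id NULL?

Are there shirtsize rows with shirtsize.bio_id NULL? Do those rows ever have shirtsize.size not NULL?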

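Jonathan Leffler 16 years ago

Would it be quicker to take the difference between the list of subscribers for the season in question and the list of subscribers for that season who have both a bio and a shirt size?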

SELECT *
   FROM Subscribers
   WHERE season_id = 185181
     AND user_id NOT IN
         (SELECT DISTINCT s.user_id
             FROM subscribers s
             JOIN bios b ON s.user_id = b.user_id
             JOIN shirtsizes z ON b.bio_id = z.bio_id
             WHERE s.season_id = 185181
         )

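This avoids outer joins, which are not as fast as inner joins, and so may be quicker. On the other hand, it may be building two large lists with very few differences between them. It is not clear whether the DISTINCT in the subquery would improve or hurt performance. It implies a sort operation (expensive), but it paves the way for a merge join if the MySQL optimizer supports such things.

There may be other notations available too, such as MINUS or DIFFERENCE.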
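For illustration, the same set difference can be written with EXCEPT, which is how SQLite spells the operator; whether the asker's MySQL version offers an equivalent is not established in this thread. The sketch below runs on an in-memory SQLite database via Python's sqlite3 module, uses the question's table name bio, and relies on made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers (season_id INTEGER, user_id INTEGER);
    CREATE TABLE bio (bio_id INTEGER, user_id INTEGER);
    CREATE TABLE shirtsizes (bio_id INTEGER, shirtsize INTEGER);

    -- Made-up sample data: user 1 is complete, user 2 lacks a shirt size,
    -- user 3 lacks a bio, user 4 belongs to a different season.
    INSERT INTO subscribers VALUES (185181, 1), (185181, 2), (185181, 3), (999, 4);
    INSERT INTO bio VALUES (10, 1), (20, 2);
    INSERT INTO shirtsizes VALUES (10, 3);
""")

rows = conn.execute("""
    SELECT user_id
      FROM subscribers
     WHERE season_id = :season
    EXCEPT
    SELECT s.user_id
      FROM subscribers s
      JOIN bio b ON s.user_id = b.user_id
      JOIN shirtsizes z ON b.bio_id = z.bio_id
     WHERE s.season_id = :season
""", {"season": 185181}).fetchall()

# EXCEPT returns distinct rows; sort in Python for a stable order.
incomplete = sorted(user_id for (user_id,) in rows)
print(incomplete)
```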

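SeanJA 16 years ago

If you specify exactly what you are looking for rather than using SELECT *, it might speed things up a bit. Also, OR is not the fastest thing to query on; if you can rewrite the query without the OR, it will be faster.

Also... you could try unions instead of left joins?

SELECT s.user_id
   FROM subscribers s 
   LEFT JOIN bio b ON b.user_id = s.user_id 
   LEFT JOIN shirtsizes sh ON sh.bio_id = b.bio_id 
WHERE s.season_id = 185181 AND (b.bio_id IS NULL OR sh.shirtsize IS NULL);

would be something like: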

(SELECT s.user_id FROM subscribers s WHERE s.season_id = 185181)
UNION
(SELECT b.user_id, b.bio_id FROM bio b WHERE bio.bio_id IS NULL)
UNION
(SELECT shirtsizes.bio_id FROM shirtsizes WHERE shirtsizes.size is NULL)

(Honestly, I don't think this is right... but I never use ~~joins or~~ join syntax or unions...)

I would do it like this:

SELECT *
FROM subscribers s, bio b, shirtsizes sh
WHERE s.season_id = 185181
AND sh.bio_id = b.bio_id 
AND b.user_id = s.user_id 
AND (b.bio_id IS NULL 
     OR 
     sh.shirtsize IS NULL);

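Quassnoi 16 years ago

As written now, your query evaluates all the bio rows and term_node rows, if they exist, and then filters them out.

But all you want is to find the og_uid rows that have no term_node (having no bio also implies having no term_node).

So you want to stop evaluating bio and term_node rows as soon as you find the first existing term_node: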

SELECT  *
FROM    (
        SELECT  ogu.nid, ogu.is_active, ogu.uid,
                (
                SELECT  1
                FROM    bio b, term_node tn
                WHERE   b.uid = ogu.uid
                        AND tn.nid = b.nid
                LIMIT   1
                ) AS ex
        FROM    og_uid ogu
        WHERE   ogu.nid = 185033
                AND ogu.is_admin = 0
        ) ogu1
WHERE   ex IS NULL

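This evaluates at most one bio and at most one term_node for each og_uid, instead of evaluating the thousands that exist and filtering them out.

It should work much faster.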
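As a sanity check, the sketch below runs both the original LEFT JOIN formulation and the LIMIT 1 subquery on a tiny made-up data set and compares the user IDs they return. It uses an in-memory SQLite database through Python's sqlite3 module instead of MySQL, with simplified table definitions, so it says nothing about timing. The two forms agree here because each user has at most one bio; with several bios per user the LEFT JOIN can also return users who have one bio with a size and another without.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE og_uid (nid INTEGER, uid INTEGER,
                         is_active INTEGER, is_admin INTEGER);
    CREATE TABLE bio (nid INTEGER, uid INTEGER);
    CREATE TABLE term_node (nid INTEGER, tid INTEGER);

    -- Made-up rows: uid 1 is complete, uid 2 has a bio but no term_node,
    -- uid 3 has no bio, uid 4 is an admin, uid 5 is in another group.
    INSERT INTO og_uid VALUES (185033, 1, 1, 0), (185033, 2, 1, 0),
                              (185033, 3, 1, 0), (185033, 4, 1, 1),
                              (7, 5, 1, 0);
    INSERT INTO bio VALUES (10, 1), (20, 2), (40, 4);
    INSERT INTO term_node VALUES (10, 3);
""")

LEFT_JOIN_SQL = """
    SELECT DISTINCT ogu.uid
      FROM og_uid ogu
      LEFT JOIN bio b ON b.uid = ogu.uid
      LEFT JOIN term_node tn ON tn.nid = b.nid
     WHERE ogu.nid = 185033 AND ogu.is_admin = 0
       AND (b.nid IS NULL OR tn.tid IS NULL)
"""

SUBQUERY_SQL = """
    SELECT uid
      FROM (SELECT ogu.uid,
                   (SELECT 1
                      FROM bio b, term_node tn
                     WHERE b.uid = ogu.uid AND tn.nid = b.nid
                     LIMIT 1) AS ex
              FROM og_uid ogu
             WHERE ogu.nid = 185033 AND ogu.is_admin = 0) ogu1
     WHERE ex IS NULL
"""

left_join_uids = sorted(uid for (uid,) in conn.execute(LEFT_JOIN_SQL))
subquery_uids = sorted(uid for (uid,) in conn.execute(SUBQUERY_SQL))
print(left_join_uids, subquery_uids)
```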

Hafthor 16 years ago

select * from subscribers where user_id not in (
  select user_id from bio where bio_id not in (
    select bio_id from shirt_sizes
  )
) and season_id=185181

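SFA 16 years ago

I assume your "big table" is subscribers, and that season_id is probably neither selective nor indexed (and indexing it would be rather pointless if it is not selective), which means you have to fully scan subscribers anyway. Separately, I would join the other two tables with an inner join. Note that, for your query, having no bio_id in shirtsizes is exactly the same as having no bio. First bit: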

select uid
from bio
     inner join shirtsizes
             on shirtsizes.bio_id = bio.bio_id

At this point, you want to check that shirtsizes has an index on bio_id. Now you can left outer join this query to subscribers:

select *
from subscribers s
     left outer join (select uid
                      from bio
                      inner join shirtsizes
                              on shirtsizes.bio_id = bio.bio_id) x
                  on x.uid = s.uid
where s.season_id = 185181
  and x.uid is null

If bio and shirtsizes are not huge, this may run reasonably fast.
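A small runnable sketch of this derived-table anti-join, under a few assumptions: it uses an in-memory SQLite database via Python's sqlite3 module rather than MySQL, the column is named user_id as in the sanitized layout (the answer above writes uid), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscribers (season_id INTEGER, user_id INTEGER);
    CREATE TABLE bio (bio_id INTEGER, user_id INTEGER);
    CREATE TABLE shirtsizes (bio_id INTEGER, shirtsize INTEGER);

    -- Made-up sample data: user 1 is complete, user 2 lacks a shirt size,
    -- user 3 lacks a bio, user 4 belongs to a different season.
    INSERT INTO subscribers VALUES (185181, 1), (185181, 2), (185181, 3), (999, 4);
    INSERT INTO bio VALUES (10, 1), (20, 2);
    INSERT INTO shirtsizes VALUES (10, 3);
""")

rows = conn.execute("""
    SELECT s.user_id
      FROM subscribers s
      LEFT OUTER JOIN (SELECT b.user_id
                         FROM bio b
                        INNER JOIN shirtsizes sh ON sh.bio_id = b.bio_id) x
                   ON x.user_id = s.user_id
     WHERE s.season_id = 185181
       AND x.user_id IS NULL      -- keep only subscribers with no match
     ORDER BY s.user_id
""").fetchall()

without_size = [user_id for (user_id,) in rows]
print(without_size)
```

The inner join collapses "no bio" and "bio without a shirt size" into the same case, so the outer query needs only a single IS NULL test instead of an OR.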