代码之家  ›  专栏  ›  技术社区  ›  Kalane

数据集增加时SQL查询缓慢

  •  0
  • Kalane  · 技术社区  · 2 年前

    我正在数据库中进行SQL查询,以获取所需玩家玩得最多的所有玩家( match_players.team )。

    这很好,但当玩家有5000多场比赛时(这就是我发现问题的地方),查询可能需要10到20秒。我想知道是否有可能对其进行优化。

    我使用的是PostgreSQL(12.16)。

    这是我的表,我已经删除了该查询中所有未使用的字段。

    CREATE TABLE public.match_players (
        id integer NOT NULL,
        match_id character varying(15) NOT NULL,
        summoner_puuid character varying(78) NOT NULL,
        summoner_name character varying(30) NOT NULL,
        win smallint NOT NULL,
        loss smallint NOT NULL,
        remake smallint NOT NULL,
        team smallint NOT NULL
    );
    
    CREATE SEQUENCE public.match_players_id_seq
        AS integer
        START WITH 1
        INCREMENT BY 1
        NO MINVALUE
        NO MAXVALUE
        CACHE 1;
    
    ALTER SEQUENCE public.match_players_id_seq OWNED BY public.match_players.id;
    
    CREATE TABLE public.matches (
        id character varying(15) NOT NULL,
        gamemode smallint NOT NULL
    );
    
    ALTER TABLE ONLY public.match_players ALTER COLUMN id SET DEFAULT nextval('public.match_players_id_seq'::regclass);
    
    -- PK
    ALTER TABLE ONLY public.match_players
        ADD CONSTRAINT match_players_pkey PRIMARY KEY (id);
    
    ALTER TABLE ONLY public.matches
        ADD CONSTRAINT matches_pkey PRIMARY KEY (id);
    
    -- INDEXES
    CREATE INDEX match_players_summoner_puuid_index ON public.match_players USING btree (summoner_puuid);
    CREATE INDEX match_players_team_index ON public.match_players USING btree (team);
    CREATE INDEX matches_gamemode_index ON public.matches USING btree (gamemode);
    

    以下是查询:

    SELECT
        (array_agg(mates.summoner_name ORDER BY mates.match_id DESC)) [1] AS name,
        COUNT(match_players.id) AS count,
        SUM(match_players.win) AS wins,
        SUM(match_players.loss) AS losses
    FROM
        match_players
        INNER JOIN matches ON matches.id = match_players.match_id
        INNER JOIN match_players AS mates ON match_players.match_id = mates.match_id
            AND match_players.team = mates.team
    WHERE
        match_players.summoner_puuid = 'wBdgONoUnYZCPbh8jp9d47XL6gv9Fj4W8KugWh_Nr9FLESnT9zhGix2hliOPOxNmuhoI1v3gYk4BXA'
        AND match_players.remake = 0
        AND matches.gamemode NOT IN(800, 810, 820, 830, 840, 850, 2000, 2010, 2020)
    GROUP BY
        mates.summoner_puuid
    ORDER BY
        count DESC,
        wins DESC
    LIMIT 15
    

    这是 EXPLAIN ANALYZE 查询的:

    QUERY PLAN
    Limit  (cost=726349.74..726349.78 rows=15 width=134) (actual time=17124.387..17124.689 rows=15 loops=1)
      ->  Sort  (cost=726349.74..726451.94 rows=40880 width=134) (actual time=16659.572..16659.866 rows=15 loops=1)
            Sort Key: (count(match_players.id)) DESC, (sum(match_players.win)) DESC
            Sort Method: top-N heapsort  Memory: 29kB
            ->  GroupAggregate  (cost=724222.58..725346.78 rows=40880 width=134) (actual time=16567.953..16651.876 rows=18835 loops=1)
                  Group Key: mates.summoner_puuid
                  ->  Sort  (cost=724222.58..724324.78 rows=40880 width=111) (actual time=16567.899..16582.073 rows=28215 loops=1)
                        Sort Key: mates.summoner_puuid
                        Sort Method: external merge  Disk: 3448kB
                        ->  Gather  (cost=53749.23..718714.85 rows=40880 width=111) (actual time=1267.589..16496.245 rows=28215 loops=1)
                              Workers Planned: 2
                              Workers Launched: 2
                              ->  Parallel Hash Join  (cost=52749.23..713626.85 rows=17033 width=111) (actual time=1142.845..16434.330 rows=9405 loops=3)
                                    Hash Cond: (((mates.match_id)::text = (matches.id)::text) AND (mates.team = match_players.team))
                                    ->  Parallel Seq Scan on match_players mates  (cost=0.00..622960.94 rows=5046094 width=105) (actual time=0.397..13551.825 rows=4037261 loops=3)
                                    ->  Parallel Hash  (cost=52698.13..52698.13 rows=3407 width=40) (actual time=978.287..978.291 rows=1881 loops=3)
                                          Buckets: 8192  Batches: 1  Memory Usage: 576kB
                                          ->  Nested Loop  (cost=638.66..52698.13 rows=3407 width=40) (actual time=634.953..686.920 rows=1881 loops=3)
                                                ->  Parallel Bitmap Heap Scan on match_players  (cost=638.23..31290.55 rows=3488 width=25) (actual time=634.871..646.260 rows=1883 loops=3)
                                                      Recheck Cond: ((summoner_puuid)::text = 'wBdgONoUnYZCPbh8jp9d47XL6gv9Fj4W8KugWh_Nr9FLESnT9zhGix2hliOPOxNmuhoI1v3gYk4BXA'::text)
                                                      Filter: (remake = 0)
                                                      Rows Removed by Filter: 27
                                                      Heap Blocks: exact=3367
                                                      ->  Bitmap Index Scan on match_players_summoner_puuid_index  (cost=0.00..636.14 rows=8477 width=0) (actual time=0.699..0.700 rows=5730 loops=1)
                                                            Index Cond: ((summoner_puuid)::text = 'wBdgONoUnYZCPbh8jp9d47XL6gv9Fj4W8KugWh_Nr9FLESnT9zhGix2hliOPOxNmuhoI1v3gYk4BXA'::text)
                                                ->  Index Scan using matches_pkey on matches  (cost=0.43..6.14 rows=1 width=15) (actual time=0.019..0.019 rows=1 loops=5649)
                                                      Index Cond: ((id)::text = (match_players.match_id)::text)
                                                      Filter: (gamemode <> ALL ('{800,810,820,830,840,850,2000,2010,2020}'::integer[]))
                                                      Rows Removed by Filter: 0
    Planning Time: 0.903 ms
    JIT:
      Functions: 78
      Options: Inlining true, Optimization true, Expressions true, Deforming true
      Timing: Generation 17.883 ms, Inlining 385.880 ms, Optimization 1308.476 ms, Emission 672.236 ms, Total 2384.476 ms
    Execution Time: 17129.121 ms
    

    解释的图形版本: https://explain.dalibo.com/plan/27b41c259g9d3483

    互动示例: https://dbfiddle.uk/TcErJ6WL

    非常感谢,如果你有任何想法,我真的不知道该去哪里看了。

    EDIT-感谢@LSerni和@MatBailie评论的解决方案

    我添加了三个组合索引:

    CREATE INDEX match_players_combined_index1 ON match_players (summoner_puuid, remake, match_id);
    CREATE INDEX match_players_combined_index2 ON match_players (match_id, team, summoner_puuid, summoner_name);
    CREATE INDEX matches_combined_index1 ON matches (id, gamemode);
    

    我把 NOT IN 以及 array_agg 从查询中:

    SELECT
        (
        SELECT
          summoner_name
        FROM
          match_players
        WHERE
          summoner_puuid = mates.summoner_puuid
        ORDER BY
          match_id DESC
        LIMIT
          1
        ) AS name,
        COUNT(match_players.id) AS count,
        SUM(match_players.win) AS wins,
        SUM(match_players.loss) AS losses
    FROM
        match_players
        INNER JOIN matches ON matches.id = match_players.match_id
        INNER JOIN match_players AS mates ON match_players.match_id = mates.match_id
            AND match_players.team = mates.team
    WHERE
        match_players.summoner_puuid = 'wBdgONoUnYZCPbh8jp9d47XL6gv9Fj4W8KugWh_Nr9FLESnT9zhGix2hliOPOxNmuhoI1v3gYk4BXA'
        AND match_players.remake = 0
        AND (
        (matches.gamemode < 800 OR matches.gamemode > 899)
        AND
        (matches.gamemode < 2000 OR matches.gamemode > 2999)
    )
    GROUP BY
        mates.summoner_puuid
    ORDER BY
        count DESC,
        wins DESC
    LIMIT 15
    

    现在查询需要300-700ms,这真的更好!

    1 回复  |  直到 2 年前
        1
  •  1
  •   LSerni    2 年前

    让我们检查一下这个查询:

    SELECT
        (array_agg(mates.summoner_name ORDER BY mates.match_id DESC)) [1] AS name,
        COUNT(match_players.id) AS count,
        SUM(match_players.win) AS wins,
        SUM(match_players.loss) AS losses
    FROM
        match_players
        INNER JOIN matches ON matches.id = match_players.match_id
        INNER JOIN match_players AS mates ON match_players.match_id = mates.match_id
            AND match_players.team = mates.team
    WHERE
        match_players.summoner_puuid = 'wBdgONoUnYZCPbh8jp9d47XL6gv9Fj4W8KugWh_Nr9FLESnT9zhGix2hliOPOxNmuhoI1v3gYk4BXA'
        AND match_players.remake = 0
        AND matches.gamemode NOT IN(800, 810, 820, 830, 840, 850, 2000, 2010, 2020)
    GROUP BY
        mates.summoner_puuid
    ORDER BY
        count DESC,
        wins DESC
    LIMIT 15
    

    因此,我们想要一个针对固定WHERE条件的索引,并且可能首先使用一个简单的条件。所以我们的索引 match_players 应该从 列:重制版和caller_puuid 按这个顺序

    现在,我们需要match_players提供什么?JOIN使用match_id。我们应该把它放在指数中,放在第三位。出于同样的原因,我们需要“团队”。如果积分榜很大,我们可能会有所收获,也会有输赢,但可能不会。

    有了这四个索引列,RDBMS可以运行WHERE并获得match_, 从不碰桌子 ,然后可以继续进行其他联接。

    为此,我们需要一个使用id的比赛索引,该id也提供游戏模式(不过游戏模式的条件很糟糕。NOT IN是一场性能噩梦。如果你能把它重写为BETWEEN或类似的…)。

    我们还需要mates表上match_id的索引,这个索引还应该包含mates.team 可以 按住caller_name,但再说一遍,这可能不值得。你可以尝试一下。

    有了这种索引(或其改进),RDBMS应该能够快速有效地“集中”所需的数据。