代码之家  ›  专栏  ›  技术社区  ›  VladV

有效地为集合中的每个类别选择顶行

  •  2
  • VladV  · 技术社区  · 15 年前

    我需要从已知的集合中为每个类别选择一个顶行(有点类似于 this question )问题是,如何使这个查询对大量行有效。

    例如,让我们创建一个在多个地方存储温度记录的表。

    CREATE TABLE #t (
        placeId int,
        ts datetime,
        temp int,
        PRIMARY KEY (ts, placeId)
    )
    
    -- insert some sample data
    
    SET NOCOUNT ON
    
    DECLARE @n int, @ts datetime
    SELECT @n = 1000, @ts = '2000-01-01'
    
    WHILE (@n>0) BEGIN
        INSERT INTO #t VALUES (@n % 10, @ts, @n % 37)
        IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts)
        SET @n = @n - 1
    END
    

    现在我需要为每个地方1,2,3获取最新的录音。

    这种方法效率很高,但不能很好地扩展(而且看起来很脏)。

    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 1
        ORDER BY ts DESC
    ) t1
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 2
        ORDER BY ts DESC
    ) t2
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 3
        ORDER BY ts DESC
    ) t3
    

    下面看起来更好,但工作效率要低得多(根据优化器,30%对70%)。

    SELECT placeId, ts, temp FROM (
        SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
        FROM #t
        WHERE placeId IN (1, 2, 3)
    ) t
    WHERE rownum = 1
    

    问题是,在后一个查询执行计划中,对T执行聚集索引扫描,检索、排序、编号并过滤300行,只留下3行。对于前一个查询,取三次一行。

    有没有一种方法可以在没有大量联合的情况下高效地执行查询?

    3 回复  |  直到 15 年前
        1
  •  1
  •   Philip Kelley    15 年前

    我加载了100000行(这还不足以让事情慢下来),尝试了老式的方法:

    select t.*
     from #t t
      inner join (select placeId, max(ts) ts
                   from #t
                   where placeId in (1,2,3)
                   group by placeId) xx
       on xx.placeId = t.placeId
        and xx.ts = t.ts
    

    结果也差不多。

    然后我将索引中各列的顺序颠倒为

    CREATE TABLE #t ( 
        placeId int, 
        ts datetime, 
        temp int, 
        PRIMARY KEY (placeId, ts) 
    ) 
    

    而且,在所有的查询中,得到的页面读取和索引更少 寻求 而不是扫描。

    如果优化是您的目标,并且您可以修改索引,那么我将修改主键,或者添加一个覆盖索引。

        2
  •  2
  •   SQLMenace    15 年前

    不要只看执行计划,还要看 statistics io statistics time

    set statistics io on
    go
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 1
        ORDER BY ts DESC
    ) t1
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 2
        ORDER BY ts DESC
    ) t2
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 3
        ORDER BY ts DESC
    ) t3
    
    SELECT placeId,  temp FROM (
        SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
        FROM #t
        WHERE placeId IN (1, 2, 3)
    ) t
    WHERE rownum = 1
    
    set statistics io off
    go
    

    表“T00000000B99”。扫描计数3,逻辑读取6,物理读取0,预读读取0,LOB逻辑读取0,LOB物理读取0,LOB预读读取0。 表“T00000000B99”。扫描计数1,逻辑读取6,物理读取0,预读读取0,LOB逻辑读取0,LOB物理读取0,LOB预读读取0。

    set statistics time on
    go
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 1
        ORDER BY ts DESC
    ) t1
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 2
        ORDER BY ts DESC
    ) t2
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 3
        ORDER BY ts DESC
    ) t3
    
    SELECT placeId,  temp FROM (
        SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
        FROM #t
        WHERE placeId IN (1, 2, 3)
    ) t
    WHERE rownum = 1
    
    set statistics time on
    go
    

    对我来说,这两种方法没有真正的区别,加载更多的数据并再次比较

    另外,当您向两个查询添加order by时,它将下降到40%,而60%。

    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 1
        ORDER BY ts DESC
    ) t1
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 2
        ORDER BY ts DESC
    ) t2
    UNION ALL
    SELECT * FROM (
        SELECT TOP 1 placeId, temp
        FROM #t 
        WHERE placeId = 3
        ORDER BY ts DESC
    ) t3
    ORDER BY placeId
    
    SELECT placeId,  temp FROM (
        SELECT placeId,  temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
        FROM #t
        WHERE placeId IN (1, 2, 3)
    ) t
    WHERE rownum = 1
    ORDER BY placeId
    
        3
  •  0
  •   VladV    15 年前

    只是为了记录,另一个使用交叉应用的选项。
    在我的配置中,它的性能比前面提到的所有配置都要好。

    SELECT *
    FROM (VALUES (1),(2),(3)) t (placeId)
    CROSS APPLY (
        SELECT TOP 1 ts, temp
        FROM #t 
        WHERE placeId = t.placeId
        ORDER BY ts DESC
    ) tt
    

    我猜,值可以被转换到临时表或表变量,而不会有太大的差异。