代码之家  ›  专栏  ›  技术社区  ›  Andrew Cassidy

多个分析函数在大查询中消耗所有内存

  •  1
  • Andrew Cassidy  · 技术社区  · 6 年前

    我以为我在大查询中避免了多个左连接和分组,并使用了以下功能:

     WITH added_a_boolean_column AS (
          SELECT
            *,
            NOT (DATE(CodedDate) >= "2018-04-01"
              AND DATE(CodedDate) < "2018-04-14") AS train
          FROM
          `XXXXX` )
    
    
    SELECT 
       countif(train) OVER (PARTITION BY a) as a_counts,
       countif(train) OVER (PARTITION BY b) as b_counts,
       countif(train) OVER (PARTITION BY c) as c_counts,
       countif(train) OVER (PARTITION BY d) as d_counts,
       countif(train) OVER (PARTITION BY e) as e_counts,
       countif(train) OVER (PARTITION BY f) as f_counts,
       countif(train) OVER (PARTITION BY g) as g_counts,
       countif(train) OVER (PARTITION BY h) as h_counts,
       countif(train) OVER (PARTITION BY i) as i_counts
     FROM added_a_boolean_column
    

    然而这会导致以下错误:

    Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 152% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 99% other/unattributed: 1% .
    

    到底发生了什么?是:

     WITH added_a_boolean_column AS (
          SELECT
            *,
            NOT (DATE(CodedDate) >= "2018-04-01"
              AND DATE(CodedDate) < "2018-04-14") AS train
          FROM
          `XXXXX` ),
    
        a_count as (
            SELECT count(*) as a_counts, a FROM added_a_boolean_column WHERE train GROUP BY a),
        b_count as (.....
        ....
        ....
    
        i_count as (..
    
        SELECT * FROM added_a_boolean_column LEFT JOIN a_count.....
    

    更好的选择?

    0 回复  |  直到 6 年前