代码之家  ›  专栏  ›  技术社区  ›  Rob

BigQuery中日期序列的最小化表示

  •  0
  • Rob  · 技术社区  · 7 年前

    start_date|end_date
    1/1/2018|1/5/2018
    1/4/2018|1/10/2018
    1/9/2018|1/22/2018
    2/1/2018|2/1/2018
    1/31/2018|2/5/2018
    

    我想得到这些行所覆盖的所有日期范围。所以我想要一些回报,比如:

    1/1/2018|1/22/2018
    1/31/2018|2/5/2018
    

    BigQuery中是否有一个函数可以处理这个问题?

    1 回复  |  直到 7 年前
        1
  •  1
  •   Mikhail Berlyant    7 年前

    没有这样的函数-但您可以尝试以下操作(BigQuery标准SQL)

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT '1/1/2018' start_date, '1/5/2018' end_date UNION ALL
      SELECT '1/4/2018', '1/10/2018' UNION ALL
      SELECT '1/9/2018', '1/22/2018' UNION ALL
      SELECT '2/1/2018', '2/1/2018' UNION ALL
      SELECT '1/31/2018', '2/5/2018' 
    ), parsed_as_dates AS (
      SELECT PARSE_DATE('%m/%d/%Y', start_date) start_date, PARSE_DATE('%m/%d/%Y', end_date) end_date
      FROM `project.dataset.table`
    ), days AS (
      SELECT day FROM 
      (SELECT MIN(start_date) min_date, MAX(end_date) max_date FROM parsed_as_dates), 
      UNNEST(GENERATE_DATE_ARRAY(min_date, max_date)) day
    ), temp AS (
      SELECT day, SIGN(COUNTIF(day BETWEEN start_date AND end_date)) flag
      FROM days CROSS JOIN parsed_as_dates GROUP BY day
    )
    SELECT MIN(day) start_date, MAX(day) end_date
    FROM (
      SELECT day, flag, SUM(start) OVER(ORDER BY day) grp
      FROM (
        SELECT day, flag, ABS(flag - IFNULL(LAG(flag) OVER(ORDER BY day), 0)) start
        FROM temp
      )
    )
    WHERE flag = 1
    GROUP BY grp
    -- ORDER BY start_date
    

    Row start_date  end_date     
    1   2018-01-01  2018-01-22   
    2   2018-01-31  2018-02-05    
    

    只是“快速”的想法-你可能想重构它一点-因为它看起来有点设计过度:o)但至少它的工作

    推荐文章