How can I ask Postgres to count distinct dates for each date in a time series?

  •  0
  • Fomalhaut  · 7 years ago

    I have a visit table like this:

    time                   | user_id
    --------------------------------
    2018-05-01 00:00:00+02 | 56
    2018-05-01 00:00:00+02 | 64
    2018-05-01 00:00:00+02 | 56
    2018-05-02 00:00:00+02 | 27
    2018-05-02 00:00:00+02 | 64
    ...
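
    To make the queries in this thread reproducible, here is a minimal sketch of the table. The column types are an assumption inferred from the sample rows (the `+02` offset suggests `timestamptz`); the real schema is not shown in the question.

    ```sql
    -- Hypothetical schema: types are assumed from the sample rows, not confirmed.
    create table visit (
        "time"  timestamptz not null,  -- when the visit happened
        user_id integer     not null   -- who visited
    );

    -- The five sample rows shown in the question.
    insert into visit ("time", user_id) values
        ('2018-05-01 00:00:00+02', 56),
        ('2018-05-01 00:00:00+02', 64),
        ('2018-05-01 00:00:00+02', 56),
        ('2018-05-02 00:00:00+02', 27),
        ('2018-05-02 00:00:00+02', 64);
    ```

    If the column really is `timestamptz`, casting it with `::date` uses the session's `TimeZone` setting, so the same row can fall on different dates in different sessions.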
    

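    I want to ask the Postgres database for the number of active users on each date. A user is active if they have visits on at least 10 distinct dates within the last 30 days. For example, for the date 2018-05-22 the query would be: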

    select count(*) from (
        select
            user_id,
            count(distinct time::date) as cnt
        from visit
        where
            time::date > '2018-05-22'::date - interval '30 days'
        group by user_id
        having count(distinct time::date) >= 10
        order by cnt desc
    ) t
    

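    The result is a single number, and it works fine. What should I change in this query to get the quantity for every date in a time series? The desired result looks like this: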

    date       | quantity
    ---------------------
    2018-05-01 | 38
    2018-05-02 | 26
    2018-05-03 | 35
    2018-05-04 | 44
    ...
    
    2 answers  |  as of 7 years ago
        1
  •  1
  •   MatBailie    7 years ago

    First, create a calendar table that contains every date you will ever need.

    Say, from '1900-01-01' to '2099-12-31'?
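
    The answer does not show how to build that table. One possible sketch uses `generate_series()`; the table and column names are the ones used in the queries in this answer, while the primary key is an added assumption to keep lookups by date cheap.

    ```sql
    -- Sketch only: one row per day for the range suggested above.
    create table calendar_table as
    select d::date as calendar_date
    from generate_series('1900-01-01'::timestamp,
                         '2099-12-31'::timestamp,
                         interval '1 day') as g(d);

    -- Assumption: a primary key on the date column (not part of the original answer).
    alter table calendar_table add primary key (calendar_date);
    ```

    Unquoted identifiers are folded to lower case in Postgres, so `CALENDAR_TABLE` in the queries refers to this same table.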

    Then it is basically a JOIN:

    SELECT
        calendar_date, count(*)
    FROM
    (
        SELECT
            CALENDAR_TABLE.calendar_date,
            visit.user_id,
            COUNT(DISTINCT visit.time::date) as cnt
        FROM
            CALENDAR_TABLE
        INNER JOIN
            visit
                ON  visit.time >= CALENDAR_TABLE.calendar_date - interval '29 days'  -- 30 dates, ending on calendar_date
                AND visit.time <  CALENDAR_TABLE.calendar_date + interval '01 days'
        WHERE
            CALENDAR_TABLE.calendar_date BETWEEN '2018-05-01' AND '2018-05-22'
        GROUP BY
            CALENDAR_TABLE.calendar_date,
            visit.user_id
        HAVING
            COUNT(DISTINCT visit.time::date) >= 10
    ) t
    GROUP BY
        calendar_date
    

    Or, de-duplicating the visits per user and date first…

    SELECT
        calendar_date, count(*)
    FROM
    (
        SELECT
            CALENDAR_TABLE.calendar_date,
            visit.user_id,
            COUNT(*) as cnt
        FROM
            CALENDAR_TABLE
        INNER JOIN
        (
            SELECT
               user_id,
               time::date   AS user_date
            FROM
               visit
            GROUP BY
               user_id,
               time::date
        )
            visit
                ON  visit.user_date >= CALENDAR_TABLE.calendar_date - interval '29 days'  -- 30 dates, ending on calendar_date
                AND visit.user_date <  CALENDAR_TABLE.calendar_date + interval '01 days'
        WHERE
            CALENDAR_TABLE.calendar_date BETWEEN '2018-05-01' AND '2018-05-22'
        GROUP BY
            CALENDAR_TABLE.calendar_date,
            visit.user_id
        HAVING
            COUNT(*) >= 10
    ) t
    GROUP BY
        calendar_date
    

    That may reduce the memory overhead, but it may also make the join and the filtering slower…
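
    Which of the two variants wins probably depends on the data and the available indexes, so it may be worth measuring rather than guessing. A minimal sketch, assuming the `visit` table from the question: prefixing a query with `EXPLAIN (ANALYZE, BUFFERS)` makes Postgres execute it and report the actual plan, timings and buffer usage. The de-duplicating subquery is used here only as a short stand-in; the same prefix can be put in front of either full query.

    ```sql
    -- Note: ANALYZE really runs the statement, it does not just estimate it.
    explain (analyze, buffers)
    select
        v.user_id,
        v.time::date as user_date   -- one row per user and calendar date
    from
        visit v
    group by
        v.user_id,
        v.time::date;
    ```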

        2
  •  1
  •   Gordon Linoff    7 years ago

    The simplest way is generate_series():

    -- The outer query can only see the subquery alias vd, not g.
    select vd.dte, count(*)
    from (select g.dte::date as dte, v.user_id, count(distinct v.time::date) as cnt
          from generate_series('2018-05-01'::date, '2018-05-22'::date, interval '1 day') g(dte) left join
               visit v
               on v.time::date <= g.dte and
                  -- the 30-day window must end at each series date, not at a fixed one
                  v.time::date > g.dte - interval '30 days'
          group by g.dte, v.user_id
          having count(distinct v.time::date) >= 10
         ) vd
    group by vd.dte
    order by vd.dte;
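
    One consequence of the HAVING clause is that a date on which no user qualifies produces no row at all, despite the left join. If such dates should appear with a quantity of 0, a possible variation is to compute the active users per date in a lateral subquery and left join it to the series. This is a sketch under the same assumptions as above (table `visit`, columns `time` and `user_id`); the alias `a` and the output column name `quantity` are illustrative.

    ```sql
    select g.dte::date as dte,
           count(a.user_id) as quantity          -- counts only matched users, so empty dates give 0
    from generate_series('2018-05-01'::timestamp,
                         '2018-05-22'::timestamp,
                         interval '1 day') g(dte)
    left join lateral (
        -- users with visits on at least 10 distinct dates in the 30 days ending at g.dte
        select v.user_id
        from visit v
        where v.time::date <= g.dte::date
          and v.time::date >  g.dte::date - 30   -- date minus integer yields a date
        group by v.user_id
        having count(distinct v.time::date) >= 10
    ) a on true
    group by g.dte
    order by g.dte;
    ```

    `LATERAL` requires PostgreSQL 9.3 or later.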
    

    If you have a large amount of data, there may be faster approaches. If that turns out to be a problem, ask another question.