代码之家  ›  专栏  ›  技术社区  ›  Caleb

MySQL多次加入同一个表使计数变得奇怪

  •  1
  • Caleb  · 技术社区  · 1 年前

    编辑以澄清和简化

    我有以下疑问

    select 
      Signups.user_id,
      Count(timeoutTable.user_id) as timeoutCount,
      Count(confirmedTable.user_id) as confirmedCount
    from Signups
    left join (
      select * from Confirmations where action = 'timeout'
    ) as timeoutTable on signups.user_id = timeoutTable.user_id
    left join (
      select * from Confirmations where action = 'confirmed'
    ) as confirmedTable on signups.user_id = confirmedTable.user_id
    group by 
      Signups.user_id,
      timeoutTable.user_id,
      confirmedTable.user_id
    

    使用以下输入运行时

    Signups Table
    
    | user_id | time_stamp          |
    | ------- | ------------------- |
    | 15      | 2020-07-31 18:26:35 |
    | 16      | 2021-05-20 01:38:09 |
    | 7       | 2020-08-02 08:45:14 |
    | 10      | 2020-06-24 17:13:14 |
    | 5       | 2020-06-27 17:59:29 |
    | 9       | 2021-11-08 03:05:14 |
    | 8       | 2021-12-13 03:38:58 |
    | 12      | 2020-09-16 11:17:39 |
    
    
    Confirmations Table
    
    | user_id | time_stamp          | action    |
    | ------- | ------------------- | --------- |
    | 7       | 2020-03-31 13:11:43 | timeout   |
    | 7       | 2021-03-25 07:40:25 | timeout   |
    | 8       | 2020-07-27 19:43:25 | confirmed |
    | 8       | 2021-03-07 19:48:06 | timeout   |
    | 7       | 2020-01-24 15:43:47 | confirmed |
    

    它输出:

    | user_id | timeoutCount | confirmedCount |
    | ------- | ------------ | -------------- |
    | 15      | 0            | 0              |
    | 16      | 0            | 0              |
    | 7       | 2            | 2              |
    | 10      | 0            | 0              |
    | 5       | 0            | 0              |
    | 9       | 0            | 0              |
    | 8       | 1            | 1              |
    | 12      | 0            | 0              |
    

    我试图使user_id 7的已确认值为1,超时值为2,但出于某种原因,它将两个值都设置为2。如有任何帮助,我们将不胜感激。

    2 回复  |  直到 1 年前
        1
  •  2
  •   Thorsten Kettner    1 年前

    您的问题是,您将所有超时都加入到所有确认中。因此,对于具有2个超时和3个确认的注册,您将获得2 x 3=6行,然后进行聚合。

    聚合多个表时,请在加入前进行聚合:

    select 
      user_id, 
      round(case when total_messages = 0 then 0 else total_confirmed / total_messages end, 2) as confirmation_rate
    from
    (
      select 
        s.user_id, 
        t.total_timeouts,
        c.total_confirmed,
        coalesce(t.timeoutCount,0) + coalesce(c.total_confirmed, 0) as total_messages
      from signups s
      left join 
      (
        select user_id, count(*) as total_timeouts
        from confirmations
        where action = 'timeout'
        group by user_id
      ) as t on t.user_id = s.user_id
      left join 
      (
        select user_id, count(*) as total_confirmed
        from confirmations
        where action = 'confirmed'
        group by user_id
      ) as c on c.user_id = s.user_id
    )
    order by user_id;
    

    另一种选择是这里的条件聚合:

    select 
      user_id, 
      round(case when total_messages = 0 then 0 else total_confirmed / total_messages end, 2) as confirmation_rate
    from
    (
      select 
        s.user_id, 
        sum(c.action = 'timeout') as total_timeouts,
        sum(c.action = 'confirmed') as total_confirmed,
        sum(c.action = 'timeout') + sum(c.action = 'confirmed') as total_messages
      from signups s
      left join confirmations c on c.user_id = s.user_id
      group by s.user_id
    )
    order by user_id;
    
        2
  •  0
  •   Schwern    1 年前

    要查看发生了什么,请查看不带分组依据的子查询。

    select 
      Signups.user_id,
      timeoutTable.*,
      confirmedTable.*
    from Signups
    left join (
      select * from Confirmations where action = 'timeout'
    ) as timeoutTable on Signups.user_id = timeoutTable.user_id
    left join (
      select * from Confirmations where action = 'confirmed'
    ) as confirmedTable on Signups.user_id = confirmedTable.user_id
    
                timeoutTable                                    confirmedTable
    user_id     user_id     time_stamp              action      user_id     time_stamp              action
    15          null        null                    null        null        null                    null
    16          null        null                    null        null        null                    null
    7           7           2021-03-25 07:40:25     timeout     7           2020-01-24 15:43:47     confirmed
    7           7           2020-03-31 13:11:43     timeout     7           2020-01-24 15:43:47     confirmed
    10          null        null                    null        null        null                    null
    5           null        null                    null        null        null                    null
    9           null        null                    null        null        null                    null
    8           8           2021-03-07 19:48:06     timeout     8           2020-07-27 19:43:25     confirmed
    12          null        null                    null        null        null                    null
    

    请注意user_id 7有2行,每行都有一个超时 确认。这些行被计算为 二者都 超时和确认。这是因为您要同时连接两个表。

    如果我们先进行聚合,然后再加入,计数就会计算出来。

    select 
      Signups.user_id,
      timeoutTable.*,
      confirmedTable.*
    from Signups
    left join (
      select user_id, count(user_id) as timeouts
      from Confirmations
      where action = 'timeout'
      group by user_id
    ) as timeoutTable on Signups.user_id = timeoutTable.user_id
    left join (
      select user_id, count(user_id) as confirmations
      from Confirmations
      where action = 'confirmed'
      group by user_id
    ) as confirmedTable on Signups.user_id = confirmedTable.user_id
    
                timeoutTable            confirmedTable
    user_id     user_id     timeouts    user_id     confirmations
    15          null        null        null        null
    16          null        null        null        null
    7           7           2           7           1
    10          null        null        null        null
    5           null        null        null        null
    9           null        null        null        null
    8           8           1           8           1
    12          null        null        null        null
    

    Demonstration


    注:

    left join (
      select * from Confirmations where action = 'timeout'
    ) as timeoutTable on Signups.user_id = timeoutTable.user_id
    

    最好写成没有子查询。

    left join Confirmations as timeoutTable
      on Signups.user_id = timeoutTable.user_id
     and where timeoutTable.action = 'timeout'
    

    注意:当列名与SQL关键字冲突或非常接近时,这很好地表明您的名称过于通用。例如 time_stamp 。什么时候盖章?使用更具描述性的内容,例如 created_at action_at confirmed_at

    这就引出了每一行确认代表什么的问题。它们不是确认,而是确认尝试。也许该表应该称为ConfirmationAttempts,时间戳为 attempted_at

    推荐文章