代码之家  ›  专栏  ›  技术社区  ›  Chris

MySQL:连接关键字表,但只返回最流行的

  •  2
  • Chris  · 技术社区  · 14 年前

    我有一张“产品”表。我还有一个用户标签的产品关键字表。我想带回每个产品的顶部关键字根据有多少。

    关键字表基本上由关键字、主键和链接到Products表的外键组成。

    这是我已经拥有的SQL—它现在只会返回任何关键字,而不是最上面的关键字。

    SELECT product_name,keyword_keyword 
    FROM products 
    LEFT JOIN keywords ON keyword_pid = product_id
    GROUP BY product_id
    
    2 回复  |  直到 14 年前
        1
  •  0
  •   unutbu    14 年前

    下面是我用来得到我建议的解决方案的SQLs的进展(以及示例结果):

    SELECT k.*,
           COUNT(k.keyword_keyword)
    FROM   keywords k
    GROUP  BY k.keyword_pid,
              k.keyword_keyword  
    
    +------------+-------------+-----------------+--------------------------+
    | keyword_id | keyword_pid | keyword_keyword | count(k.keyword_keyword) |
    +------------+-------------+-----------------+--------------------------+
    |          3 |           1 | red             |                        3 | 
    |          1 |           1 | widgety         |                        3 | 
    |          9 |           2 | curve           |                        1 | 
    |         10 |           2 | red             |                        2 | 
    |          6 |           2 | screwy          |                        3 | 
    |         12 |           3 | red             |                        1 | 
    |          7 |           3 | spike           |                        2 | 
    +------------+-------------+-----------------+--------------------------+
    

    我们需要找到每个的最大值 (keyword_pid,keyword_keyword) 一对。 tried and true idiom 为此:

    SELECT t1.*,
           t2.*
    FROM   (SELECT k.*,
                   COUNT(k.keyword_keyword) cnt
            FROM   keywords k
            GROUP  BY k.keyword_pid,
                      k.keyword_keyword) t1
           LEFT JOIN (SELECT k.*,
                             COUNT(k.keyword_keyword) cnt
                      FROM   keywords k
                      GROUP  BY k.keyword_pid,
                                k.keyword_keyword) t2
             ON t1.keyword_pid = t2.keyword_pid
                AND t1.cnt < t2.cnt  
    

    注意,上面,我重复了同样的话 SELECT 选择 如果我错了,我希望有人能解除我的信仰。

    +------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
    | keyword_id | keyword_pid | keyword_keyword | cnt | keyword_id | keyword_pid | keyword_keyword | cnt  |
    +------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
    |          3 |           1 | red             |   3 |       NULL |        NULL | NULL            | NULL | 
    |          1 |           1 | widgety         |   3 |       NULL |        NULL | NULL            | NULL | 
    |          9 |           2 | curve           |   1 |         10 |           2 | red             |    2 | 
    |          9 |           2 | curve           |   1 |          6 |           2 | screwy          |    3 | 
    |         10 |           2 | red             |   2 |          6 |           2 | screwy          |    3 | 
    |          6 |           2 | screwy          |   3 |       NULL |        NULL | NULL            | NULL | 
    |         12 |           3 | red             |   1 |          7 |           3 | spike           |    2 | 
    |          7 |           3 | spike           |   2 |       NULL |        NULL | NULL            | NULL | 
    +------------+-------------+-----------------+-----+------------+-------------+-----------------+------+
    

    t2.cnt is NULL 是包含每个行的最大计数的行 (关键字pid,关键字)

    SELECT t1.*
    FROM   (SELECT k.*,
                   COUNT(k.keyword_keyword) cnt
            FROM   keywords k
            GROUP  BY k.keyword_pid,
                      k.keyword_keyword) t1
           LEFT JOIN (SELECT k.*,
                             COUNT(k.keyword_keyword) cnt
                      FROM   keywords k
                      GROUP  BY k.keyword_pid,
                                k.keyword_keyword) t2
             ON t1.keyword_pid = t2.keyword_pid
                AND t1.cnt < t2.cnt
    WHERE  t2.cnt IS NULL  
    
    +------------+-------------+-----------------+-----+
    | keyword_id | keyword_pid | keyword_keyword | cnt |
    +------------+-------------+-----------------+-----+
    |          3 |           1 | red             |   3 | 
    |          1 |           1 | widgety         |   3 | 
    |          6 |           2 | screwy          |   3 | 
    |          7 |           3 | spike           |   2 | 
    +------------+-------------+-----------------+-----+
    

    剩下的相对容易。首先,我们加入products表,这样我们就可以看到哪个产品与哪个关键字相关联:

    SELECT p.*,
           t1.*
    FROM   (SELECT k.*,
                   COUNT(k.keyword_keyword) cnt
            FROM   keywords k
            GROUP  BY k.keyword_pid,
                      k.keyword_keyword) t1
           LEFT JOIN (SELECT k.*,
                             COUNT(k.keyword_keyword) cnt
                      FROM   keywords k
                      GROUP  BY k.keyword_pid,
                                k.keyword_keyword) t2
             ON t1.keyword_pid = t2.keyword_pid
                AND t1.cnt < t2.cnt
           LEFT JOIN product p
             ON p.product_id = t1.keyword_pid
    WHERE  t2.cnt IS NULL  
    
    +------------+--------------+------------+-------------+-----------------+-----+
    | product_id | product_name | keyword_id | keyword_pid | keyword_keyword | cnt |
    +------------+--------------+------------+-------------+-----------------+-----+
    |          1 | widget       |          3 |           1 | red             |   3 | 
    |          1 | widget       |          1 |           1 | widgety         |   3 | 
    |          2 | screw        |          6 |           2 | screwy          |   3 | 
    |          3 | nail         |          7 |           3 | spike           |   2 | 
    +------------+--------------+------------+-------------+-----------------+-----+
    

    GROUP BY :

    SELECT p.*,
           t1.*
    FROM   (SELECT k.*,
                   COUNT(k.keyword_keyword) cnt
            FROM   keywords k
            GROUP  BY k.keyword_pid,
                      k.keyword_keyword) t1
           LEFT JOIN (SELECT k.*,
                             COUNT(k.keyword_keyword) cnt
                      FROM   keywords k
                      GROUP  BY k.keyword_pid,
                                k.keyword_keyword) t2
             ON t1.keyword_pid = t2.keyword_pid
                AND t1.cnt < t2.cnt
           LEFT JOIN product p
             ON p.product_id = t1.keyword_pid
    WHERE  t2.cnt IS NULL
    GROUP  BY p.product_id  
    
    +------------+--------------+------------+-------------+-----------------+-----+
    | product_id | product_name | keyword_id | keyword_pid | keyword_keyword | cnt |
    +------------+--------------+------------+-------------+-----------------+-----+
    |          1 | widget       |          3 |           1 | red             |   3 | 
    |          2 | screw        |          6 |           2 | screwy          |   3 | 
    |          3 | nail         |          7 |           3 | spike           |   2 | 
    +------------+--------------+------------+-------------+-----------------+-----+
    
        2
  •  1
  •   Fosco    14 年前

    我知道这可能会有不同的方法,而且可能会更有效率,但这就是我的大脑如何分解它的原因:

    select a.product_name, b.keyword_keyword, count(*) as keyword_count 
    into #temp1
    from products a 
    join keywords b on a.product_id = b.keyword_pid 
    group by a.product_name, b.keyword_keyword
    
    select x.product_name, x.keyword_keyword
    from #temp1 x
    where x.keyword_count = (select MAX(keyword_count) from #temp1 
                             where product_name = x.product_name)