代码之家  ›  专栏  ›  技术社区  ›  Regressor

对BigQuery中的一个数组列相对于另一个数组列进行排序

  •  1
  • Regressor  · 技术社区  · 6 年前

    WITH results AS
      (SELECT 1 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.1,0.4,0.3,0.2] as probability
      UNION ALL
      SELECT 2 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.2,0.1,0.6,0.1] as probability
      UNION ALL
      SELECT 3 as customerid, ["apples", "bananas", "grapes","orange"] as fruit_array, [0.5,0.05,0.35,0.1] as probability
      )
     select * from results
    

    top 2 每位顾客的水果及其对应的 probabilities 购买日期。

    customerid, fruits, probability
    1, bananas, 0.4
    1, grapes, 0.3
    ..
    

    在上述最终结果中 customerid 1 bananas grapes 因为这两种水果的购买概率最高(从 [0.1,0.4,0.3,0.2]

    在BiqQuery中有什么函数可以用来实现这一点吗?

    1 回复  |  直到 6 年前
        1
  •  1
  •   Mikhail Berlyant    6 年前

    下面是BigQuery标准SQL

    #standardSQL
    WITH results AS (
      SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
      SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
      SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
    )
    SELECT customerid, fruit, probability
    FROM (
      SELECT customerid, ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) top
      FROM results, 
        UNNEST(probability) probability WITH OFFSET off1
        JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
        ON off1 = off2
      GROUP BY customerid
    ), UNNEST(top)  
    

    有结果的

    Row customerid  fruit   probability  
    1   1           bananas 0.4  
    2   1           grapes  0.3  
    3   2           grapes  0.6  
    4   2           apples  0.2  
    5   3           apples  0.5  
    6   3           grapes  0.35     
    

    或者可能是更好的选择

    #standardSQL
    WITH results AS (
      SELECT 1 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.1,0.4,0.3,0.2] AS probability   UNION ALL
      SELECT 2 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.2,0.1,0.6,0.1] AS probability   UNION ALL
      SELECT 3 AS customerid, ["apples", "bananas", "grapes","orange"] AS fruit_array, [0.5,0.05,0.35,0.1] AS probability
    )
    SELECT customerid, fruit, probability
    FROM (
      SELECT customerid, 
        (
          SELECT ARRAY_AGG(STRUCT(fruit, probability) ORDER BY probability DESC LIMIT 2) 
          FROM   UNNEST(probability) probability WITH OFFSET off1
          JOIN UNNEST(fruit_array) fruit WITH OFFSET off2
          ON off1 = off2
        ) top
      FROM results
    ), UNNEST(top)
    

    同样的结果