代码之家  ›  专栏  ›  技术社区  ›  Ani

bigquery:是否使用标准SQL中的重复/array结构字段联接?

  •  0
  • Ani  · 技术社区  · 7 年前

    我基本上有两张桌子, Orders Items . 由于这些表是从Google云数据存储备份文件中导入的,因此引用不是由简单的ID字段进行的,而是由 <STRUCT> 对于一对一的关系, id 字段表示要匹配的实际唯一ID。对于一对多关系(重复),架构使用 <结构 .

    我可以用左外部联接查询一对一关系,我也知道如何在非重复结构和重复字符串或int上联接,但是我很难用 重复结构 .

    一份订单 一个项目 :

    #standardSQL
    WITH Orders AS (
      SELECT 1 AS __oid__, STRUCT(STRUCT(2 AS id, "default" AS ns) AS key) AS item UNION ALL 
      SELECT 2 AS __oid__, STRUCT(STRUCT(4 AS id, "default" AS ns) AS key) AS item UNION ALL 
      SELECT 3 AS __oid__, STRUCT(STRUCT(6 AS id, "default" AS ns) AS key) AS item
    ),
    Items AS (
      SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
      SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
      SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
      SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
      SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
      SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
    )
    
    SELECT
       __oid__
      ,Order_item AS item
    FROM Orders  
    
    LEFT OUTER JOIN(
      SELECT
         key
        ,title
      FROM Items
    ) Order_item
    ON Order_item.key.id = item.key.id
    

    结果(按预期工作):

    +-----+---------+--------------+-------------+------------+
    | Row | __oid__ |  item.key.id | item.key.ns | item.title |
    +-----+---------+--------------+-------------+------------+
    |   1 |       1 |            2 |     default |       #1.2 |
    +-----+---------+--------------+-------------+------------+
    |   2 |       2 |            4 |     default |       #1.4 |
    +-----+---------+--------------+-------------+------------+
    |   3 |       3 |            6 |     default |       #1.6 |
    +-----+---------+--------------+-------------+------------+
    

    类似的查询,但这次有一个订单 许多的 项目:

    #standardSQL
    WITH Orders AS (
      SELECT 1 AS __oid__, ARRAY[STRUCT(STRUCT(1 AS id, "default" AS ns) AS key), STRUCT(STRUCT(2 AS id, "default" AS ns) AS key)] AS items UNION ALL 
      SELECT 2 AS __oid__, ARRAY[STRUCT(STRUCT(3 AS id, "default" AS ns) AS key), STRUCT(STRUCT(4 AS id, "default" AS ns) AS key)] AS items UNION ALL 
      SELECT 3 AS __oid__, ARRAY[STRUCT(STRUCT(5 AS id, "default" AS ns) AS key), STRUCT(STRUCT(6 AS id, "default" AS ns) AS key)] AS items
    ),
    Items AS (
      SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
      SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
      SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
      SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
      SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
      SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
    )
    
    SELECT
       __oid__
      ,Order_items AS items
    FROM Orders  
    
    LEFT OUTER JOIN(
      SELECT
         key
        ,title
      FROM Items
    ) Order_items
    ON Order_items.key.id IN (SELECT item.key.id FROM UNNEST(items) AS item)
    

    错误: 在联接谓词内部不支持In子查询。

    我真的期望这个结果:

    +-----+---------+--------------+-------------+------------+
    | Row | __oid__ |  item.key.id | item.key.ns | item.title |
    +-----+---------+--------------+-------------+------------+
    |   1 |       1 |            1 |     default |       #1.1 |
    |     |         |            2 |     default |       #1.2 |
    +-----+---------+--------------+-------------+------------+
    |   2 |       2 |            3 |     default |       #1.3 |
    |     |         |            4 |     default |       #1.4 |
    +-----+---------+--------------+-------------+------------+
    |   3 |       3 |            5 |     default |       #1.5 |
    |     |         |            6 |     default |       #1.6 |
    +-----+---------+--------------+-------------+------------+
    

    如何更改第二个查询以获得预期的结果?

    2 回复  |  直到 7 年前
        1
  •  2
  •   Mikhail Berlyant    7 年前

    另一种选择是交叉联接而不是左联接

    #standardSQL
    WITH Orders AS (
      SELECT 1 AS __oid__, ARRAY[STRUCT(STRUCT(1 AS id, "default" AS ns) AS key), STRUCT(STRUCT(2 AS id, "default" AS ns) AS key)] AS items UNION ALL 
      SELECT 2 AS __oid__, ARRAY[STRUCT(STRUCT(3 AS id, "default" AS ns) AS key), STRUCT(STRUCT(4 AS id, "default" AS ns) AS key)] AS items UNION ALL 
      SELECT 3 AS __oid__, ARRAY[STRUCT(STRUCT(5 AS id, "default" AS ns) AS key), STRUCT(STRUCT(6 AS id, "default" AS ns) AS key)] AS items
    ),
    Items AS (
      SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
      SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
      SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
      SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
      SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
      SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
    )
    
    SELECT
       __oid__
      ,ARRAY_AGG(Order_items) AS items
    FROM Orders  
    
    CROSS JOIN(
      SELECT
         key
        ,title
      FROM Items
    ) Order_items
    WHERE Order_items.key.id IN (SELECT item.key.id FROM UNNEST(items) AS item)
    GROUP BY __oid__
    
        2
  •  1
  •   Elliott Brossard    7 年前

    问题是bigquery不能对两边的连接键进行哈希分区(因为连接被表示为in条件)。您可以通过展开左侧的数组,然后从右侧聚合项来实现此操作:

    #standardSQL
    WITH Orders AS (
      SELECT 1 AS __oid__, ARRAY[STRUCT(STRUCT(1 AS id, "default" AS ns) AS key), STRUCT(STRUCT(2 AS id, "default" AS ns) AS key)] AS items UNION ALL 
      SELECT 2 AS __oid__, ARRAY[STRUCT(STRUCT(3 AS id, "default" AS ns) AS key), STRUCT(STRUCT(4 AS id, "default" AS ns) AS key)] AS items UNION ALL 
      SELECT 3 AS __oid__, ARRAY[STRUCT(STRUCT(5 AS id, "default" AS ns) AS key), STRUCT(STRUCT(6 AS id, "default" AS ns) AS key)] AS items
    ),
    Items AS (
      SELECT STRUCT(1 AS id, "default" AS ns) AS key, "#1.1" AS title UNION ALL
      SELECT STRUCT(2 AS id, "default" AS ns) AS key, "#1.2" AS title UNION ALL
      SELECT STRUCT(3 AS id, "default" AS ns) AS key, "#1.3" AS title UNION ALL
      SELECT STRUCT(4 AS id, "default" AS ns) AS key, "#1.4" AS title UNION ALL
      SELECT STRUCT(5 AS id, "default" AS ns) AS key, "#1.5" AS title UNION ALL
      SELECT STRUCT(6 AS id, "default" AS ns) AS key, "#1.6" AS title
    )
    
    SELECT
       __oid__
      ,ARRAY_AGG(Order_items) AS items
    FROM Orders,
    UNNEST(items) AS item
    
    LEFT OUTER JOIN(
      SELECT
         key
        ,title
      FROM Items
    ) Order_items
    ON Order_items.key.id = item.key.id
    GROUP BY __oid__
    

    在任何情况下,这看起来都是您想要的,因为您的原始查询 items 就像一个结构而不是一个结构数组。