代码之家  ›  专栏  ›  技术社区  ›  Ray

BigQuery优化查询,对结构字段的嵌套数组进行过滤并分组返回

  •  1
  • Ray  · 技术社区  · 7 年前

    我试图找出如何编写GQL(Google SQL)查询来过滤深度嵌套的结构,然后再次嵌套,并将结构属性的第一条记录与数组保留在同一级别。

    我准备了一个模式示例

     WITH
          Sale AS (
          SELECT
            "1" AS _id,
            STRUCT("11" AS _id,
              "SERVICE" AS feedbackType,
              DATE(TIMESTAMP("2017-01-20 14:05:51.655")) AS createDate) AS serviceFeedback,
            [STRUCT("host" AS key,
              "localhost" AS value),
            STRUCT("location" AS key,
              "Paris" AS value)] AS tags,
            TRUE AS reviewed,
            [STRUCT("1" as saleId, STRUCT("101" AS _id,
                "PRODUCT" AS feedbackType,
                DATE(TIMESTAMP("2017-01-20 14:05:51.655")) AS createDate) AS productFeedback),
            STRUCT("1" as saleId, STRUCT("102" AS _id,
                "PRODUCT" AS feedbackType,
                DATE(TIMESTAMP("2017-01-20 14:06:51.655")) AS createDate) AS productFeedback) ] AS saleItems,
            DATE(TIMESTAMP("2017-01-20 14:05:51.655")) AS latestFeedbackDate )
    

    过滤需要一个压平所有嵌套字段的源过滤器查询。

    SELECT
      saleId,
      serviceFeedback,
      saleTags,
      reviewed,
      saleItems,
      latestFeedbackDate
    FROM (
      SELECT
        sale._id AS saleId,
        serviceFeedback,
        sale.tags AS saleTags,
        reviewed,
        saleItems,
        latestFeedbackDate
      FROM
        `Sale` AS sale,
        sale.saleItems AS saleItems
      WHERE
        reviewed = TRUE
        AND serviceFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655"))
        AND serviceFeedback._id IS NOT NULL
        AND saleItems.productFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655")))
    ORDER BY
      latestFeedbackDate DESC
    LIMIT
      20
    

    主要问题是,在此筛选之后,您希望将所有 saleItems 通过 sale._id (返回初始结构)并检索 serviceFeedback 类型为STRUCT的字段。

    JSON格式的预期结果是:

    {
        "saleId":"1",
        "serviceFeedback":{"_id":"11","feedbackType":"SERVICE","createDate":"2017-01-20"},
        "saleTags":[{"key":"host","value":"localhost"},{"key":"location","value":"Paris"}],
        "reviewed":"true",
        "saleItems":[
            {"saleId":"1","productFeedback":{"_id":"101","feedbackType":"PRODUCT","createDate":"2017-01-20"},
            {"saleId":"1","productFeedback":{"_id":"102","feedbackType":"PRODUCT","createDate":"2017-01-20"},
        ],
        "latestFeedbackDate":"2017-01-20"
    }
    

    我写了我脑海中出现的最简单的查询想法。它会产生正确的结果。但也许可以更有效地重写它,

    SELECT
      saleId,
      serviceFeedback,
      latestFeedbackDate,
      subQuery.saleItems as saleItems
    FROM
      sale
    RIGHT JOIN (
      SELECT
        saleId,
        ARRAY_AGG(saleItems) as saleItems
      FROM (
        SELECT
          saleId,
          saleItems
        FROM (
          SELECT
            sale._id AS saleId,
            latestFeedbackDate,
            saleItems
          FROM
            `Sale` AS sale,
            sale.saleItems AS saleItems
          WHERE
            reviewed = TRUE
            AND serviceFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655"))
            AND serviceFeedback._id IS NOT NULL
            AND saleItems.productFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655")))
        ORDER BY
          latestFeedbackDate DESC)
      GROUP BY
        saleId
        ) AS subQuery
    ON
      sale._id = subQuery.saleId
    

    你能给我一个更好的解决方案来达到同样的效果吗?

    1 回复  |  直到 7 年前
        1
  •  2
  •   Mikhail Berlyant    7 年前

    你能给我一个更好的解决方案来达到同样的效果吗?

    下面生成与原始表完全相同的架构,并将所需的筛选器应用于 saleItems

    #standardSQL
    SELECT * REPLACE(
      ARRAY(
        SELECT saleItems FROM UNNEST(saleItems) saleItems 
        WHERE reviewed = TRUE
          AND serviceFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655"))
          AND serviceFeedback._id IS NOT NULL
          AND saleItems.productFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655"))
      ) AS saleItems)
    FROM sale
    

    如果您只需要字段的子集,请使用下面的示例

    #standardSQL
    SELECT 
      _id saleId,
      serviceFeedback,
      ARRAY(
        SELECT saleItems FROM UNNEST(saleItems) saleItems 
        WHERE reviewed = TRUE
          AND serviceFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655"))
          AND serviceFeedback._id IS NOT NULL
          AND saleItems.productFeedback.createDate >= DATE(TIMESTAMP("2017-01-18 14:05:51.655"))
      ) AS saleItems
    FROM sale