代码之家  ›  专栏  ›  技术社区  ›  sam

如何取消将BigQuery字段存储为字符串?

  •  0
  • sam  · 技术社区  · 7 年前

    我试图取消一个字段的连接,但我的查询有问题。

    表中的示例数据

    '1234', '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }}] }'
    

    数据集中有两个字段:row_id和parts,其中parts是一个字典对象,其中包含列表项(类别),但parts的数据类型是string。我希望每个类别的输出是单独的行。

    这是我试过的,但我没有得到任何结果。

    #standardSQL
    with t as (
    select "1234" as row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }}] }' as parts 
    )
    select row_id, _categories
    from t,
    UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(parts, '$.items'), r'"categories":"(.+?)"')) _categories
    

    预期结果

    id, _categories
    1234, cat1
    1234, cat2
    1234, cat3
    
    1 回复  |  直到 7 年前
        1
  •  2
  •   Mikhail Berlyant    7 年前

    下面是BigQuery标准SQL

    #standardSQL
    WITH t AS (
      SELECT "1234" AS row_id, '{ "id" : "123" , "items" : [ { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }}] }' AS parts 
    )
    SELECT row_id, REPLACE(_categories, '"', '') _categories
    FROM t, UNNEST(SPLIT(REGEXP_EXTRACT(
      JSON_EXTRACT(parts, '$.items'), 
      r'"categories":\[(.+?)]'))
    ) _categories 
    

    并产生预期结果

    Row row_id  _categories  
    1   1234    cat1     
    2   1234    cat2     
    3   1234    cat3      
    

    更新

    上述解决方案主要集中于修复提取中使用的regexp,但没有解决具有多个产品的更一般的情况。下面的解决方案解决了这种更一般的情况

    #standardSQL
    WITH t AS (
      SELECT "1234" AS row_id, '''{ "id" : "123" , "items" : [ 
          { "quantity" : 1 , "product" : { "id" : "p1" , "categories" : [ "cat1","cat2","cat3"]  }},
          { "quantity" : 2 , "product" : { "id" : "p2" , "categories" : [ "cat4","cat5","cat6"]  }}
        ] }''' AS parts 
    )
    SELECT row_id, REPLACE(category, '"', '') category
    FROM t, UNNEST(REGEXP_EXTRACT_ALL(parts, r'"categories" : \[(.+?)]')) categories,
    UNNEST(SPLIT(categories)) category  
    

    以结果

    Row row_id  category     
    1   1234    cat1     
    2   1234    cat2     
    3   1234    cat3     
    4   1234    cat4     
    5   1234    cat5     
    6   1234    cat6