Applying the Pig Latin LIMIT operator to each group

  •  0
  • hack-is-art  · Tech Community  · 8 years ago

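    I am trying to return only the five largest places in each state, ranked by population. I also want the result sorted by state name, with the places in each state listed in decreasing order of population. At the moment I only get the first five tuples of the ordered relation, not the five largest places for each state.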

    -- Groups places by state name.
    group_by_state_name_populated_place_name =
        GROUP project_using_state_name
        BY (state::name, place::name);
    
    -- Counts population for each place in every state.
    count_population_for_each_place_in_every_state =
        FOREACH group_by_state_name_populated_place_name
        GENERATE group.state::name AS state_name,
                 group.place::name AS name,
                 COUNT(project_using_state_name.population) AS population;
    
    -- Orders population in each group found above to enable the use of limit.
    order_groups_of_states_and_population =
        ORDER count_population_for_each_place_in_every_state 
        BY state_name ASC, population DESC, name ASC;
    
    -- Limit the top 5 population for each state BUT currently returning just the first 5 tuples of the previous one and not 5 of each state.
    limit_population =
        LIMIT order_groups_of_states_and_population 5;
    
    1 answer  |  as of 8 years ago
  •  2
  •   Murali Rao    8 years ago

    The following code snippet might help:

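    -- Load the CSV as (state, place, population).
    inp_data = LOAD 'input_data.csv' USING PigStorage(',') AS (state:chararray, place:chararray, population:long);
    
    -- For each state, sort its places by population and keep the top five.
    req_stats = FOREACH (GROUP inp_data BY state) {
        ordered = ORDER inp_data BY population DESC;
        required = LIMIT ordered 5;
        GENERATE FLATTEN(required);
    };
    
    -- Sort the final result by state, then by decreasing population.
    req_stats_ordered = ORDER req_stats BY state, population DESC;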
    
    DUMP req_stats_ordered;
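
A top-level LIMIT applies to the whole relation, which is why the question's script returns only five tuples in total. Placing ORDER and LIMIT inside a nested FOREACH applies them to each group's bag instead. As a rough sketch, the same idea could be applied to the question's own relation as shown below. It assumes count_population_for_each_place_in_every_state has the fields (state_name, name, population) as generated in the question; the aliases grouped_by_state, by_population, top_five, top_five_per_state and result are made up for illustration.

```pig
-- Assumption: count_population_for_each_place_in_every_state is the relation
-- from the question, with fields (state_name, name, population).
grouped_by_state =
    GROUP count_population_for_each_place_in_every_state
    BY state_name;

-- Inside the nested block, ORDER and LIMIT run once per state's bag,
-- so each state keeps at most five places.
top_five_per_state =
    FOREACH grouped_by_state {
        by_population = ORDER count_population_for_each_place_in_every_state
                        BY population DESC, name ASC;
        top_five = LIMIT by_population 5;
        GENERATE FLATTEN(top_five) AS (state_name, name, population);
    };

-- Final ordering: states alphabetically, places by decreasing population.
result =
    ORDER top_five_per_state
    BY state_name ASC, population DESC, name ASC;

DUMP result;
```

The final ORDER is still needed because the nested ORDER only sorts within each bag and does not guarantee the order of rows in the flattened output.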