代码之家  ›  专栏  ›  技术社区  ›  dakir08

在python中生成反向索引

  •  0
  • dakir08  · 技术社区  · 8 年前

    您好,我是python新手,我想学习Hadoop Mapreduce。我有这样的数据

    Vancouver-1 35.5
    Vancouver-2 34.6
    Vancouver-3 37.6
    

    显示州月份和最高温度 所以我想做一个像这样的反向数据

    35 Vancouver-2
    36 Vancouver-2 Vancouver-1
    37 Vancouver-2 Vancouver-1
    38 Vancouver-2 Vancouver-1 Vancouver-3
    

    数字是从10到50的D度,下一部分是具有等于或低于D度的状态的列表

    我的映射程序文件:

    %%writefile mapper.py
    #!/usr/bin/env python
    
    import sys
    import math
    
    QueryMaxTemp = 50;
    
    for line in sys.stdin:
        line = line.rstrip('\n')
        lfields = line.split('\t');
        city_month = lfields[0];
        maxtemp = math.ceil(float(lfields[1]));
    for i in QueryMaxTemp:// I think this is wrong
        print ('{}\t{}\t{}'.format(i,city_month,maxtemp))
    

    我的reducer文件

    %%writefile reducer.py
    #!/usr/bin/env python
    
    import sys
    
    def emit(maxtemp, city_month_list):
        print('{}\t{}'.format(maxtemp,city_month_list))
    
    last_maxtemp = ''
    last_city_month_list = ''
    for line in sys.stdin:
        line = line.rstrip('\n')
        maxtemp, city_month_lists = line.split('\t', 1)
        if last_maxtemp == maxtemp:
            last_city_month_list = last_city_month_list + max(maxtemp, last_maxtemp) // I think this is wrong
        else:
            if last_maxtemp:
                emit(last_maxtemp, last_city_month_list)
            last_maxtemp = maxtemp
            last_city_month_list = city_month_lists
    
    if last_maxtemp:
            emit(last_maxtemp, last_city_month_list)
    

    我试着修复它,但不知道,有没有解决的办法?我想制作一个反转的数据,如下面的示例。谢谢

    2 回复  |  直到 8 年前
        1
  •  0
  •   galaxyan    8 年前

    您可以使用dict对数据进行排序。该键将是最高温度,然后将数据(城市月)添加到列表中。

    res_dict = {}
    for line in sys.stdin:
        line = line.rstrip('\n')
        lfields = line.split('\t')
        city_month = lfields[0]
        maxtemp = math.ceil(float(lfields[1]))
        if maxtemp not in res_dict:
            res_dict[maxtemp] = []
        res_dict[maxtemp].append(city_month)
    for maxtemp, city_month in res_dict.iteritems()
        print ('\t{}\t{}'.format(city_month,maxtemp))
    
    
    
    
    import sys
    
    def emit(res_dict):
        for maxtemp, city_month in res_dict.iteritems()
           print ('\t{}\t{}'.format(city_month,maxtemp))
    
    res_dict
    for line in sys.stdin:
        line = line.rstrip('\n')
        maxtemp, city_month_lists = line.split('\t', 1)
        if maxtemp not in res_dict:
            res_dict[maxtemp] = []
        res_dict[maxtemp].append(city_month)
    
    emit(res_dict)
    
        2
  •  0
  •   bobrobbob    8 年前

    抱歉,现在这是reduce

    import math
    
    data = """Vancouver-1 35.5
    Vancouver-2 34.6
    Vancouver-3 37.6"""
    
    lines = data.split('\n')
    mapped_data = list()
    for line in lines:
        city_month, maxtemp = line.split()
        maxtemp = math.ceil(float(maxtemp))
        mapped_data.append([city_month, maxtemp])
    
    sorted_data = sorted(mapped_data, key=lambda x: x[1])
    
    res = ''
    cities_str = ''
    for temp in range(10, 51):
        if sorted_data and sorted_data[0][1] < temp:
            cities_str += sorted_data.pop(0)[0]+' '
        res += str(temp)+' '+cities_str+'\n'
    
    print(res)