How to remove duplicate rows from a CSV file based on a column

  •  0
  • James Black  · Tech Community  · 1 year ago

    I basically want to remove every row whose value in the second column of a CSV file has already appeared in an earlier row:

    Skufnoo,222228888444,-6026769894509215039,Вупсень пупсень ❤️‍🩹💗,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,4,True,False,0
    mAtkmb,5213786988,4161254730445748607,Даниэль Блинов,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,False,False,False,0
    Ethan58,222228888444,7737583697013043644,Ethan,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,4,True,False,0
    sheluvjoseph,1421438213,8544915453690665435,អន សំអុល,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0
    

    and write the remaining rows to a new CSV file, like this:

    Skufnoo,222228888444,-6026769894509215039,Вупсень пупсень ❤️‍🩹💗,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,4,True,False,0
    mAtkmb,5213786988,4161254730445748607,Даниэль Блинов,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,False,False,False,0
    sheluvjoseph,1421438213,8544915453690665435,អន សំអុល,AA2888 ចាក់បាល់និងកាសុីណូអនឡាញ (070645555),1746008070,False,False,5,True,False,0
    

    I tried the following code, but it doesn't work:

    import csv
    
    with open('members.csv', 'r', encoding="utf8") as in_file, open('members2.csv', 'w', encoding="utf8") as out_file:
        writer=csv.writer(out_file)
        tracks = set()
        for row in in_file:
            key = row[1]
            if key not in tracks:
                writer.writerow(row)
                tracks.add(key)
    

    Any help would be much appreciated.

    2 answers  |  as of 1 year ago
        1
  •  2
  •   kritserv    1 year ago

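    You forgot to read the input CSV file with csv.reader. Iterating over in_file directly yields each line as a plain string, so row[1] is the second character of the line rather than the second column: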

    in_data = csv.reader(in_file, delimiter=',')
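
    To make the difference concrete, here is a minimal sketch. The two-row sample string is a hypothetical stand-in for members.csv, and io.StringIO is used only so the snippet runs without a file on disk:

```python
import csv
import io

# Hypothetical two-row sample standing in for members.csv.
sample = "Skufnoo,222228888444,x\nEthan58,222228888444,y\n"

# Iterating a file-like object yields raw lines (str),
# so indexing with [1] picks a single character.
raw_keys = [line[1] for line in io.StringIO(sample)]

# csv.reader yields a list of fields for each line,
# so indexing with [1] picks the second column.
parsed_keys = [row[1] for row in csv.reader(io.StringIO(sample))]

print(raw_keys)     # ['k', 't']
print(parsed_keys)  # ['222228888444', '222228888444']
```

    With the raw lines, the "key" is just one letter of the first field, so rows are dropped or kept more or less at random.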
    

    The rest of the code looks fine.

    Full code:

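    import csv
    
    # newline="" lets the csv module handle line endings itself (avoids blank rows on Windows).
    with open('members.csv', 'r', encoding="utf8", newline="") as in_file, \
         open('members2.csv', 'w', encoding="utf8", newline="") as out_file:
        in_data = csv.reader(in_file, delimiter=',')  # yields each row as a list of fields
        writer = csv.writer(out_file)
        tracks = set()  # second-column values already written
    
        for row in in_data:
            key = row[1]
            if key not in tracks:  # first occurrence of this key: keep the row
                writer.writerow(row)
                tracks.add(key)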
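
    The same loop can also be packaged as a small generator, which makes it easy to try on in-memory data. This is only a sketch: the name dedupe_rows, the key_index parameter, and the shortened sample rows are hypothetical and not part of the original code:

```python
import csv
import io

def dedupe_rows(rows, key_index=1):
    """Yield only the first row seen for each value at key_index."""
    seen = set()
    for row in rows:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row

# Shortened, hypothetical version of the sample data from the question.
sample = (
    "Skufnoo,222228888444,a\n"
    "mAtkmb,5213786988,b\n"
    "Ethan58,222228888444,c\n"
    "sheluvjoseph,1421438213,d\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(dedupe_rows(csv.reader(io.StringIO(sample))))
result = out.getvalue()
print(result)
```

    The Ethan58 row is dropped because its second column repeats the value from the Skufnoo row. Note that csv.reader returns an empty list for a blank line, so row[key_index] would raise IndexError on such input; skip short rows first if the file may contain them.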
    
        2
  •  0
  •   SIGHUP    1 year ago

    If you don't mind holding the entire input CSV in memory, you can simply use a dictionary, like this:

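    import csv
    
    # encoding="utf8" matches the question's files; newline="" is what the csv module expects.
    with open("members.csv", encoding="utf8", newline="") as in_file, open("members2.csv", "w", encoding="utf8", newline="") as out_file:
        # Map each second-column value to its row (dict keys are unique).
        d = {row[1]: row for row in csv.reader(in_file)}
        csv.writer(out_file).writerows(d.values())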
    

    Note

    Although this meets the requirement (removing duplicates), the result will differ from the set technique. Can you see why?
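
    One way to see the difference: in a dict comprehension, a later row with the same key overwrites the earlier one, so the last duplicate survives, whereas the set loop keeps the first. The sketch below uses hypothetical two-column rows; the setdefault variant is a suggested alternative for keeping the first row, not part of the answer above:

```python
rows = [
    ["Skufnoo", "222228888444"],
    ["mAtkmb", "5213786988"],
    ["Ethan58", "222228888444"],
]

# Dict comprehension: the Ethan58 row replaces the Skufnoo row under the
# same key, although the key keeps its original position in the dict.
keep_last = list({row[1]: row for row in rows}.values())

# setdefault stores a row only when the key is new, so the first row
# for each key is kept, matching the set-based loop.
first_seen = {}
for row in rows:
    first_seen.setdefault(row[1], row)
keep_first = list(first_seen.values())

print(keep_last)   # [['Ethan58', '222228888444'], ['mAtkmb', '5213786988']]
print(keep_first)  # [['Skufnoo', '222228888444'], ['mAtkmb', '5213786988']]
```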