Count and display the number of times a line is repeated in a file

find duplicates search python

physlexic · Tech Community · 6 years ago

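I have seen solutions to this problem in bash (namely question 6712437), but I have not found any for Python.

I am trying to search a file, find the duplicated lines, and output the number of times each line occurs.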

Input

foo
bar
foo
foo
bar
foobar

Output

foo     3
bar     2
foobar  1

3 answers | last active 6 years ago

1

Ralf 6 years ago

collections.Counter seems like a good choice.

To count the number of times each line occurs in the file, you can try:

import collections

with open('myfile.txt') as f:
    # Strip the trailing newline so it is not part of the key or the output
    c = collections.Counter(line.rstrip('\n') for line in f)
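One detail worth checking with this approach: lines read from a file keep their trailing newline, so a last line that lacks one is counted separately from otherwise identical lines. A minimal sketch of the effect, using an in-memory io.StringIO as a stand-in for the file (the sample text is an assumption for illustration, not the asker's data):

```python
import collections
import io

text = "foo\nbar\nfoo"  # hypothetical content; the last line has no trailing newline

# Counting the raw lines treats 'foo\n' and 'foo' as two different keys.
raw = collections.Counter(io.StringIO(text).readlines())

# Stripping the newline first merges them into a single key.
stripped = collections.Counter(line.rstrip("\n") for line in io.StringIO(text))

print(raw)
print(stripped)
```

Stripping only "\n" leaves other whitespace intact, so lines that differ in indentation or trailing spaces still count as different lines.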

Then, to get nice output (as you asked for in the comments on this answer), you can use:

# sorted by value (number of occurrences, descending order)
for k, v in c.most_common():
    print(k, v)

# sorted by value (number of occurrences, ascending order)
for k, v in sorted(c.items(), key=lambda x: x[1]):
    print(k, v)

# sorted by key (line of the file)
for k, v in sorted(c.items(), key=lambda x: x[0]):
    print(k, v)
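To match the column layout shown in the question, the counts can be padded to the width of the longest line. The following is only a sketch: format_counts is a hypothetical helper name, and the sample list stands in for the lines of the file.

```python
import collections


def format_counts(lines):
    """Return 'line  count' rows, most frequent first, with aligned columns."""
    counts = collections.Counter(line.rstrip("\n") for line in lines)
    if not counts:
        return []
    # Pad every key to the length of the longest one so the counts line up.
    width = max(len(key) for key in counts)
    return [f"{key.ljust(width)}  {n}" for key, n in counts.most_common()]


# Hypothetical sample data mirroring the input in the question.
sample = ["foo\n", "bar\n", "foo\n", "foo\n", "bar\n", "foobar\n"]
for row in format_counts(sample):
    print(row)
```

Passing an open file object instead of the list should work the same way, since iterating over a file yields its lines.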

2

mad_ 6 years ago

The simplest solution is collections.Counter. However, if you would rather not import an additional module, then:

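d = {}
with open('test.txt') as f:
    for line in f:
        line = line.rstrip('\n')  # drop the trailing newline
        d[line] = d.get(line, 0) + 1

# Highest count first; ties are ordered by line text, descending.
# (Python 3 no longer allows tuple unpacking in lambda parameters.)
sorted_items = sorted(d.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
# iterate to save or play around with the tuple values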
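With reverse=True and a (count, line) key, ties in the count come out in reverse alphabetical order. If alphabetical order within ties is preferred, one option is to negate the count in the key instead. A small sketch with hypothetical helper names and sample data:

```python
def count_lines(lines):
    """Count occurrences of each line using a plain dict."""
    counts = {}
    for line in lines:
        key = line.rstrip("\n")
        counts[key] = counts.get(key, 0) + 1
    return counts


def sort_by_count_then_text(counts):
    """Highest count first; equal counts are ordered alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample in which 'a' and 'b' are tied.
sample = ["b\n", "a\n", "b\n", "a\n", "c\n"]
print(sort_by_count_then_text(count_lines(sample)))
```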

3

Mike C. 6 years ago

My solution is:

lines = []  # list of lines in the file
itemcounts = {}  # dictionary mapping each line to its count
with open('myfile.txt') as f:
    for item in f:
        lines.append(item.rstrip('\n'))  # drop the trailing newline
for i in lines:
    c = lines.count(i)  # rescans the whole list on every call
    itemcounts.update({i: c})
# print items and counts
for i in itemcounts:
    print(i, itemcounts[i])