
How can I sort a word-count dictionary by value while preserving the original key order? [duplicate]

sorting dictionary python

Rishabh Kumar · Tech Community · 7 months ago

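I am trying to count word frequencies in a text and then sort them by frequency in descending order. If two words have the same frequency, they should stay in the order in which they first appear in the original text (not alphabetical order). See the following example: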

Hi I live in America I love cooking

After converting to lowercase, the expected output is:

{"i": 2, "hi": 1, "live": 1, "in": 1, "america": 1, "love": 1, "cooking": 1}

Current approach:

words = input().lower().split()
d = {i: words.count(i) for i in words}
dk, dv, sd = list(d.keys()), sorted(d.values())[::-1], sorted(d.items())
nd = dict()
for i in dv:
    for j in dk:
        if d[j] == i:
            nd[j] = i
print(nd)

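Calling words.count() for every element gives poor (quadratic) time complexity, and the nested loops make it worse. I could not find a better algorithm to replace it.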

3 answers | up to 7 months ago

deceze · 7 months ago

collections.Counter does what you want:

>>> from collections import Counter
>>> s = 'Hi I live in America I love cooking'
>>> Counter(s.lower().split())
Counter({'i': 2, 'hi': 1, 'live': 1, 'in': 1, 'america': 1, 'love': 1, 'cooking': 1})

Changed in version 3.7: As a dict subclass, Counter inherited the capability to remember insertion order.
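One detail worth noting: the Counter repr shown above lists entries by count, while iterating over the Counter itself follows insertion order. A minimal sketch, assuming Python 3.7+, of getting a plain dict in frequency order via most_common(), whose documentation states that elements with equal counts are ordered in the order first encountered (the variable names here are illustrative only):

```python
from collections import Counter

s = 'Hi I live in America I love cooking'
counts = Counter(s.lower().split())

# Iterating the Counter follows insertion order, so 'hi' comes first.
insertion_order = list(counts)

# most_common() sorts by count, descending; ties keep first-seen order.
by_frequency = dict(counts.most_common())
print(by_frequency)
```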

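A plain dict (or an OrderedDict on older Python versions) remembers insertion order as well. The algorithm is then simply to insert items as they are encountered and update them; in your case, count them up: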

count = {}
for word in s.lower().split():
    count[word] = count.get(word, 0) + 1

A defaultdict can also help here:

from collections import defaultdict
count = defaultdict(int)
for word in s.lower().split():
    count[word] += 1

You can then sort the result; because the sort is stable, words with equal counts keep their original order:

>>> from operator import itemgetter
>>> sorted(count.items(), key=itemgetter(1), reverse=True)
[('i', 2), ('hi', 1), ('live', 1), ('in', 1), ('america', 1), ('love', 1), ('cooking', 1)]
>>> dict(sorted(count.items(), key=itemgetter(1), reverse=True))
{'i': 2, 'hi': 1, 'live': 1, 'in': 1, 'america': 1, 'love': 1, 'cooking': 1}

The built-in sorted() function is guaranteed to be stable:

https://docs.python.org/3/library/functions.html#sorted
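Putting the pieces together, here is a sketch of a single-pass count followed by one stable sort. The function name word_frequencies is made up for illustration, and it assumes Python 3.7+ so that plain dicts keep insertion order.

```python
def word_frequencies(text):
    """Return {word: count}, highest count first; ties keep first-seen order."""
    counts = {}
    for word in text.lower().split():
        # One pass over the words, instead of calling list.count() per word.
        counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, even with reverse=True, so equal counts stay in
    # insertion (first-seen) order.
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))


print(word_frequencies('b a b a c'))
```

With the input 'b a b a c', both b and a occur twice, and b stays ahead of a because it appeared first, not because of alphabetical order.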

SomeDude · 7 months ago

import pandas as pd
s = "Hi I live in America I love cooking"
print(pd.Series(s.lower().split()).value_counts().to_dict())

{'i': 2, 'hi': 1, 'live': 1, 'in': 1, 'america': 1, 'love': 1, 'cooking': 1}

Booboo · 7 months ago

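Use collections.Counter to conveniently collect the count of each word in a variable counts, a dict subclass that remembers insertion order.
If you enumerate counts.items(), you get an iterator that yields tuples such as (0, ('hi', 1)), where the first element is the insertion position of the second element, and the second element is a tuple containing the word and its frequency count.
Sort these tuples by count in descending order and by insertion position in ascending order (to break ties), then select the second element of each.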

from collections import Counter

s = 'Hi I live in America I love cooking'

counts = Counter()
for word in s.lower().split():
    counts[word] += 1

d = dict(
    [
        tpl[1]
        for tpl in sorted(
            enumerate(counts.items()),
            key=lambda elem: (-elem[1][1], elem[0])
        )
    ]
)

print(d)

Prints:

{'i': 2, 'hi': 1, 'live': 1, 'in': 1, 'america': 1, 'love': 1, 'cooking': 1}