Reassigning dictionary values

dictionary python

lawful_neutral · Tech Community · 7 years ago

I have a dictionary

{'A': 0, 'B': 1, 'C': 2, 'D': 3, etc}

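Given that the dictionary is not ordered, how can I remove elements from it without leaving gaps in the values?

An example:

I have a big matrix where rows represent words and columns represent the documents in which those words occur. I store the words and their corresponding row indices in a dictionary. For a matrix with four rows, the dictionary would look like: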

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

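If I remove the words 'apple' and 'banana', the matrix will contain only two rows. So the value of 'orange' in the dictionary should now be 0 instead of 1, and the value of 'pear' should be 1 instead of 3.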

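In Python 3.6+ dictionaries preserve insertion order (a language guarantee from 3.7 onward), so I can write something like this to reassign the values: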

i = 0
for k in words:
    # Assign through the key; rebinding a loop variable would not update the dict
    words[k] = i
    i += 1

or, alternatively,

words = dict(zip(words.keys(), range(matrix.shape[0])))

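I think this is far from the most efficient way to change the values, and it would not work with unordered dictionaries. How can I do this efficiently? Is there an easy way to reassign the values if the dictionary is not ordered?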

5 answers | latest 7 years ago

Aran-Fey Kevin 7 years ago

Turn the dict into a sorted list, then build a new dict that leaves out the words you want to remove:

import itertools

to_remove = {'apple', 'banana'}

# Step 1: sort the words
ordered_words = [None] * len(words)
for word, index in words.items():
    ordered_words[index] = word
# ordered_words: ['apple', 'orange', 'banana', 'pear']

# Step 2: Remove unwanted words and create a new dict
counter = itertools.count()
words = {word: next(counter) for word in ordered_words if word not in to_remove}
# result: {'orange': 0, 'pear': 1}

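This has a runtime of O(n), because ordering the list manually with indexing operations is linear, whereas calling sorted would be O(n log n).

See also the documentation for itertools.count and next.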
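The two steps above could be wrapped in a helper that also reports which old row numbers survive, which is what you would need to drop the matching rows from the matrix. This is only a sketch: the names compact_indices and kept_rows are made up for illustration, and it assumes the values of the dict are exactly 0 to len(words) - 1 with no repeats.

```python
def compact_indices(words, to_remove):
    """Drop `to_remove` from `words` and renumber the rest 0..n-1.

    Assumes the values of `words` are exactly 0..len(words)-1, each used once.
    Returns the new dict and the list of old indices that were kept.
    """
    # Step 1: place every word at its old index (linear time, no sorting).
    ordered = [None] * len(words)
    for word, index in words.items():
        ordered[index] = word

    # Step 2: keep the wanted words in their original relative order.
    kept = [word for word in ordered if word not in to_remove]

    new_words = {word: i for i, word in enumerate(kept)}
    kept_rows = [words[word] for word in kept]  # old row numbers, ascending
    return new_words, kept_rows


# The dict is deliberately built out of order to show that order does not matter.
words = {'banana': 2, 'apple': 0, 'pear': 3, 'orange': 1}
new_words, kept_rows = compact_indices(words, {'apple', 'banana'})
print(new_words)  # {'orange': 0, 'pear': 1}
print(kept_rows)  # [1, 3]
```

With a NumPy array, kept_rows could then be used to select the remaining rows, for example matrix[kept_rows].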

iacob 7 years ago

You can keep your existing logic and apply it to a sorted representation of the dictionary:

import operator

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}
sorted_words = sorted(words.items(), key=operator.itemgetter(1))

for i, (k, v) in enumerate(sorted_words):
    words[k] = i
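The snippet above renumbers the values but does not remove anything. A possible way to combine it with the removal step is sketched below; reindex_sorted and to_remove are hypothetical names. Because it sorts by the old value rather than indexing into a list, it also tolerates values that already contain gaps, at the cost of O(n log n).

```python
import operator


def reindex_sorted(words, to_remove):
    """Return a new dict without `to_remove`, with values renumbered 0..n-1."""
    # Filter first, then sort the remaining (word, old_index) pairs by old index
    # so the relative order of the surviving words is preserved.
    kept = sorted(
        (item for item in words.items() if item[0] not in to_remove),
        key=operator.itemgetter(1),
    )
    return {word: new_index for new_index, (word, _) in enumerate(kept)}


words = {'pear': 3, 'apple': 0, 'banana': 2, 'orange': 1}
print(reindex_sorted(words, {'apple', 'banana'}))  # {'orange': 0, 'pear': 1}
```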

user9885031 7 years ago

Initially we have

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

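To renumber the words in order from smallest to largest value, you can use sorted and a dictionary comprehension.

std = sorted(words, key=lambda x: words[x])

# enumerate gives each word its position directly; calling std.index(word)
# for every word would rescan the list each time and make this quadratic
newwords = {word: i for i, word in enumerate(std)}

Would this work?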

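RoadRunner 7 years ago

You could always keep an inverted dictionary that maps indices to words and use it as a reference for the order of the original dictionary. Then you can remove the words and rebuild the dictionary: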

words = {'apple': 0, 'orange': 1, 'banana': 2, 'pear': 3}

# reverse dict for index -> word mappings
inverted = {i: word for word, i in words.items()}

remove = {'apple', 'banana'}

# sort/remove the words
new_words = [inverted[i] for i in range(len(inverted)) if inverted[i] not in remove]

# rebuild new dictionary
new_dict = {word: i for i, word in enumerate(new_words)}

print(new_dict)

Which outputs:

{'orange': 0, 'pear': 1}

Note: just like the accepted answer, this is also O(n).

Uri Goren 7 years ago

You are using the wrong tool (dict) for the job; you should use a list.

class vocabulary:
    def __init__(self, *words):
        self.words = list(words)

    def __getitem__(self, key):
        # The index of a word is its position in the list (linear scan)
        try:
            return self.words.index(key)
        except ValueError:
            # Raise instead of printing, so remove() never receives None
            raise KeyError(f"{key} is not in vocabulary") from None

    def remove(self, word):
        # Accept either an index or a word
        if isinstance(word, int):
            del self.words[word]
            return
        return self.remove(self[word])

words = vocabulary("apple", "banana", "orange")
print(words["banana"])  # outputs 1
words.remove("apple")
print(words["banana"])  # outputs 0

A note on complexity

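I have had several comments saying that a dict is better because its lookup time is O(1) while a list is O(n).

This is simply not true in this case.

The O(1) guarantee of hash tables (dict in Python) is a result of amortized complexity, meaning that you average over the common usage of a lookup table that is generated once, assuming that the hash function is balanced.

This amortized calculation does not account for throwing away the entire dictionary and regenerating it every time you remove an item, as some of the other answers suggest.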

The list implementation and the dict implementation have the same worst-case complexity of O(n).

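Yet the list implementation can be optimized with two lines of Python (bisect) to bring the worst-case lookup complexity down to O(log(n)), provided the word list is kept sorted.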
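One way the bisect idea might look is sketched below. The class name SortedVocabulary is hypothetical, and the sketch rests on an assumption the answer does not spell out: the words are stored in sorted order, so a word's index is its alphabetical position rather than its insertion position. Lookup is a binary search, but deleting from a Python list still shifts the elements after it, so remove stays O(n).

```python
import bisect


class SortedVocabulary:
    """Hypothetical variant: words kept sorted so lookup can use binary search."""

    def __init__(self, *words):
        # Sorting once up front is what makes bisect applicable.
        self.words = sorted(set(words))

    def __getitem__(self, word):
        # Binary search for the leftmost position where `word` could sit.
        i = bisect.bisect_left(self.words, word)
        if i < len(self.words) and self.words[i] == word:
            return i
        raise KeyError(word)

    def remove(self, word):
        # O(log n) to find the word, O(n) to shift the elements after it.
        del self.words[self[word]]


vocab = SortedVocabulary("orange", "apple", "banana")
print(vocab["banana"])  # 1
vocab.remove("apple")
print(vocab["banana"])  # 0
```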