代码之家 › 专栏 › 技术社区 › Emanuele Paolini

numpy:累积多重性计数

numpy performance python

Emanuele Paolini · 技术社区 · 7 年前

我有一个排序的int数组,其中可能有重复。我想计算连续相等的值,当一个值与前一个值不同时,从零重新开始。这是使用简单python循环实现的预期结果:

import numpy as np

def count_multiplicities(a):
    r = np.zeros(a.shape, dtype=a.dtype)
    for i in range(1, len(a)):
        if a[i] == a[i-1]:
            r[i] = r[i-1]+1
        else:
            r[i] = 0
    return r

a = (np.random.rand(20)*5).astype(dtype=int)
a.sort()

print "given sorted array: ", a
print "multiplicity count: ", count_multiplicities(a)

given sorted array:  [0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
multiplicity count:  [0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 0 1 2 3]

如何使用numpy以有效的方式获得相同的结果?数组很长,但重复次数很少(比如不超过十次)。

2 回复 | 直到 7 年前

Divakar 7 年前

这里有一个 cumsum

def count_multiplicities_cumsum_vectorized(a):      
    out = np.ones(a.size,dtype=int)
    idx = np.flatnonzero(a[1:] != a[:-1])+1
    out[idx[0]] = -idx[0] + 1
    out[0] = 0
    out[idx[1:]] = idx[:-1] - idx[1:] + 1
    np.cumsum(out, out=out)
    return out

In [58]: a
Out[58]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4])

In [59]: count_multiplicities(a) # Original approach
Out[59]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])

In [60]: count_multiplicities_cumsum_vectorized(a)
Out[60]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])

In [66]: a = (np.random.rand(200000)*1000).astype(dtype=int)
    ...: a.sort()
    ...: 

In [67]: a
Out[67]: array([  0,   0,   0, ..., 999, 999, 999])

In [68]: %timeit count_multiplicities(a)
10 loops, best of 3: 87.2 ms per loop

In [69]: %timeit count_multiplicities_cumsum_vectorized(a)
1000 loops, best of 3: 739 Âµs per loop

Related post .

max9111 7 年前

我会用麻木来解决这些问题

import numba
nb_count_multiplicities = numba.njit("int32[:](int32[:])")(count_multiplicities)
X=nb_count_multiplicities(a)

在完全不重写代码的情况下,它比Divakar的矢量化解决方案快约50%。

推荐文章

Google User · Django管理员在`list_display中未显示`creation_date`字段`

3 月前

user29747013 · 如何创建一个新的数据框架,其中包含原始数据框架中列的聚合列?

3 月前

ÎÎÎ½Î· ÎÎ®Î¹Î½Î¿Ï · Python lxml.html语法错误:使用lxml find时XPATH的谓词无效

3 月前

user29715306 · from_users=和chats=电视节目中的差异

3 月前

Redshoe · 当执行numpy.genfromtxt()时,python是否会读取文件的所有行?

4 月前

RASEL MAHMUD · 为什么以及如何在is_even()函数内的IF条件中递归X变量在满足0后递增?[副本]

4 月前

prayner · 更新嵌套字典包含列表中的项

4 月前

Bringo Jr · 我可以在O(n)中解决这个问题吗?

4 月前

Dave · 如何在for循环中修改列表值

4 月前

Shukurullox Komiljonov · 从记录中获得相互和解。使用SQL

4 月前