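Usually I am able to match Numba's performance when using Cython. In this example, however, I have failed to do so: Numba is about 4 times faster than my Cython version.
Here is the Cython version: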
%%cython -c=-march=native -c=-O3
cimport numpy as np
import numpy as np
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def cy_where(double[::1] df):
    cdef int i
    cdef int n = len(df)
    cdef np.ndarray[dtype=double] output = np.empty(n, dtype=np.float64)
    for i in range(n):
        if df[i]>0.5:
            output[i] = 2.0*df[i]
        else:
            output[i] = df[i]
    return output
Here is the Numba version:
import numba as nb

@nb.njit
def nb_where(df):
    n = len(df)
    output = np.empty(n, dtype=np.float64)
    for i in range(n):
        if df[i]>0.5:
            output[i] = 2.0*df[i]
        else:
            output[i] = df[i]
    return output
When tested, the Cython version is on par with NumPy's where, but clearly inferior to Numba:
# Python 3.6 + Cython 0.28.3 + gcc-7.2
import numpy as np
np.random.seed(0)
n = 10000000
data = np.random.random(n)

assert (cy_where(data)==nb_where(data)).all()
assert (np.where(data>0.5,2*data, data)==nb_where(data)).all()

%timeit cy_where(data)                    # 179 ms
%timeit nb_where(data)                    # 49 ms (!!)
%timeit np.where(data>0.5,2*data, data)   # 278 ms
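For readers who want to reproduce the correctness check without Cython or Numba installed, here is a minimal sketch. The names `py_where` and `np_where` are hypothetical helpers, not part of the code above; the plain-Python loop stands in for the compiled versions, and `timeit.repeat` stands in for the `%timeit` magic. The array is kept small because the interpreted loop is slow.

```python
import timeit
import numpy as np

def py_where(df):
    """Plain-Python reference for the loop used in cy_where / nb_where."""
    n = len(df)
    output = np.empty(n, dtype=np.float64)
    for i in range(n):
        if df[i] > 0.5:
            output[i] = 2.0 * df[i]
        else:
            output[i] = df[i]
    return output

def np_where(df):
    """Vectorized NumPy expression from the benchmark."""
    return np.where(df > 0.5, 2 * df, df)

# Small n (assumption for illustration): the interpreted loop is slow.
small = np.random.RandomState(0).random_sample(10_000)
assert (py_where(small) == np_where(small)).all()

# Best of 3 runs of 100 calls each, reported per call.
best = min(timeit.repeat(lambda: np_where(small), number=100, repeat=3)) / 100
print(f"np.where: {best * 1e6:.1f} us per call")
```

The timing printed here is not comparable to the figures above, since the array is much smaller.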
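What is the reason for Numba's performance, and how can it be matched when using Cython?
As suggested by @max9111, I eliminated the stride by using a contiguous memory view for the output, but this does not improve performance significantly: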
@cython.boundscheck(False)
@cython.wraparound(False)
def cy_where_cont(double[::1] df):
    cdef int i
    cdef int n = len(df)
    cdef np.ndarray[dtype=double] output = np.empty(n, dtype=np.float64)
    cdef double[::1] view = output  # view as contiguous!
    for i in range(n):
        if df[i]>0.5:
            view[i] = 2.0*df[i]
        else:
            view[i] = df[i]
    return output
%timeit cy_where_cont(data) # 165 ms
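As a side note on what "contiguous" means here, the following sketch inspects strides from the NumPy side. It only illustrates the memory layout that a `double[::1]` declaration assumes; it makes no claim about where the remaining gap to Numba comes from. The variable names `strided` and `packed` are introduced for illustration.

```python
import numpy as np

data = np.random.random(1000)
output = np.empty(len(data), dtype=np.float64)

# A freshly allocated 1-D float64 array is C-contiguous:
# consecutive elements are exactly one itemsize (8 bytes) apart.
print(output.flags['C_CONTIGUOUS'], output.strides)

# Taking every second element gives a view with a 16-byte stride,
# which is no longer contiguous.
strided = data[::2]
print(strided.flags['C_CONTIGUOUS'], strided.strides)

# np.ascontiguousarray copies such a view into a packed buffer.
packed = np.ascontiguousarray(strided)
print(packed.flags['C_CONTIGUOUS'], packed.strides)
```

Since `np.empty` already returns a contiguous buffer, the `view` in `cy_where_cont` changes only what the compiler is told about the layout, not the layout itself.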