Cython: Why do NumPy arrays need to be type-cast to object?

cython numpy python

Brad Solomon · Tech Community · 7 years ago

I've seen something like this a few times in the Pandas source:

def nancorr(ndarray[float64_t, ndim=2] mat, bint cov=0, minp=None):
    # ...
    N, K = (<object> mat).shape

This implies that an ndarray called mat is type-cast to a Python object. *

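On further inspection, the cast seems to be there because a compile error arises without it. My question is: why is this type-cast required in the first place?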

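Here are a few examples. This answer suggests simply that tuple packing doesn't work in Cython the way it does in Python, but this doesn't seem to be a tuple unpacking issue. (It is a fine answer regardless, and I don't mean to pick on it.)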

Take the following script, shape.pyx. It fails at compile time with "Cannot convert 'npy_intp *' to Python object".

from cython cimport Py_ssize_t
import numpy as np
from numpy cimport ndarray, float64_t
cimport numpy as cnp
cnp.import_array()

def test_castobj(ndarray[float64_t, ndim=2] arr):

    cdef:
        Py_ssize_t b1, b2

    # Tuple unpacking - this will fail at compile
    b1, b2 = arr.shape
    return b1, b2

But again, the issue does not seem to be tuple unpacking per se. This fails with the same error:

def test_castobj(ndarray[float64_t, ndim=2] arr):

    cdef:
        # Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros

    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros

Seemingly, no tuple unpacking is happening here. The tuple is simply the first argument to np.zeros.

def test_castobj(ndarray[float64_t, ndim=2] arr):
    """This works"""
    cdef:
        Py_ssize_t b1, b2
        ndarray[float64_t, ndim=2] zeros

    b1, b2 = (<object> arr).shape
    zeros = np.zeros((<object> arr).shape, dtype=np.float64)
    return b1, b2, zeros

This also works (and is perhaps the most confusing of all):

def test_castobj(object[float64_t, ndim=2] arr):
    cdef:
        tuple shape = arr.shape
        ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(shape, dtype=np.float64)
    return zeros

Example:

>>> from shape import test_castobj
>>> arr = np.arange(6, dtype=np.float64).reshape(2, 3)

>>> test_castobj(arr)
(2, 3, array([[0., 0., 0.],
        [0., 0., 0.]]))

* Perhaps this has something to do with arr being a memoryview? But that's a shot in the dark.

Another example is in the Cython docs:

cpdef int sum3d(int[:, :, :] arr) nogil:
    cdef size_t i, j, k, I, J, K
    cdef int total = 0
    I = arr.shape[0]
    J = arr.shape[1]
    K = arr.shape[2]

In this case, simply indexing arr.shape[i] avoids the error, which I find strange.

This also works:

def test_castobj(object[float64_t, ndim=2] arr):
    cdef ndarray[float64_t, ndim=2] zeros
    zeros = np.zeros(arr.shape, dtype=np.float64)
    return zeros

1 answer | as of 7 years ago

ead · 7 years ago

You are right, this has nothing to do with tuple unpacking under Cython.

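The reason is that cnp.ndarray is not a usual numpy array (that is, a numpy array with the interface known from Python), but a Cython wrapper for numpy's C implementation PyArrayObject (which is known as np.array in Python):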

ctypedef class numpy.ndarray [object PyArrayObject]:
    cdef __cythonbufferdefaults__ = {"mode": "strided"}

    cdef:
        # Only taking a few of the most commonly used and stable fields.
        # One should use PyArray_* macros instead to access the C fields.
        char *data
        int ndim "nd"
        npy_intp *shape "dimensions"
        npy_intp *strides
        dtype descr
        PyObject* base

shape actually maps to the dimensions field of the underlying C struct (note npy_intp *shape "dimensions" instead of simply npy_intp *dimensions). It is a trick, so that one can write

mat.shape[0]

and it looks (and to some degree feels) as if numpy's Python property shape were being called. In reality, a shortcut straight to the underlying C struct is taken.
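As a rough pure-Python analogy of the above (a sketch only: `ArrayModel` and `pointer_has_length` are hypothetical names built on `ctypes`, not numpy's actual `PyArrayObject` layout), the snippet below keeps the dimensions behind a bare C pointer and exposes a Python-level `shape` property next to it. Indexing the pointer works, but the pointer alone carries no length, which is one way to see why an `npy_intp *` cannot be turned into a tuple on its own.

```python
import ctypes


class ArrayModel:
    """Hypothetical stand-in for an array header; not numpy's real struct."""

    def __init__(self, *dims):
        self.nd = len(dims)
        # Keep the C buffer alive for as long as the model exists.
        self._buf = (ctypes.c_ssize_t * self.nd)(*dims)
        # C-level view: a bare pointer, comparable to `npy_intp *dimensions`.
        self.dimensions = ctypes.cast(
            self._buf, ctypes.POINTER(ctypes.c_ssize_t)
        )

    @property
    def shape(self):
        # Python-level view: a new tuple is built on every access.
        return tuple(self.dimensions[i] for i in range(self.nd))


def pointer_has_length(ptr):
    """Return True if len(ptr) works, False if it raises TypeError."""
    try:
        len(ptr)
    except TypeError:
        return False
    return True


m = ArrayModel(2, 3)
first_dim = m.dimensions[0]  # plain element access through the pointer
as_tuple = m.shape           # tuple built from `nd` and the pointer
```

The tuple can only be built because `nd` is stored alongside the pointer; the typed Cython access skips that step and reads a single element.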

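By the way, calling the Python-level shape is quite costly: a tuple must be created and filled with the values from dimensions, and only then is the 0th element accessed. The Cython way, on the other hand, is much cheaper: it just accesses the right element.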

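However, if you still want to access the Python properties of the array, you have to cast it to a normal Python object (i.e. forget that it is an ndarray). shape is then resolved to the tuple-returning property through the usual Python mechanisms.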

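So basically, even though it is convenient, you would not want to access the dimensions of a numpy array the way the pandas code does inside a tight loop. There you would use the more verbose but faster variant: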

...
N=mat.shape[0]
K=mat.shape[1]
...
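To make the cost argument concrete, here is a small pure-Python sketch. `CountingArray` is a hypothetical class (not part of numpy or Cython) whose `shape` property counts how many tuples it builds, mimicking what happens when `shape` is resolved through the Python attribute machinery. Reading `shape` inside the loop builds one tuple per iteration, whereas reading the dimension once up front builds a single tuple.

```python
class CountingArray:
    """Hypothetical array-like object that counts Python-level shape lookups."""

    def __init__(self, *dims):
        self._dims = list(dims)
        self.tuples_built = 0

    @property
    def shape(self):
        # Each access allocates a fresh tuple, as a Python-level lookup would.
        self.tuples_built += 1
        return tuple(self._dims)


def rows_seen_in_loop(arr, iterations):
    """Look up shape on every iteration (convenient but costly)."""
    total = 0
    for _ in range(iterations):
        total += arr.shape[0]
    return total


def rows_seen_hoisted(arr, iterations):
    """Read the dimension once, then reuse the local variable."""
    n = arr.shape[0]
    total = 0
    for _ in range(iterations):
        total += n
    return total


a = CountingArray(2, 3)
rows_seen_in_loop(a, 1000)
in_loop_tuples = a.tuples_built   # one tuple per iteration

b = CountingArray(2, 3)
rows_seen_hoisted(b, 1000)
hoisted_tuples = b.tuples_built   # a single tuple
```

With a typed ndarray in Cython, mat.shape[0] avoids even that single tuple, since it reads the C field directly.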

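Why you can write object[cnp.float64_t] or something similar in a function signature puzzles me; the argument is obviously interpreted as a plain object. Maybe it is just a bug.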