代码之家 › 专栏 › 技术社区 › danielvdende

维度不可知(通用)笛卡尔乘积[重复]

cartesian-product numpy arrays python

danielvdende · 技术社区 · 11 年前

我希望生成大量阵列的笛卡尔积,以跨越高维网格。由于高维度,不可能将笛卡尔乘积计算的结果存储在内存中;而是将其写入硬盘。由于这种限制,我需要在生成中间结果时访问它们。到目前为止,我一直在做的是:

for x in xrange(0, 10):
    for y in xrange(0, 10):
        for z in xrange(0, 10):
            writeToHdd(x,y,z)

除了非常讨厌之外,它是不可扩展的(即,它需要我编写与维度一样多的循环)。我已尝试使用建议的解决方案 here ,但这是一个递归解决方案,因此很难在生成结果时立即获得结果。除了每个维度有一个硬编码循环之外,还有什么“整洁”的方法吗?

3 回复 | 直到 9 年前

Gareth Rees 11 年前

在普通Python中,可以使用 itertools.product .

>>> arrays = range(0, 2), range(4, 6), range(8, 10)
>>> list(itertools.product(*arrays))
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]

在Numpy中,您可以组合 numpy.meshgrid (通过 sparse=True 以避免在内存中扩展产品) numpy.ndindex :

>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10)
>>> grid = np.meshgrid(*arrays, sparse=True)
>>> [tuple(g[i] for g in grid) for i in np.ndindex(grid[0].shape)]
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]

user2379410 user2379410 11 年前

我想我找到了一个使用内存映射文件的好方法:

def carthesian_product_mmap(vectors, filename, mode='w+'):
    '''
    Vectors should be a tuple of `numpy.ndarray` vectors. You could
    also make it more flexible, and include some error checking
    '''        
    # Make a meshgrid with `copy=False` to create views
    grids = np.meshgrid(*vectors, copy=False, indexing='ij')

    # The shape for concatenating the grids from meshgrid
    shape = grid[0].shape + (len(vectors),)

    # Find the "highest" dtype neccesary
    dtype = np.result_type(*vectors)

    # Instantiate the memory mapped file
    M = np.memmap(filename, dtype, mode, shape=shape)

    # Fill the memmap with the grids
    for i, grid in enumerate(grids):
        M[...,i] = grid

    # Make sure the data is written to disk (optional?)
    M.flush()

    # Reshape to put it in the right format for Carthesian product
    return M.reshape((-1, len(vectors)))

但我想知道你是否真的需要存储整个Cartesian产品(有很多数据重复)。在需要的时候在产品中生成行不是一个选项吗?

NoDataDumpNoContribution 11 年前

似乎您只想在任意数量的维度上循环。我的通用解决方案是使用索引字段和增量索引以及处理溢出。

例子:

n = 3 # number of dimensions
N = 1 # highest index value per dimension

idx = [0]*n
while True:
    print(idx)
    # increase first dimension
    idx[0] += 1
    # handle overflows
    for i in range(0, n-1):
        if idx[i] > N:
            # reset this dimension and increase next higher dimension
            idx[i] = 0
            idx[i+1] += 1
    if idx[-1] > N:
        # overflow in the last dimension, we are finished
        break

给予:

[0, 0, 0]
[1, 0, 0]
[0, 1, 0]
[1, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 1, 1]
[1, 1, 1]

Numpy有类似的内置功能: ndenumerate .