Randomly remove 30% of the values in a numpy array

  •  0
  • konstantin  · Tech Community  · 7 years ago

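    I have a 2D numpy array holding my values (some of which may be NaN). I want to remove 30% of the non-NaN values and replace them with the mean of the array. How can I do this? Here is what I have tried so far: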

    def spar_removal(array, mean_value, sparseness):
        array1 = deepcopy(array)
        array2 = array1
        spar_size = int(round(array2.shape[0]*array2.shape[1]*sparseness))
        for i in range (0, spar_size):
            index = np.random.choice(np.where(array2 != mean_value)[1])
            array2[0, index] = mean_value
        return array2
    

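    But this only ever picks values from the same row of the array. How can I remove values from the whole array? It seems that choice works on only one dimension. I think what I need is to compute (x, y) index pairs whose values get replaced with mean_value.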

    2 answers  |  as of 7 years ago
        1
  •  3
  •   jedwards    7 years ago

    import numpy as np
    
    x = np.array([[1,2,3,4],
                  [1,2,3,4],
                  [np.nan, np.nan, np.nan, np.nan],
                  [1,2,3,4]])
    
    # Get the flat (1-D) indices of the finite (non-NaN) elements
    indices = np.where(np.isfinite(x).ravel())[0]
    
    # Shuffle the indices, select the first 30% (rounded down with int())
    to_replace = np.random.permutation(indices)[:int(indices.size * 0.3)]
    
    # Replace those indices with the mean (ignoring NaNs)
    x[np.unravel_index(to_replace, x.shape)] = np.nanmean(x)
    
    print(x)
    

    [[ 2.5  2.   2.5  4. ]
     [ 1.   2.   3.   4. ]
     [ nan  nan  nan  nan]
     [ 2.5  2.   3.   4. ]]
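
The same idea can be wrapped in a reusable function. This is only a sketch: the function name `replace_fraction_with_mean` and its `fraction` and `seed` parameters are made up for illustration, and it uses `np.random.default_rng` instead of `np.random.permutation` so that a seed makes the selection reproducible. It works on a copy and leaves the input untouched.

```python
import numpy as np

def replace_fraction_with_mean(arr, fraction=0.3, seed=None):
    """Return a copy of arr with `fraction` of its non-NaN entries set to the NaN-ignoring mean.

    Hypothetical helper for illustration; not part of NumPy.
    """
    rng = np.random.default_rng(seed)
    out = np.array(arr, dtype=float)  # copy; float so NaN and the mean are representable

    # Flat (1-D) positions of the non-NaN entries
    candidates = np.flatnonzero(~np.isnan(out))
    n_replace = int(candidates.size * fraction)  # rounds down, as in the answer above

    # Sample positions without replacement so no entry is picked twice
    chosen = rng.choice(candidates, size=n_replace, replace=False)

    # The right-hand side is evaluated first, so the mean is taken before any replacement
    out.flat[chosen] = np.nanmean(out)
    return out

x = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [np.nan, np.nan, np.nan, np.nan],
              [1, 2, 3, 4]], dtype=float)

result = replace_fraction_with_mean(x, fraction=0.3, seed=0)
print(result)
```

Which entries get replaced depends on the seed, but with 12 non-NaN values and a fraction of 0.3, three of them are set to the mean (2.5) and the NaN row stays as it is.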
    

        2
  •  1
  •   Ernie Yang    7 years ago

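    import copy
    import numpy as np
    
    def spar_removal(array, mean_value, sparseness):
        # Work on a copy so the caller's array is left unchanged
        array2 = copy.deepcopy(array)
        # NaN != NaN, so this comparison keeps only the non-NaN positions
        indexs = np.where(array2 == array2)
        indexsL = len(indexs[0])
        # Number of values to replace, as a fraction of the non-NaN count
        spar_size = int(round(indexsL * sparseness))
    
        # Pick distinct positions among the non-NaN entries
        for i in np.random.choice(indexsL, spar_size, replace=False):
            indexX = indexs[0][i]
            indexY = indexs[1][i]
            array2[indexX, indexY] = mean_value
    
        return array2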
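
The loop over (row, column) pairs can also be written as a single indexed assignment, which is close to the "(x, y)" idea from the question. A minimal sketch, assuming a 2D float array: the name `spar_removal_vectorized` and the optional `rng` argument are illustrative additions, not part of the answer above.

```python
import numpy as np

def spar_removal_vectorized(array, mean_value, sparseness, rng=None):
    """Replace a fraction of the non-NaN entries with mean_value, without a Python loop.

    Hypothetical variant for illustration; assumes a 2D array.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.array(array, dtype=float)  # copy of the input

    # Row and column coordinates of every non-NaN entry
    rows, cols = np.nonzero(~np.isnan(out))
    n_replace = int(round(rows.size * sparseness))

    # Choose distinct positions in the coordinate lists
    pick = rng.choice(rows.size, size=n_replace, replace=False)

    # Paired integer indexing assigns to each (row, col) in one step
    out[rows[pick], cols[pick]] = mean_value
    return out

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 8.0, 9.0],
                 [np.nan, 11.0, 12.0]])

sparse = spar_removal_vectorized(data, np.nanmean(data), 0.3,
                                 rng=np.random.default_rng(42))
print(sparse)
```

Indexing with two equally long integer arrays selects the individual (row, column) pairs rather than whole rows or columns, so the replacement is spread over the entire array.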