
Replace missing values with the row mean if a row has exactly n missing values

  •  1
  • tmfmnk  · 7 years ago

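    I have a data matrix in which each row has a different number of missing values. I want to replace the missing values with the row mean, but only in rows where the number of missing values is exactly n (say, 1).

    I have already written a solution for this, but it is very inelegant, so I am looking for alternatives.

    My solution: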

    #SAMPLE DATA
    
    a <- c(rep(c(1:4, NA), 2))
    b <- c(rep(c(1:3, NA, 5), 2))
    c <- c(rep(c(1:3, NA, 5), 2))
    
    df <- as.matrix(cbind(a,b,c), ncol = 3, nrow = 10)
    
    #CALCULATING THE NUMBER OF MISSING VALUES PER ROW
    
    miss_row <- rowSums(apply(as.matrix(df), c(1,2), function(x) {
      sum(is.na(x)) +
      sum(x == "", na.rm=TRUE)
    }) )
    
    df <- cbind(df, miss_row)
    
    #CALCULATING THE ROW MEANS FOR ROWS WITH 1 MISSING VALUE
    
    row_mean <- ifelse(df[,4] == 1, rowMeans(df[,1:3], na.rm = TRUE), NA)
    
    df <- cbind(df, row_mean)
    
    2 answers
        1
  •  5
  •   Cath    7 years ago

    Here is the approach I mentioned in the comments, in more detail:

    # create your matrix
    df <- cbind(a, b, c) # already a matrix, you don't need as.matrix there
    
    # Get number of missing values per row (is.na is vectorised so you can apply it directly on the entire matrix)
    nb_NA_row <- rowSums(is.na(df))
    
    # Replace values row-wise by the row mean when there are exactly N NAs in the row
    # (note: this assigns the row mean to every cell of the matching rows, not only to the NA cells)
    N <- 1 # the given example
    df[nb_NA_row==N] <- rowMeans(df, na.rm=TRUE)[nb_NA_row==N]
    
    # check df
    
    df
    #      a  b  c
    # [1,] 1  1  1
    # [2,] 2  2  2
    # [3,] 3  3  3
    # [4,] 4 NA NA
    # [5,] 5  5  5
    # [6,] 1  1  1
    # [7,] 2  2  2
    # [8,] 3  3  3
    # [9,] 4 NA NA
    #[10,] 5  5  5
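
    The assignment `df[nb_NA_row==N] <- ...` above writes the row mean into every cell of the selected rows. With the sample data this is invisible, because the non-missing values in rows 5 and 10 already equal the row mean. The following is a minimal sketch, assuming that only the NA cells should change; the names `m` and `target` are introduced here for illustration and are not part of the answer.

```r
# Sample data from the question
a <- rep(c(1:4, NA), 2)
b <- rep(c(1:3, NA, 5), 2)
c <- rep(c(1:3, NA, 5), 2)
m <- cbind(a, b, c)

N <- 1
nb_NA_row <- rowSums(is.na(m))

# TRUE only for NA cells that sit in a row with exactly N missing values.
# The length-nrow(m) logical vector is recycled down each column,
# so it lines up with the rows of the matrix.
target <- is.na(m) & (nb_NA_row == N)

# row(m)[target] gives the row index of each targeted cell,
# which is used to look up the matching row mean.
m[target] <- rowMeans(m, na.rm = TRUE)[row(m)[target]]

m
```

    Rows 4 and 9 have two missing values, so they are left as they are when N is 1.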
    
        2
  •  1
  •   moodymudskipper    7 years ago
    df <- data.frame(df)
    df$miss_row <- rowSums(is.na(df))
    df$row_mean <- NA
    df$row_mean[df$miss_row == 1] <- rowMeans(df[df$miss_row == 1,1:3],na.rm = TRUE)
    #     a  b  c miss_row row_mean
    # 1   1  1  1        0       NA
    # 2   2  2  2        0       NA
    # 3   3  3  3        0       NA
    # 4   4 NA NA        2       NA
    # 5  NA  5  5        1        5
    # 6   1  1  1        0       NA
    # 7   2  2  2        0       NA
    # 8   3  3  3        0       NA
    # 9   4 NA NA        2       NA
    # 10 NA  5  5        1        5
    

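    (This gives your expected output, which does not seem to be fully consistent with your text; for that, see the comments and the duplicate link.)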
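
    For the behaviour described in the question text (actually replacing the missing values rather than adding a `row_mean` column), a data frame version might look like the sketch below. It assumes the value columns are `a`, `b` and `c` as in the sample data; `value_cols`, `fill` and the loop are illustrative choices, not taken from the answer above.

```r
# Sample data from the question, as a data frame
a <- rep(c(1:4, NA), 2)
b <- rep(c(1:3, NA, 5), 2)
c <- rep(c(1:3, NA, 5), 2)
df <- data.frame(a, b, c)

n <- 1                          # required number of NAs per row
value_cols <- c("a", "b", "c")  # columns that take part in the row mean

# Compute both helpers before changing anything in df
miss_row <- rowSums(is.na(df[value_cols]))
row_mean <- rowMeans(df[value_cols], na.rm = TRUE)

# Column by column, fill NA cells that belong to rows with exactly n NAs
for (col in value_cols) {
  fill <- is.na(df[[col]]) & miss_row == n
  df[[col]][fill] <- row_mean[fill]
}

df
```

    Computing `miss_row` and `row_mean` up front matters here: once a column has been filled, counting the NAs again would give a different result for the remaining columns.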