
Empty rows in a list as NA values in a data.frame in R

  •  1
  • carozimm  · 11 years ago

    My data frame looks like this:

    hospital <- c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL", "FAIRBANKS MEMORIAL HOSPITAL", 
              "CRESTWOOD MEDICAL CENTER", "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL", 
              "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL")
    state <- c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR")
    rank <- c(1,2,3,1,2,1,2,3)
    df <- data.frame(hospital, state, rank)
    df
    
                                     hospital    state     rank
        1   PROVIDENCE ALASKA MEDICAL CENTER        AK        1
        2   ALASKA REGIONAL HOSPITAL                AK        2
        3   FAIRBANKS MEMORIAL HOSPITAL             AK        3
        4   CRESTWOOD MEDICAL CENTER                AL        1
        5   BAPTIST MEDICAL CENTER EAST             AL        2
        6   ARKANSAS HEART HOSPITAL                 AR        1
        7   MEDICAL CENTER NORTH LITTLE ROCK        AR        2
        8   CRITTENDEN MEMORIAL HOSPITAL            AR        3
    

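    I want to create a function rankall that takes a rank as its argument and returns the hospital with that rank in each state, or NA if the state has no hospital matching the given rank. For example, I want the output of rankall(rank=3) to look like this: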

                               hospital     state 
        AK  FAIRBANKS MEMORIAL HOSPITAL        AK    
        AL                         <NA>        AL
        AR CRITTENDEN MEMORIAL HOSPITAL        AR    
    

    I tried:

    rankall <- function(rank) {
        split_by_state <- split(df, df$state)
        ranked_hospitals <- lapply(split_by_state, function (x) {
            x[(x$rank==rank), ]
        })
        combined_ranked_hospitals <- do.call(rbind, ranked_hospitals)
        return(combined_ranked_hospitals[ ,1:2])
    }
    

    But rankall(rank=3) returns:

                                     hospital     state     
        AK       FAIRBANKS MEMORIAL HOSPITAL         AK                        
        AR       CRITTENDEN MEMORIAL HOSPITAL        AR             
    

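    This drops the NA row that I need to keep track of. Is there a way to have R treat an empty row in a list object inside the function as NA rather than as an empty row? Is there a function better suited to this task than lapply?

    [Note: this data frame comes from the Coursera R Programming course. This is also my first post on Stackoverflow and my first time learning to program. Thanks to everyone who offers solutions and suggestions, this forum is great.]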
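
The missing AL row can be reproduced in isolation. The sketch below is a minimal illustration, not part of the original post: `empty_group` and `matched_group` are made-up stand-ins for the per-state pieces that `split()` and the subsetting step produce. It shows that `rbind()` on data frames discards zero-row inputs, which is why a state with no match leaves no trace in the combined result.

```r
# Hypothetical stand-ins for two per-state results:
# one state with no hospital at the requested rank, one with a match.
empty_group   <- data.frame(hospital = character(0), state = character(0))
matched_group <- data.frame(hospital = "CRITTENDEN MEMORIAL HOSPITAL", state = "AR")

# rbind() on data frames ignores zero-row arguments, so AL vanishes here.
combined <- do.call(rbind, list(AL = empty_group, AR = matched_group))

nrow(combined)                 # only the AR row survives
as.character(combined$state)
```

So the empty piece has to be replaced by an explicit one-row placeholder before the pieces are combined.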

    4 Answers
        1
  •  1
  •   Jthorpe    11 years ago

    You just need to add an if/else to the function:

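    rankall <- function(rank) {
        split_by_state <- split(df, df$state)
        ranked_hospitals <- lapply(split_by_state, function (x) {
            indx <- x$rank==rank
            if(any(indx)){
                return(x[indx, ])
            } else {
                # no hospital with this rank: keep the state, set hospital to NA
                out = x[1, ]
                out$hospital = NA
                return(out)
            }
        })
        # combine the per-state rows and keep the hospital and state columns
        do.call(rbind, ranked_hospitals)[ ,1:2]
    }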
    
        2
  •  1
  •   lukeA    11 years ago

    Here is another approach:

    rankall <- function(rank) {  
      do.call(rbind, lapply(split(df, df$state), function(df) { 
        tmp <- df[df$rank == rank, 1:2]   
        if (!nrow(tmp)) return(transform(df[1, 1:2], hospital = NA)) else return(tmp) 
      })) 
    }
    rankall(3)
    #   hospital state
    #   AK  FAIRBANKS MEMORIAL HOSPITAL    AK
    #   AL                         <NA>    AL
    #   AR CRITTENDEN MEMORIAL HOSPITAL    AR
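
A related base R option, offered as a sketch rather than as anything proposed in the thread, is to skip the per-state if/else and instead left-join the matching rows onto the full list of states with `merge(..., all.x = TRUE)`. The function name `rankall_merge` and its arguments `data` and `wanted_rank` are hypothetical names chosen for this example.

```r
# Same toy data as in the question.
hospital <- c("PROVIDENCE ALASKA MEDICAL CENTER", "ALASKA REGIONAL HOSPITAL",
              "FAIRBANKS MEMORIAL HOSPITAL", "CRESTWOOD MEDICAL CENTER",
              "BAPTIST MEDICAL CENTER EAST", "ARKANSAS HEART HOSPITAL",
              "MEDICAL CENTER NORTH LITTLE ROCK", "CRITTENDEN MEMORIAL HOSPITAL")
state <- c("AK", "AK", "AK", "AL", "AL", "AR", "AR", "AR")
rank <- c(1, 2, 3, 1, 2, 1, 2, 3)
df <- data.frame(hospital, state, rank)

# Hypothetical helper: one row per state, NA where no hospital has the rank.
rankall_merge <- function(data, wanted_rank) {
  # every state that occurs in the data, whether or not it has a match
  all_states <- data.frame(state = sort(unique(as.character(data$state))))
  # only the rows with the requested rank
  hits <- data[data$rank == wanted_rank, c("hospital", "state")]
  # all.x = TRUE keeps states without a match and fills hospital with NA
  merge(all_states, hits, by = "state", all.x = TRUE)
}

res <- rankall_merge(df, 3)
res
```

Here the state column comes first in the result and the row names are plain integers, so the layout differs slightly from the output requested in the question.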
    
        3
  •  1
  •   jazzurro    11 years ago

    Here is another approach, using dplyr:

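    library(dplyr)

    # Picks the x-th row within each state, so it assumes the rows are
    # already ordered by rank inside every state (true for df above).
    fun1 <- function(x) {
        group_by(df, state) %>%
            summarise(hospital = hospital[x],
                      rank = nth(rank, x))
    }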
    
    # fun1(3)
    #Source: local data frame [3 x 3]
    #
    #  state                     hospital rank
    #1    AK  FAIRBANKS MEMORIAL HOSPITAL    3
    #2    AL                           NA   NA
    #3    AR CRITTENDEN MEMORIAL HOSPITAL    3
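
Indexing with `hospital[x]` selects by position, which matches the rank only while the rows are sorted. The base R sketch below uses a made-up, deliberately unsorted group (`hosp` and `rnk` are illustrative names) to show the difference, and how `match()` yields NA by itself when the requested rank is absent.

```r
# A hypothetical single-state group whose rows are NOT sorted by rank.
hosp <- c("HOSPITAL B", "HOSPITAL C", "HOSPITAL A")
rnk  <- c(2, 3, 1)

by_position <- hosp[3]              # third row, which actually has rank 1
by_rank     <- hosp[match(3, rnk)]  # the row whose rank equals 3
missing     <- hosp[match(4, rnk)]  # match() gives NA, so the lookup gives NA

by_position
by_rank
missing
```

Looking the rank up with `match()` therefore handles both unsorted input and the missing-rank case without an explicit if/else.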
    
        4
  •  0
  •   Alex Coppock    11 years ago

    I think this is a job for dplyr. The only odd thing is that it does not work when I use NA instead of "NA". Does anyone know why?

    library(dplyr)
    rankall <- function(chosen_rank){
      group_by(df, state) %>%
        summarize(hospital = ifelse(length(hospital[rank==chosen_rank])!=0,
                                    as.character(hospital[rank==chosen_rank]), "NA"),
                  rank = chosen_rank)
    }
    
    rankall(1)
    rankall(2)
    rankall(3)
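
One possible explanation for the NA versus "NA" behaviour, stated as an assumption since the thread does not settle it and the outcome may depend on the dplyr version: a bare `NA` is logical, so `ifelse()` returns a logical value for groups with no match and a character value for groups with one, and those types then have to be combined in a single column. The base R sketch below only demonstrates the types involved; `hit`, `miss`, and `miss_chr` are illustrative names.

```r
# ifelse() with a bare NA: the result type depends on which branch is taken.
hit  <- ifelse(TRUE,  "SOME HOSPITAL", NA)
miss <- ifelse(FALSE, "SOME HOSPITAL", NA)
class(hit)    # "character"
class(miss)   # "logical"

# NA_character_ is a missing value that is already of type character.
miss_chr <- ifelse(FALSE, "SOME HOSPITAL", NA_character_)
class(miss_chr)   # "character"
is.na(miss_chr)   # TRUE

# The string "NA" is ordinary text, not a missing value.
is.na("NA")       # FALSE
```

Under that assumption, replacing "NA" with `NA_character_` in the `ifelse()` call would keep the column's type consistent while still producing a real missing value that `is.na()` can detect.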