Use names as column values in R: list to data frame

  • mariachi · 8 years ago

    I have 88 tab-separated files that I need to import into R. The files are named like study-1-12, where:

    • study: name of the study
    • 1: subject id
    • [1]2: day of the experiment (1 or 2)
    • 1[2]: trial (1 or 2)

    The data in each file looks like this:

    START: dd.mm.yyy hh:mm:ss
    
    WAITING 3780    ms      REACTION    1230  ms
    
    WAITING 9700    ms      REACTION    377 ms
    
    
    WAITING 5538    ms      REACTION    310 ms
    
    WAITING 4599    ms      REACTION    361 ms
    
    WAITING 9579    ms      REACTION    338 ms
    END: dd.mm.yyy hh:mm:ss
    

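    So far I have imported all of them into a list and summarised each element, so the end result is one table per file with two columns, "waiting" and "reaction", each holding a mean value.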

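    # Packages used below: read_tsv() is from readr; %>% and the verbs are from dplyr
    library(readr)
    library(dplyr)
    
    # Load filepaths and names (pattern is a regular expression, not a glob)
    filepath <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = TRUE) # Load full path
    filenames <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = FALSE) # load names of files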
    
    # load all files into list with named col headers
    ldf <- lapply(filepath, function(x) read_tsv(file = x, skip = 1,
                  col_names = c("waiting", "valueW", "ms", "ws", "reaction", "valueR", "ms1")))
    
    names(ldf) <- filenames # rename items in list
    
    # select only relevant cols and do the math
    ldf <- lapply(ldf, function(x) x %>% 
                    select(waiting, valueW, reaction, valueR) %>%
                    filter(waiting == "WAITING") %>%
                    summarise(waiting = mean(valueW), reaction = mean(valueR))
                  )
    

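    What I want to do now is create a data frame with columns based on the file name (as above: study-1-12):

    • id: the first 1
    • experiment: 1 or 2
    • trial: 1 or 2
    • waiting: the value from each data frame in the list
    • reaction: the value from each data frame in the list

    Is there a way to do this in R?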

    1 answer

  • hrbrmstr · 8 years ago
    library(purrr)
    library(stringi)
    
    fils <- list.files("~/Data/so", full.names=TRUE)
    
    fils
    ## [1] "/Some/path/to/data/studyA-1-12"  "/Some/path/to/data/studyB-30-31"
    
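    # For each file: pull the metadata out of its name, then read the timing lines
    map_df(fils, function(x) {
    
      # Capture groups: study name, subject id, then the two digits of the last
      # block (experiment day and trial). [[1]] is a 1 x 5 matrix: full match + 4 groups
      stri_match_all_regex(x, "([[:alnum:]]+)-([[:digit:]]+)-([[:digit:]])([[:digit:]])")[[1]] %>%
        as.list() %>%
        .[2:5] %>%  # drop the full match, keep the four groups
        set_names(c("study_name", "subject_id", "experiment_day", "trial")) -> meta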
    
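      # Keep only the WAITING lines, split each into its six whitespace-separated
      # fields, and keep fields 2 and 5 (the waiting and reaction times in ms)
      readLines(x) %>%
        grep("WAITING", ., value=TRUE) %>%
        map(~scan(text=., quiet=TRUE,
                  what=list(character(), double(), character(),
                                    character(), double(), character()))[c(2,5)]) %>%
        map_df(~set_names(as.list(.), c("waiting", "reaction"))) -> df
    
      # Attach the file-level metadata to every row (one row per WAITING line)
      df$study_name <- meta$study_name
      df$subject_id <- meta$subject_id
      df$experiment_day <- meta$experiment_day
      df$trial <- meta$trial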
    
      df
    
    })
    ## # A tibble: 10 × 6
    ##    waiting reaction study_name subject_id experiment_day trial
    ##      <dbl>    <dbl>      <chr>      <chr>          <chr> <chr>
    ## 1     3780     1230     studyA          1              1     2
    ## 2     9700      377     studyA          1              1     2
    ## 3     5538      310     studyA          1              1     2
    ## 4     4599      361     studyA          1              1     2
    ## 5     9579      338     studyA          1              1     2
    ## 6     3780     1230     studyB         30              3     1
    ## 7     9700      377     studyB         30              3     1
    ## 8     5538      310     studyB         30              3     1
    ## 9     4599      361     studyB         30              3     1
    ## 10    9579      338     studyB         30              3     1
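
    The result above has one row per WAITING line. If one row per file is wanted instead, holding the means already computed in the question's ldf, something like the following might work. It is only a sketch in base R: it assumes the list is named by file names of the form study-1-12.dat, the helper parse_name is hypothetical, and the two ldf entries are made-up stand-ins for the real summaries.

```r
# Hypothetical stand-in for the question's summarised list `ldf`:
# one single-row data frame per file, named by file name.
ldf <- list(
  "studyA-1-12.dat"  = data.frame(waiting = 6639.2, reaction = 523.2),
  "studyB-30-21.dat" = data.frame(waiting = 5000,   reaction = 400)
)

# Hypothetical helper: split a name such as "studyA-1-12.dat" into its parts.
# Assumes the last block is exactly two digits (experiment day, then trial).
parse_name <- function(nm) {
  m <- regmatches(nm, regexec("^(.+)-([0-9]+)-([0-9])([0-9])", nm))[[1]]
  data.frame(study      = m[2],
             id         = as.integer(m[3]),
             experiment = as.integer(m[4]),
             trial      = as.integer(m[5]),
             stringsAsFactors = FALSE)
}

# One row per file: metadata parsed from the name, means taken from the list element.
rows <- Map(function(nm, d) cbind(parse_name(nm), d), names(ldf), ldf)
result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```

    With dplyr loaded, dplyr::bind_rows(rows) could replace the do.call(rbind, rows) step; base R is used here only to keep the sketch free of dependencies.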