
Using names as column values in R, list to dataframe

data-cleaning tidyr dplyr r

mariachi · 8 years ago

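I have 88 tab-delimited files that I need to import into R. The files are named like study-1-12, where:

study: name of the study
1: subject id
[1]2: experiment day (1 or 2)
1[2]: trial (1 or 2)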
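
For illustration, the naming scheme above can be split into its parts with a regular expression. This is a minimal base-R sketch, assuming every name looks like study-1-12 with an optional extension; the helper name `parse_name` is hypothetical and not part of the original post.

```r
# Hypothetical helper: split "study-1-12.dat" into its four parts.
# Assumed layout: <study>-<subject id>-<day digit><trial digit>, optional extension.
parse_name <- function(fname) {
  base <- sub("\\.[^.]*$", "", basename(fname))  # drop directory and extension
  m <- regmatches(base, regexec("^(.+)-([0-9]+)-([0-9])([0-9])$", base))[[1]]
  if (length(m) == 0) stop("unexpected file name: ", fname)
  data.frame(study = m[2],
             id = as.integer(m[3]),
             experiment = as.integer(m[4]),
             trial = as.integer(m[5]),
             stringsAsFactors = FALSE)
}

parse_name("rawdata/study-1-12.dat")
```

The two trailing digits are read as separate fields, so this only holds while experiment day and trial each stay single-digit.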

The data in each file looks like this:

START: dd.mm.yyyy hh:mm:ss
WAITING 3780    ms      REACTION    1230  ms
WAITING 9700    ms      REACTION    377 ms
WAITING 5538    ms      REACTION    310 ms
WAITING 4599    ms      REACTION    361 ms
WAITING 9579    ms      REACTION    338 ms
END: dd.mm.yyyy hh:mm:ss

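So far I have imported all of these into a list and summarised each element, so the end result is a list of tables, each with two columns, "waiting" and "reaction", and both columns hold a mean value.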

# Load packages, file paths and names
library(readr)
library(dplyr)

# pattern is a regular expression, not a glob, so match the extension explicitly
filepath <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = TRUE) # full paths
filenames <- list.files(path = "rawdata/", pattern = "\\.dat$", all.files = TRUE, full.names = FALSE) # file names only

# load all files into list with named col headers
ldf <- lapply(filepath, function(x) read_tsv(file = x, skip = 1,
              col_names = c("waiting", "valueW", "ms", "ws", "reaction", "valueR", "ms1")))

names(ldf) <- filenames # rename items in list

# select only relevant cols and do the math
ldf <- lapply(ldf, function(x) x %>% 
                select(waiting, valueW, reaction, valueR) %>%
                filter(waiting == "WAITING") %>%
                summarise(waiting = mean(valueW), reaction = mean(valueR))
              )

What I want to do now is create a data frame whose columns are based on the file name (as above: study-1-12):

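id: the first 1
experiment: 1 or 2
trial: 1 or 2
waiting: the value from each data frame in the list
reaction: the value from each data frame in the list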

Is there a way to do this in R?

1 answer

hrbrmstr · 8 years ago

library(purrr)
library(stringi)

fils <- list.files("~/Data/so", full.names=TRUE)

fils
## [1] "/Some/path/to/data/studyA-1-12"  "/Some/path/to/data/studyB-30-31"

map_df(fils, function(x) {

  # Pull study name, subject id, experiment day and trial out of the file path
  stri_match_all_regex(x, "([[:alnum:]]+)-([[:digit:]]+)-([[:digit:]])([[:digit:]])")[[1]] %>%
    as.list() %>%
    .[2:5] %>%
    set_names(c("study_name", "subject_id", "experiment_day", "trial")) -> meta

  # Keep only the WAITING lines and take fields 2 and 5 (the two timings)
  readLines(x) %>%
    grep("WAITING", ., value=TRUE) %>%
    map(~scan(text=., quiet=TRUE,
              what=list(character(), double(), character(),
                                character(), double(), character()))[c(2,5)]) %>%
    map_df(~set_names(as.list(.), c("waiting", "reaction"))) -> df

  # Attach the file-level metadata to every row
  df$study_name <- meta$study_name
  df$subject_id <- meta$subject_id
  df$experiment_day <- meta$experiment_day
  df$trial <- meta$trial

  df

})
## # A tibble: 10 × 6
##    waiting reaction study_name subject_id experiment_day trial
##      <dbl>    <dbl>      <chr>      <chr>          <chr> <chr>
## 1     3780     1230     studyA          1              1     2
## 2     9700      377     studyA          1              1     2
## 3     5538      310     studyA          1              1     2
## 4     4599      361     studyA          1              1     2
## 5     9579      338     studyA          1              1     2
## 6     3780     1230     studyB         30              3     1
## 7     9700      377     studyB         30              3     1
## 8     5538      310     studyB         30              3     1
## 9     4599      361     studyB         30              3     1
## 10    9579      338     studyB         30              3     1
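
The result above keeps one row per measurement, whereas the question asks for one mean per file. A minimal sketch of that last step follows, using base R's `aggregate()`; the name `res` is hypothetical and stands in for the combined table, rebuilt here from the values printed above so the example runs on its own.

```r
# Stand-in for the combined table (values copied from the printed output above).
res <- data.frame(
  waiting  = c(3780, 9700, 5538, 4599, 9579, 3780, 9700, 5538, 4599, 9579),
  reaction = c(1230, 377, 310, 361, 338, 1230, 377, 310, 361, 338),
  study_name = rep(c("studyA", "studyB"), each = 5),
  subject_id = rep(c("1", "30"), each = 5),
  experiment_day = rep(c("1", "3"), each = 5),
  trial = rep(c("2", "1"), each = 5),
  stringsAsFactors = FALSE
)

# One row per file: mean waiting and reaction time within each metadata combination.
means <- aggregate(cbind(waiting, reaction) ~ study_name + subject_id + experiment_day + trial,
                   data = res, FUN = mean)
means
```

With dplyr loaded, grouping by the four metadata columns and calling `summarise()` should give the same table.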