
Correlating two columns, ordered by monitor ID#, and returning a vector of the correlations in R

vector r


user6156267 · Tech Community · 8 years ago

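I'm very new to R and have run into trouble. I know others have asked this question, but I'm trying to get my own code working and would like to understand what is going wrong.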

The prompt is as follows: Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows: 
        corr <- function(directory, threshold = 0) {
            ## 'directory' is a character vector of length 1 indicating the location of
            ## the CSV files

            ## 'threshold' is a numeric vector of length 1 indicating the number of
            ## completely observed observations (on all variables) required to compute
            ## the correlation between nitrate and sulfate; the default is 0

            ## Return a numeric vector of correlations

            spectdata<- list.files(pattern= ".csv") #creates vector with list of filenames
            corr<-function(directory,threshold =0, id = 1:332){
            combined<-data.frame() #creates empty data frame
            output<-data.frame()
            output1<-data.frame()
              for(i in id){
                combined<-rbind(read.csv(directory[i], header=TRUE))
                output<-rbind(output,combined) #will open the CVS files and append the tables together 
                output1<-output[complete.cases(output), ] #??gets rid of NA in files
                sulfate<-output1["sulfate"] # ?? I think this will be a vector that is a subset of output1 that matches the "sulfate" column 
                nitrate<-output1["nitrate"]# ?? I think this will be a vector that is a subset of output1 that matches the "nitrate" column 

                }
             ok<-complete.cases(combined) #counts the number of complete cases
             if (threshold>= ok){ 
               correlation<-cor(data.frame(nitrate,sulfate))
               return(correlation)}
              else {
               print ("nothing!") }
        }
            cr<-corr(spectdata,threshold =150)     
            head(cr)

        **I'm getting:** 
                > cr<-corr(spectdata,threshold =150)     
                Warning message:
                In if (threshold >= ok) { :
                the condition has length > 1 and only the first element will be used
               > head(cr)
                       nitrate    sulfate
            nitrate 1.00000000 0.06243369
            sulfate 0.06243369 1.00000000

    The answer for this particular problem, where threshold = 150, should be:
        source("corr.R")
        source("complete.R")
        cr <- corr("specdata", 150)
        head(cr)
         ## [1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814
So it looks like the answer I have is completely wrong, ha. Please feel free to offer any insight on (1) how to get a correctly sized vector and (2) any other syntax or wording that might be helpful.

1 answer | as of 8 years ago


Evan Friedland · 8 years ago

Please find cleaner code below. I'm happy to answer any questions.

corr <- function(directory, threshold = 0) {
  # list the CSV files in the directory with their full paths,
  # so the working directory does not have to be changed with setwd()
  spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # for each file, read the sulfate and nitrate columns
  L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
  # for each data frame that was read, remove the rows that have an NA
  L2 <- lapply(L1, function(x) x[complete.cases(x), ])
  # keep only the monitors with more complete rows than the threshold
  # (the prompt asks for "greater than the threshold", hence > rather than >=)
  L3 <- Filter(function(x) nrow(x) > threshold, L2)
  # if any monitors remain after Filter (length of list > 0) then:
  if (length(L3) > 0) {
    # for each remaining monitor, calculate the correlation between sulfate and nitrate
    Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
    # change list output to a vector output
    unlist(Correlation)
  } else {
    # return a zero-length numeric vector
    numeric(0)
  }
}

corr(directory = "C:/Users/Evan Friedland/Desktop/DIRECTORY", threshold = 100)
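
As for the warning in the question: `complete.cases()` returns one logical value per row, so `threshold >= ok` compares the threshold against a whole vector, while `if` needs a single TRUE or FALSE. The comparison is also the wrong way round, since it is the count of complete rows that should exceed the threshold. And because the files were all bound together before calling `cor()` on a two-column data frame, the result was a single 2x2 matrix for the pooled data rather than one value per monitor. A minimal sketch of the counting step, using a made-up data frame (`toy`, `n_complete` and `result` are illustrative names, not part of the assignment):

```r
# Hypothetical toy data standing in for a single monitor file
toy <- data.frame(
  sulfate = c(1.2, NA, 3.1, 4.0),
  nitrate = c(0.5, 0.7, NA, 1.1)
)

ok <- complete.cases(toy)  # one logical value per row, not a count
n_complete <- sum(ok)      # TRUE counts as 1, so this is the number of complete rows

threshold <- 1
if (n_complete > threshold) {
  # cor() on two vectors returns a single number for this monitor
  result <- cor(toy$sulfate[ok], toy$nitrate[ok])
} else {
  result <- numeric(0)
}
```

Doing this once per file and collecting the single numbers is what gives a vector with one correlation per qualifying monitor.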

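
If you want to check the behaviour without the real data set, something like the following should work. It is only a sketch: `corr_sketch`, the `specdata_demo` folder and the two tiny CSV files are invented for illustration, and the loop is a plain rewrite of the same idea (count complete rows per file, keep the file only if the count is greater than the threshold).

```r
# Sketch only: corr_sketch and the synthetic files below are illustrative assumptions
corr_sketch <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  result <- numeric(0)
  for (f in files) {
    d <- read.csv(f)
    d <- d[complete.cases(d), ]          # keep rows with no NA in any column
    if (nrow(d) > threshold) {           # strictly greater, as the prompt states
      result <- c(result, cor(d$sulfate, d$nitrate))
    }
  }
  result
}

# Build a throwaway directory with two made-up monitor files
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Monitor 001: five complete rows
write.csv(data.frame(sulfate = 1:5, nitrate = 2 * (1:5), ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
# Monitor 002: only two complete rows
write.csv(data.frame(sulfate = c(1, NA, 3, NA), nitrate = c(4, 5, 2, NA), ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

res_low  <- corr_sketch(demo_dir, threshold = 1)   # both monitors qualify
res_mid  <- corr_sketch(demo_dir, threshold = 3)   # only monitor 001 qualifies
res_high <- corr_sketch(demo_dir, threshold = 10)  # no monitor qualifies
```

With these made-up files, the length of the result drops from 2 to 1 to 0 as the threshold rises, and the last call returns `numeric(0)` as the prompt requires.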