
Correlating two columns, sorted by monitor ID #, and returning a vector of the correlations in R

  •  0
  • user6156267  · Tech Community  · 8 years ago

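    I'm very new to R and have run into trouble. I know others have asked this question, but I'm struggling to get my code to work and would like to understand what is going wrong.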

    The prompt is as follows: Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows: 
            corr <- function(directory, threshold = 0) {
                ## 'directory' is a character vector of length 1 indicating the location of
                ## the CSV files
    
                ## 'threshold' is a numeric vector of length 1 indicating the number of
                ## completely observed observations (on all variables) required to compute
                ## the correlation between nitrate and sulfate; the default is 0
    
                ## Return a numeric vector of correlations
            }

                spectdata<- list.files(pattern= ".csv") #creates vector with list of filenames
                corr<-function(directory,threshold =0, id = 1:332){
                combined<-data.frame() #creates empty data frame
                output<-data.frame()
                output1<-data.frame()
                  for(i in id){
                    combined<-rbind(read.csv(directory[i], header=TRUE))
                    output<-rbind(output,combined) #will open the CVS files and append the tables together 
                    output1<-output[complete.cases(output), ] #??gets rid of NA in files
                    sulfate<-output1["sulfate"] # ?? I think this will be a vector that is a subset of output1 that matches the "sulfate" column 
                    nitrate<-output1["nitrate"]# ?? I think this will be a vector that is a subset of output1 that matches the "nitrate" column 
    
                    }
                 ok<-complete.cases(combined) #counts the number of complete cases
                 if (threshold>= ok){ 
                   correlation<-cor(data.frame(nitrate,sulfate))
                   return(correlation)}
                  else {
                   print ("nothing!") }
            }
                cr<-corr(spectdata,threshold =150)     
                head(cr)
    
            **I'm getting:** 
                    > cr<-corr(spectdata,threshold =150)     
                    Warning message:
                    In if (threshold >= ok) { :
                    the condition has length > 1 and only the first element will be used
                   > head(cr)
                           nitrate    sulfate
                nitrate 1.00000000 0.06243369
                sulfate 0.06243369 1.00000000
    
        The answer for this particular problem where threshold = 150, should be: 
            source("corr.R")
            source("complete.R")
            cr <- corr("specdata", 150)
            head(cr)
             ## [1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814
       So it looks like the answer I have is completely wrong.
           Please feel free to offer any insight on how to get a correctly sized vector, or on any other syntax or wording that might help.
    

    1 answer  |  as of 8 years ago
        1
  •  0
  •   Evan Friedland    8 years ago

    Please find cleaner code below. I'm happy to answer any questions.

    corr <- function(directory, threshold = 0) {
      # list the CSV files in the directory; full paths are used so the
      # working directory does not have to be changed with setwd()
      spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
      # for each file, read the sulfate and nitrate columns
      L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
      # for each data frame that was read, remove rows that contain NA
      L2 <- lapply(L1, function(x) x[complete.cases(x), ])
      # keep only monitors with MORE complete cases than the threshold, as the
      # prompt asks; ">" also drops monitors with zero complete rows when
      # threshold = 0, which cor() could not handle
      L3 <- Filter(function(x) nrow(x) > threshold, L2)
      # if any monitors remain after Filter (length of list > 0) then:
      if (length(L3) > 0) {
        # for each remaining monitor, calculate the correlation between sulfate and nitrate
        Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
        # change list output to a vector output
        unlist(Correlation)
      } else {
        # return a zero length vector
        numeric(0)
      }
    }
    
    corr(directory = "C:/Users/Evan Friedland/Desktop/DIRECTORY", threshold = 100)
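
    The call above depends on a directory on my machine, so here is a small sketch that can be run anywhere. It is only an illustration under stated assumptions: the two toy CSV files, the `toy_dir` folder name, and the compact `corr()` variant (full paths via `full.names = TRUE`, strict "greater than" comparison) are made up for this example and are not the course data.

```r
# Compact variant of corr() for this sketch (assumption: full paths via
# full.names = TRUE instead of setwd(), strict "greater than" comparison).
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  vals <- lapply(files, function(f) {
    d <- read.csv(f, header = TRUE)[, c("sulfate", "nitrate")]
    d <- d[complete.cases(d), ]           # keep fully observed rows only
    if (nrow(d) > threshold) cor(d[, "sulfate"], d[, "nitrate"]) else NULL
  })
  out <- unlist(vals)                     # NULL entries are dropped here
  if (is.null(out)) numeric(0) else out
}

# Hypothetical toy data: two monitors written to a temporary directory.
toy_dir <- file.path(tempdir(), "toy_specdata")
dir.create(toy_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(2, 4, 6, 1), ID = 1),
          file.path(toy_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA, NA, 4), nitrate = c(3, 1, NA, 2), ID = 2),
          file.path(toy_dir, "002.csv"), row.names = FALSE)

res_low  <- corr(toy_dir, threshold = 2)   # monitor 1 has 3 complete rows, monitor 2 has 2
res_high <- corr(toy_dir, threshold = 10)  # no monitor qualifies
print(res_low)
print(res_high)
```

    With the low threshold only the first toy monitor contributes a value, and with the high threshold the result is a numeric vector of length 0, which is what the prompt asks for when no monitor qualifies.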
    

    
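
    As for what went wrong in the question's code, my reading is this. `complete.cases()` returns one logical value per row, not a count, so `if (threshold >= ok)` compares a number against a whole vector, which is what triggers the "condition has length > 1" warning (and, as far as I know, an error from R 4.2.0 onward). Calling `cor()` on a two-column data frame also returns a 2 x 2 matrix, and because the loop binds every file together, only one correlation over all monitors is computed. The comparison is also the wrong way round: the count should be greater than the threshold. The sketch below uses a made-up data frame `d` to show the difference; the values are hypothetical.

```r
# Hypothetical toy data standing in for one monitor's file.
d <- data.frame(sulfate = c(1.0, 2.5, NA, 4.0, 3.0),
                nitrate = c(0.7, NA, 1.0, 1.9, 1.6))

ok <- complete.cases(d)   # logical vector, one element per row
n_complete <- sum(ok)     # number of complete rows, a single value

# cor() on a data frame gives a matrix of all pairwise correlations ...
m <- cor(d[ok, c("nitrate", "sulfate")])
# ... while cor() on two vectors gives the single value wanted per monitor.
r <- cor(d$sulfate[ok], d$nitrate[ok])

print(length(ok))    # one entry per row of d
print(n_complete)    # safe to compare against a threshold
print(dim(m))
print(r)
```

    Comparing `n_complete > threshold` for each monitor separately, and collecting one `r` per monitor, is what produces a vector of the expected length.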