How do I compute cumulative sums over a data frame in R?

  • Hydro  · 6 years ago

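    I want to compute the cumulative sum of each variable (A to Z) in myData, and then the maximum, minimum, median, upper and lower quartile, and mean across all of the years. Below is the code I have struggled with so far, but I don't know how to continue. In fact, the current code doesn't give me what I want either.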

    library(tidyverse)
    
    mydate <- as.data.frame(seq(as.Date("2000-01-01"), to= as.Date("2019-12-31"), by="day"))
    colnames(mydate) <- "Date"
    Data <- data.frame(A = runif(7305,0,10), 
                       J = runif(7305,0,8), 
                       X = runif(7305,0,12), 
                       Z = runif(7305,0,10))
    DF <- data.frame(mydate, Data)
    
    myData <- DF %>% separate(Date, into = c("Year","Month","Day")) %>% 
       sapply(as.numeric) %>% 
       as.data.frame() %>% 
       mutate(Date = DF$Date) %>% 
       filter(Month > 4 & Month < 11) %>% 
       mutate(DOY = format(Date, "%j")) %>% 
       group_by(Year) %>% 
       mutate(cumulativeSum = accumulate(DOY))
    

    I would like to get a figure like the one below for each of A, J, X, and Z.

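    Update (edit)

    My question was confusing, so I decided to break the problem down into steps using Excel. Here I use only one variable, A in this example, and compute its cumulative sum. In Step-2 and Step-3, I take the statistics I mentioned earlier for each day of the year. I have clarified as much as I can, but the question may still be a bit odd.

    enter image description here

    Final figure: this is an example of the figure I would like to get from this exercise.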

    enter image description here

    1 answer  |  as of 6 years ago
  •   Community Mohan Dere    5 years ago

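    So, if I understand correctly, you want to plot descriptive statistics of the cumulative values of each variable over the May-to-October period of each year from 2000 to 2019.

    Here is a possible solution that first computes the descriptive statistics for each variable (using the dplyr, lubridate, and tidyr packages). To understand all the steps, I encourage you to run this code piece by piece.

    Basically, I isolate the month and year from the date, convert the data frame into a longer format, filter to keep only the values in the period of interest (May to October), and calculate the cumulative sum of the values grouped by variable and year. Then I create a fake date (by pasting a fixed year onto the actual month and day) so that the descriptive statistics can be computed for each date and variable.

    Altogether, it gives something like this: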

    library(lubridate)
    library(dplyr)
    library(tidyr)
    
    mydata <- DF %>% mutate(Year = year(Date), Month = month(Date)) %>%
      # one row per date and variable (A, J, X, Z)
      pivot_longer(-c(Date,Year,Month), names_to = "variable", values_to = "values") %>% 
      # keep May to October only
      filter(between(Month,5,10)) %>% 
      # running total that restarts for each year and variable
      group_by(Year, variable) %>% 
      mutate(Cumulative = cumsum(values)) %>%
      # fake date with a fixed year, so the same calendar day lines up across years
      mutate(NewDate = ymd(paste("2020", Month,day(Date), sep = "-"))) %>%
      ungroup() %>%
      # statistics across years for each variable and calendar day
      group_by(variable, NewDate) %>%
      summarise(Median = median(Cumulative),
                Maximum = max(Cumulative),
                Minimum = min(Cumulative),
                Upper = quantile(Cumulative,0.75),
                Lower = quantile(Cumulative, 0.25))
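
    To make the three steps easier to follow, here is a minimal base-R sketch on made-up numbers (three days in May for three years). The data frame `toy` and its values are purely illustrative and not part of the original data. It only mirrors the logic of the pipeline above (yearly running total, placeholder date, per-day statistics) without the tidyverse packages:

```r
# Toy data (made up): three days in May for three years, one variable "A"
toy <- data.frame(
  Date = as.Date(c("2000-05-01", "2000-05-02", "2000-05-03",
                   "2001-05-01", "2001-05-02", "2001-05-03",
                   "2002-05-01", "2002-05-02", "2002-05-03")),
  A = c(1, 2, 3,  2, 2, 2,  0, 1, 0)
)

# Step 1: running total that restarts each year (rows are already in date order)
toy$Year <- format(toy$Date, "%Y")
toy$Cumulative <- ave(toy$A, toy$Year, FUN = cumsum)

# Step 2: placeholder date with a fixed year, so the same calendar day
# from different years ends up in the same group
toy$NewDate <- as.Date(paste0("2020-", format(toy$Date, "%m-%d")))

# Step 3: summarise the running totals across years for each calendar day
by_day <- split(toy$Cumulative, format(toy$NewDate))
stats <- data.frame(
  NewDate = as.Date(names(by_day)),
  Minimum = sapply(by_day, min),
  Median  = sapply(by_day, median),
  Maximum = sapply(by_day, max),
  row.names = NULL
)
print(stats)
```

    On May 3, for instance, the three yearly running totals are 6, 6, and 1, so the minimum is 1 and both the median and the maximum are 6.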
    

    library(ggplot2)
    ggplot(mydata, aes(x = NewDate))+
      geom_ribbon(aes(ymin = Lower, ymax = Upper), color = "grey", alpha =0.5)+
      geom_line(aes(y = Median), color = "darkblue")+
      geom_line(aes(y = Maximum), color = "red", linetype = "dashed", size = 1.5)+
      geom_line(aes(y = Minimum), color ="red", linetype = "dashed", size = 1.5)+
      facet_wrap(~variable, scales = "free")+
      scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
      ylab("Daily Cumulative Precipitation (mm)")
    

    enter image description here

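    Does this look like what you are trying to achieve?


    Edit: adding a legend

    Adding a legend here is not straightforward, because you are using different geoms (ribbon, line) with different colors, shapes, and so on.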

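    One way is to pivot the line statistics into a longer format, so that a single geom draws all of them, and then do: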

    mydata %>% pivot_longer(cols = c(Median, Minimum,Maximum), names_to = "Statistic",values_to = "Value") %>%
      ggplot(aes(x = NewDate))+
      geom_ribbon(aes(ymin = Lower, ymax = Upper, fill = "Upper / Lower"), alpha =0.5)+
      geom_line(aes(y = Value, color = Statistic, linetype = Statistic, size = Statistic))+
      facet_wrap(~variable, scales = "free")+
      scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month")+
      ylab("Daily Cumulative Precipitation (mm)")+
      scale_size_manual(values = c(1.5,1,1.5))+
      scale_linetype_manual(values = c("dashed","solid","dashed"))+
      scale_color_manual(values = c("red","darkblue","red"))+
      scale_fill_manual(values = "grey", name = "")
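
    The unnamed vectors passed to the manual scales above are matched to the Statistic values in order, and ggplot2 orders a character column alphabetically (Maximum, Median, Minimum), which is why the colors are listed as red, darkblue, red. As a minimal sketch, assuming a made-up data frame `toy` in place of the pivoted `mydata` (the names `toy`, `stat_colors`, and `stat_linetypes` are hypothetical), named vectors make that mapping explicit:

```r
library(ggplot2)

# Hypothetical toy data standing in for the pivoted `mydata`
toy <- data.frame(
  x = rep(1:3, times = 3),
  Value = c(1, 2, 3,  2, 3, 4,  3, 4, 5),
  Statistic = rep(c("Minimum", "Median", "Maximum"), each = 3)
)

# Named vectors: each statistic is tied to its color and line type by name,
# so the result no longer depends on the order of the values
stat_colors <- c(Maximum = "red", Median = "darkblue", Minimum = "red")
stat_linetypes <- c(Maximum = "dashed", Median = "solid", Minimum = "dashed")

p <- ggplot(toy, aes(x = x, y = Value, color = Statistic, linetype = Statistic)) +
  geom_line() +
  scale_color_manual(values = stat_colors) +
  scale_linetype_manual(values = stat_linetypes)

print(p)
```

    This is the same named-vector form that the label-based variant further down uses in its scale_color_manual() call.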
    

    enter image description here

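    Another solution is to add the legend as labels at the last date. To do that, you can create a second data frame by subsetting the first one to its last date only: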

    mydata_label <- mydata %>% filter(NewDate == max(NewDate)) %>% 
      pivot_longer(cols = Median:Lower, names_to = "Stat",values_to = "val")
    

    ggplot(mydata, aes(x = NewDate))+
      geom_ribbon(aes(ymin = Lower, ymax = Upper), alpha =0.5)+
      geom_line(aes(y = Median), color = "darkblue")+
      geom_line(aes(y = Maximum), color = "red", linetype = "dashed", size = 1.5)+
      geom_line(aes(y = Minimum), color ="red", linetype = "dashed", size = 1.5)+
      facet_wrap(~variable, scales = "free")+
      scale_x_date(date_labels = "%b", date_breaks = "month", name = "Month", limits = c(min(mydata$NewDate),max(mydata$NewDate)+25))+
      ylab("Daily Cumulative Precipitation (mm)")+
      geom_text(data = mydata_label, 
                aes(x = NewDate+5, y = val, label = Stat, color = Stat), size = 2, hjust = 0, show.legend = FALSE)+
      scale_color_manual(values = c("Median" = "darkblue","Maximum" = "red","Minimum" = "red","Upper" = "black", "Lower" = "black"))
    

    enter image description here

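    Because of space constraints, I deliberately reduced the size of the text labels so that you can see all of them. Based on the figure you attached, though, you should have enough room for this to work.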