
Lubridate fails to correctly parse dates containing weekday/month/day/year

  • Derek Corcoran  · Tech Community  · 6 years ago

    Question

    I downloaded a database from a website, and the date column has the following format:

    x <- c("Fri, Mar 1, 2019", "Sat, Mar 2, 2019", "Sun, Mar 3, 2019", "Mon, Mar 4, 2019", "Tue, Mar 5, 2019", "Wed, Mar 6, 2019", "Thu, Mar 7, 2019", "Fri, Mar 8, 2019", "Sat, Mar 9, 2019", "Sun, Mar 10, 2019", "Mon, Mar 11, 2019", "Tue, Mar 12, 2019", "Wed, Mar 13, 2019", "Thu, Mar 14, 2019", "Fri, Mar 15, 2019", "Sat, Mar 16, 2019", "Sun, Mar 17, 2019", "Mon, Mar 18, 2019", "Tue, Mar 19, 2019", "Wed, Mar 20, 2019", "Thu, Mar 21, 2019", "Fri, Mar 22, 2019", "Sat, Mar 23, 2019", "Sun, Mar 24, 2019", "Mon, Mar 25, 2019",  "Tue, Mar 26, 2019", "Wed, Mar 27, 2019", "Thu, Mar 28, 2019", "Fri, Mar 29, 2019", "Sat, Mar 30, 2019", "Sun, Mar 31, 2019")
    

    It contains the dates from March 1 to March 31. I want to convert it to Date format, so I used the mdy function from lubridate:

    library("lubridate")
    mdy(x)
    

    This results in the following vector:

     [1] "2019-03-01" "2019-03-02" "2019-03-20" "2019-04-20" "2019-05-20" "2019-03-06"
     [7] "2019-03-07" "2019-03-08" "2019-03-09" "2019-10-20" "2019-11-20" "2019-12-20"
    [13] "2019-03-13" "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17" "2019-03-18"
    [19] "2019-03-19" "2019-03-20" "2019-03-21" "2019-03-22" "2019-03-23" "2019-03-24"
    [25] "2019-03-25" "2019-03-26" "2019-03-27" "2019-03-28" "2019-03-29" "2019-03-30"
    [31] "2019-03-31"
    

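    As you can see, most of the dates are correct, but parsing fails for the 3rd, 4th, 5th, 10th, 11th and 12th of the month, where the day of the month is read as the month. I have tried several workarounds, but none has worked so far.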

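    Some possible solutions that did not work

    Using a regex to remove the weekday from the character vector:

    I thought one way to solve this would be to remove the weekday part of the string, so I tried to delete everything before the comma, but my attempt was imperfect: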

    library(stringr)
    # Keep everything from the first comma onward (this leaves a leading ", ")
    y <- str_extract(x, ",.*$")
    y
     [1] ", Mar 1, 2019"  ", Mar 2, 2019"  ", Mar 3, 2019"  ", Mar 4, 2019" 
     [5] ", Mar 5, 2019"  ", Mar 6, 2019"  ", Mar 7, 2019"  ", Mar 8, 2019" 
     [9] ", Mar 9, 2019"  ", Mar 10, 2019" ", Mar 11, 2019" ", Mar 12, 2019"
    [13] ", Mar 13, 2019" ", Mar 14, 2019" ", Mar 15, 2019" ", Mar 16, 2019"
    [17] ", Mar 17, 2019" ", Mar 18, 2019" ", Mar 19, 2019" ", Mar 20, 2019"
    [21] ", Mar 21, 2019" ", Mar 22, 2019" ", Mar 23, 2019" ", Mar 24, 2019"
    [25] ", Mar 25, 2019" ", Mar 26, 2019" ", Mar 27, 2019" ", Mar 28, 2019"
    [29] ", Mar 29, 2019" ", Mar 30, 2019" ", Mar 31, 2019"
    

    But now when I use mdy, the first 12 days all come out wrong:

    mdy(y)
    
     [1] "2019-01-20" "2019-02-20" "2019-03-20" "2019-04-20" "2019-05-20" "2019-06-20"
     [7] "2019-07-20" "2019-08-20" "2019-09-20" "2019-10-20" "2019-11-20" "2019-12-20"
    [13] "2019-03-13" "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17" "2019-03-18"
    [19] "2019-03-19" "2019-03-20" "2019-03-21" "2019-03-22" "2019-03-23" "2019-03-24"
    [25] "2019-03-25" "2019-03-26" "2019-03-27" "2019-03-28" "2019-03-29" "2019-03-30"
    [31] "2019-03-31"
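
For comparison, here is a minimal, locale-independent sketch of the same weekday-stripping idea. Instead of relying on the parser to recognize month names, it captures the pieces with a regular expression and looks the month up in base R's month.abb constant, which is always English. The pattern and the variable names are illustrative assumptions, and the input is a shortened sample of x.

```r
# Shortened sample of the original vector
x <- c("Fri, Mar 1, 2019", "Mon, Mar 4, 2019", "Sun, Mar 10, 2019")

# Groups: (1) month abbreviation, (2) day, (3) year; the leading weekday is ignored
pattern <- "^[A-Za-z]+, ([A-Za-z]{3}) ([0-9]{1,2}), ([0-9]{4})$"

mon  <- sub(pattern, "\\1", x)
day  <- as.integer(sub(pattern, "\\2", x))
year <- as.integer(sub(pattern, "\\3", x))

# month.abb is "Jan" ... "Dec" regardless of LC_TIME
month_num <- match(mon, month.abb)

# Build ISO strings and parse them with an explicit numeric format
dates <- as.Date(sprintf("%04d-%02d-%02d", year, month_num, day),
                 format = "%Y-%m-%d")
dates
```

Because only numbers reach as.Date, the result should not depend on the LC_TIME setting.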
    

    Session info

    I added the sessionInfo as requested:

    R version 3.4.4 (2018-03-15) 
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.5 LTS
    
    Matrix products: default
    BLAS: /usr/lib/libblas/libblas.so.3.6.0
    LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
    
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=es_CL.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=es_CL.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=es_CL.UTF-8       LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=es_CL.UTF-8 LC_IDENTIFICATION=C       
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] stringr_1.3.1   dplyr_0.7.6     rvest_0.3.2     xml2_1.2.0      XML_3.98-1.16  
    [6] lubridate_1.7.4
    
    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.18     rstudioapi_0.7   knitr_1.20       bindr_0.1.1     
     [5] magrittr_1.5     tidyselect_0.2.4 R6_2.2.2         rlang_0.2.2     
     [9] httr_1.3.1       tools_3.4.4      pacman_0.4.6     selectr_0.4-1    
    [13] htmltools_0.3.6  yaml_2.2.0       rprojroot_1.3-2  digest_0.6.17
    [17] assertthat_0.2.0 tibble_1.4.2     crayon_1.3.4     bindrcpp_0.2.2
    [21] purrr_0.2.5      curl_3.2         glue_1.3.0       evaluate_0.11
    [25] rmarkdown_1.10   stringi_1.2.4    pillar_1.3.0     compiler_3.4.4
    [29] backports_1.1.2  pkgconfig_2.0.2
    
    1 answer  |  6 years ago
  •   Derek Corcoran    6 years ago

    As @duckmayr suggested, this was a locale issue. As shown above in my sessionInfo, my locale was set as follows:

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
     [3] LC_TIME=es_CL.UTF-8        LC_COLLATE=en_US.UTF-8    
     [5] LC_MONETARY=es_CL.UTF-8    LC_MESSAGES=en_US.UTF-8   
     [7] LC_PAPER=es_CL.UTF-8       LC_NAME=C                 
     [9] LC_ADDRESS=C               LC_TELEPHONE=C            
    [11] LC_MEASUREMENT=es_CL.UTF-8 LC_IDENTIFICATION=C  
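
A quick way to check this on your own machine is sketched below. It prints the active LC_TIME value and shows how that locale abbreviates the weekday and month of one of the dates from the question. The variable names are illustrative, and the output depends on the system, so none is shown.

```r
# Which locale currently governs day and month names?
lc_time <- Sys.getlocale("LC_TIME")
lc_time

# How the active locale abbreviates the weekday and month of
# 2019-03-05 (a Tuesday in March, taken from the question's data)
abbr <- format(as.Date("2019-03-05"), "%a %b")
abbr
```

If the abbreviations printed here are not the English "Tue Mar", text such as "Tue, Mar 5, 2019" may not be matched the way you expect.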
    

    When I changed LC_TIME to en_US.UTF-8, everything was fixed. This is how I did it:

    Sys.setlocale("LC_TIME", 'en_US.UTF-8')
    

    After that, mdy works as expected. I hope this helps anyone who runs into a similar problem in the future.
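
If you would rather not change the locale for the whole session, lubridate's parsing functions also take a locale argument. The sketch below assumes the "en_US.UTF-8" locale is installed on the system (locale names differ on Windows) and uses a shortened sample of x.

```r
library(lubridate)

# Shortened sample of the original vector
x <- c("Fri, Mar 1, 2019", "Mon, Mar 4, 2019", "Sun, Mar 10, 2019")

# Ask mdy() to use English day and month names for this call,
# instead of calling Sys.setlocale() yourself.
# Assumption: the "en_US.UTF-8" locale exists on this machine.
dates <- mdy(x, locale = "en_US.UTF-8")
dates
```

This keeps the fix next to the call that needs it, which can be easier to follow in scripts shared across machines with different locale settings.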