Probability of a column value falling within a range under a normal distribution

  •  3
  • Prasanna Nandakumar  ·  7 years ago

    I want to add a new column, duration_probablity, that holds the probability of a value lying between 6 and 12 hours: P(6 < Origin_Duration ≤ 12)

     dput(df)
    structure(list(CRD_NUM = c(1000120005478330, 1000130009109199, 
    1000140001635234, 1000140002374747, 1000140003618308, 1000140007236959, 
    1000140015078086, 1000140026268650, 1000140027281272, 1000148000012215
    ), Origin_Duration = c("10:48:38", "07:41:34", "11:16:41", "09:19:35", 
    "17:09:19", "08:59:05", "11:27:28", "12:17:41", "10:45:42", "12:19:05"
    )), .Names = c("CRD_NUM", "Origin_Duration"), class = c("data.table", 
    "data.frame"), row.names = c(NA, -10L))
    
                CRD_NUM Origin_Duration
     1: 1000120005478330        10:48:38
     2: 1000130009109199        07:41:34
     3: 1000140001635234        11:16:41
     4: 1000140002374747        09:19:35
     5: 1000140003618308        17:09:19
     6: 1000140007236959        08:59:05
     7: 1000140015078086        11:27:28
     8: 1000140026268650        12:17:41
     9: 1000140027281272        10:45:42
    10: 1000148000012215        12:19:05
    

    For example, for the duration 11:16:41 the output would be 0.96.

    My CDF should look like this: P(6 < X ≤ 12) = Φ((12−μ)/σ) − Φ((6−μ)/σ)

    1 answer  |  as of 7 years ago
  •  6
  •   KenHBS    7 years ago

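    It is not clear from your question whether you already know the mean and variance, so I will cover both cases. I also assume that you have reason to believe the durations really are normally distributed.

    Case 1: the mean and variance are given in advance. Say mu = 11 and sigma = 3 (in hours). Then P(6 < X ≤ 12) = P(X ≤ 12) - P(X ≤ 6), which the base R function pnorm() can compute: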

    mu    <- 11
    sigma <- 3
    pnorm(12, mu, sigma) - pnorm(6, mu, sigma)
    # 0.5827683
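
    The same calculation can be wrapped in a small helper so that other intervals are easy to try. This is only a sketch: prob_between is a hypothetical name, not a base R function, and mu = 11, sigma = 3 are the illustrative values used above. The second expression spells out the standardized form Φ((12−μ)/σ) − Φ((6−μ)/σ) from the question to show that both give the same number.

```r
# Hypothetical helper: P(lower < X <= upper) for X ~ Normal(mu, sigma).
prob_between <- function(lower, upper, mu, sigma) {
  pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)
}

# Illustrative parameters from the answer (in hours).
p <- prob_between(6, 12, mu = 11, sigma = 3)

# Same probability via standardized bounds, as in the question's formula.
p_std <- pnorm((12 - 11) / 3) - pnorm((6 - 11) / 3)

print(c(p = p, p_std = p_std))
```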
    

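    Case 2: the mean and variance are unknown and have to be estimated from the data. The standardized values then follow a Student t-distribution rather than a normal one, so P(6 < X < 12) is computed from the t-distribution. First, convert df$Origin_Duration from character to a time type: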

    df$Origin_Duration <- as.POSIXct(df$Origin_Duration, format = "%H:%M:%S")
    
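    mu          <- mean(df$Origin_Duration)       # e.g. "2017-09-04 11:12:28 CEST"; the date part is the day the code is run
    df$demeaned <- difftime(df$Origin_Duration, mu, units = "mins")   # deviations from the mean, in minutes
    sigma       <- sd(as.numeric(df$demeaned))    # 153.68 (minutes)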
    

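    We will use the pt function to compute P(X ≤ 12) - P(X ≤ 6). For that we need standardized versions of 12 and 6: subtract the mean, then divide by the standard deviation. Both are expressed in minutes here:

    # Build the 06:00 and 12:00 bounds the same way as Origin_Duration, so they
    # carry the same (current) date instead of a hard-coded one.
    t6  <- as.POSIXct("06:00:00", format = "%H:%M:%S")
    t12 <- as.POSIXct("12:00:00", format = "%H:%M:%S")
    x6  <- as.numeric(difftime(t6,  mu, units = "mins")) / sigma
    x12 <- as.numeric(difftime(t12, mu, units = "mins")) / sigma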
    
    deg_fr <- length(df$demeaned)-1
    
    p_x_smaller_than12 <- pt( x12, df = deg_fr )    #  0.6178973
    p_x_smaller_than6  <- pt( x6,  df = deg_fr )    #  0.03627651
    p_x_smaller_than12 - p_x_smaller_than6
    # [1] 0.5816208
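
    Because as.POSIXct() attaches the current date to a bare time string, the approach above depends on the day it is run. A date-free alternative is to convert the durations to decimal hours first and work with plain numbers. The following is a sketch under that assumption: to_hours is a hypothetical helper, and the durations are copied from the question's data. It is expected to give roughly the same probability of about 0.58 as above.

```r
# Durations copied from the question's data.
durations <- c("10:48:38", "07:41:34", "11:16:41", "09:19:35", "17:09:19",
               "08:59:05", "11:27:28", "12:17:41", "10:45:42", "12:19:05")

# Hypothetical helper: convert "HH:MM:SS" strings to decimal hours.
to_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) / c(1, 60, 3600)), numeric(1))
}

hours  <- to_hours(durations)
mu_hat <- mean(hours)          # estimated mean, in hours
sd_hat <- sd(hours)            # estimated standard deviation, in hours
deg_fr <- length(hours) - 1    # degrees of freedom, as in the answer

# P(6 < X <= 12) using the t-distribution on the standardized bounds.
p_range <- pt((12 - mu_hat) / sd_hat, df = deg_fr) -
           pt((6  - mu_hat) / sd_hat, df = deg_fr)
print(p_range)
```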
    

    Added in response to a comment: unknown parameters, computed for all entries:

    # scale gives the distance from the mean in terms of standard deviations:
    df$scaled <- scale(df$Origin_Duration)
    
    pt(df$scaled, df = deg_fr)
    # [1,] 0.4400575
    # [2,] 0.1015886
    # [3,] 0.5106114
    # [4,] 0.2406431
    # [5,] 0.9773264
    # [6,] 0.2039751
    # [7,] 0.5377728
    # [8,] 0.6593331
    # [9,] 0.4327620
    # [10,] 0.6625280
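
    Each value above is the estimated P(X ≤ x) for that row's duration. To keep these values next to the data, they can be stored in a new column. The sketch below does this with plain numbers instead of date-times. It assumes that this per-row cumulative probability is what the question's duration_probablity column should contain, which the thread does not confirm. The data frame is rebuilt here only to keep the example self-contained.

```r
# Rebuild the relevant column from the question's data.
df <- data.frame(
  Origin_Duration = c("10:48:38", "07:41:34", "11:16:41", "09:19:35", "17:09:19",
                      "08:59:05", "11:27:28", "12:17:41", "10:45:42", "12:19:05"),
  stringsAsFactors = FALSE
)

# Parse "HH:MM:SS" as a time difference and express it in hours.
hours <- as.numeric(as.difftime(df$Origin_Duration, format = "%H:%M:%S"),
                    units = "hours")

# Distance from the sample mean in units of the sample standard deviation.
scaled <- as.numeric(scale(hours))

# Per-row P(X <= x) from the t-distribution with n - 1 degrees of freedom.
df$duration_probablity <- pt(scaled, df = length(hours) - 1)
print(df)
```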