data.table: efficient evaluation of j expressions when grouping

  •  2
  • BogdanC  · Tech Community  · 7 years ago

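    Is there a way to use/evaluate a string expression other than eval(parse(text = ...))?
    As far as I know, eval(parse()) works when i, j, and by are in use. Running with verbose=TRUE shows that GForce is not used when j is given as such an expression, so I guess the question is how to use both GForce and expressions together.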

    library(data.table)
    
    N = 10**5
    DT = data.table(x1 = sample(1:1000, size = N , replace = TRUE),
                    x2 = sample(1:500, size = N , replace = TRUE),
                    y1 = runif(N,1,10),
                    y2 = runif(N,0,1))
    
    system.time({
      DT_agg = DT[, .(sum_y1 = sum(y1)), by = c('x1','x2'), verbose=TRUE]
    })
    # Making each group and running j (GForce TRUE)
    # user  system elapsed 
    # 0.02    0.00    0.02 
    
    expr = "sum_y1 = sum(y1)"
    system.time({
      DT_agg = DT[, .(eval(parse(text = expr))), by = c('x1','x2'), verbose=TRUE]
    })
    # Making each group and running j (GForce FALSE)
    # user  system elapsed 
    # 27.72    0.00   28.11 
    
    1 answer  |  7 years ago
  •  2
  •   Frank    7 years ago

    The question is how to use GForce together with expressions.

    It works if you (1) put the entire j argument inside the expression and (2) parse it beforehand:

    > expr2 = quote(.(sum_y1 = sum(y1)))
    > DT[, eval(expr2), by=c("x1", "x2"), verbose=TRUE]
    Detected that j uses these columns: y1 
    Finding groups using forderv ... 0.000sec 
    Finding group sizes from the positions (can be avoided to save RAM) ... 0.020sec 
    Getting back original order ... 0.000sec 
    lapply optimization is on, j unchanged as 'list(sum(y1))'
    GForce optimized j to 'list(gsum(y1))'
    Making each group and running j (GForce TRUE) ... 0.010sec 
            x1  x2   sum_y1
        1: 377 368 1.293758
        2: 233 276 1.613304
        3: 190  97 3.432189
        4: 200 373 3.573958
        5: 924 345 5.535074
       ---                 
    90538: 316 155 5.067798
    90539: 960 180 5.788466
    90540: 777 466 9.949981
    90541: 520  43 3.815545
    90542: 977 498 3.839360
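
If the expression really has to start out as a string, the same two points can still be applied: put the whole j argument (including the `.()` wrapper) in the string, and parse it once before the `[` call. The following is a minimal sketch under those assumptions, not a statement about how every data.table version optimizes j. The names `expr_str`, `res_expr`, and `res_direct`, the seed, and the smaller table size are illustrative and not from the original post. Whether GForce is applied should be confirmed in the `verbose=TRUE` output of your own installation.

```r
library(data.table)

set.seed(1)  # illustrative seed, not part of the original post
N = 1e4      # smaller than the original 10**5 so the sketch runs quickly
DT = data.table(x1 = sample(1:100, size = N, replace = TRUE),
                x2 = sample(1:50,  size = N, replace = TRUE),
                y1 = runif(N, 1, 10))

# (1) The entire j argument as a string, including the .() wrapper
expr_str = ".(sum_y1 = sum(y1))"

# (2) Parse once, outside the [ ] call; [[1]] extracts the call
#     from the expression vector returned by parse()
expr2 = parse(text = expr_str)[[1]]

# j is now just eval(<pre-parsed call>), as in the answer above
res_expr = DT[, eval(expr2), by = c("x1", "x2"), verbose = TRUE]

# Reference result with j written out directly
res_direct = DT[, .(sum_y1 = sum(y1)), by = c("x1", "x2")]
```

Here `parse(text = expr_str)[[1]]` produces the same call object as `quote(.(sum_y1 = sum(y1)))`, so `res_expr` and `res_direct` should hold the same groups and sums.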