
Performing lasso regularization with factor and numeric predictors?

  •  0
  • rmahesh  · 6 years ago

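    I have a data set on which I would like to perform lasso feature elimination. I am new to R, so I am currently working from online R tutorials. The data is stored in a data frame. The target has been removed from that data frame and stored in its own single-column data frame. This is a regression problem and the target is numeric. Here is the code I am trying to run: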

    library(glmnet)
    
    lasso_model <- cv.glmnet(
                      x = as.matrix(train),
                      y = train_target,
                      alpha = 1)
    

    Here is the structure of the data set:

    'data.frame':   9798 obs. of  55 variables:
    $ acres: num  0.186 2.991 0.144 0.218 0.173 ...
    $ above: int  1754 3030 1531 834 1022 1528 768 1184 2026 3176 ...
    $ basement: int  0 1811 500 440 0 476 0 0 732 0 ...
    $ baths: Factor w/ 7 levels "0","1","2","3",..: 3 4 3 3 2 3 2 2 3 3 ...
    $ toilets: Factor w/ 5 levels "0","1","2","3",..: 1 3 2 1 1 2 1 1 2 2    ...
    $ fireplaces: Factor w/ 6 levels "0","1","2","3",..: 2 2 2 2 1 1 1 2 2  2 ...
    $ beds: Factor w/ 7 levels "1","2","3","4",..: 4 5 2 2 2 3 2 2 3 5 ...
    $ rooms: Factor w/ 15 levels "0","1","2","3",..: 5 5 5 4 5 3 3 3 4 6 ...
    $ age: int  103 17 13 46 116 12 93 93 42 100 ...
    $ yearsfromsale: Factor w/ 3 levels "2","3","4": 2 2 2 1 2 2 3 3 1 1 ...
    $ car: Factor w/ 4 levels "0","1","2","3": 1 4 3 1 1 3 1 1 4 1 ...
    $ city_DES.MOINES: Factor w/ 2 levels "0","1": 2 1 1 2 2 1 2 2 2 2 ...
    $ city_JOHNSTON: Factor w/ 2 levels "0","1": 1 2 2 1 1 1 1 1 1 1 ...
    $ city_WEST.DES.MOINES: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_CLIVE: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_URBANDALE: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_ALTOONA: Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
    $ city_BONDURANT: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_CROCKER.TWNSHP: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_GRIMES: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_POLK.CITY: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_PLEASANT.HILL: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ city_WINDSOR.HEIGHTS: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50315: Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 1 1 ...
    $ zip_50321: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 2 1 ...
    $ zip_50320: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50312: Factor w/ 2 levels "0","1": 2 1 1 1 1 1 1 1 1 2 ...
    $ zip_50314: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50311: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50309: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50316: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
    $ zip_50317: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50313: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 1 1 ...
    $ zip_50310: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50322: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50131: Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 1 ...
    $ zip_50111: Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
    $ zip_50265: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50266: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50325: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50323: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50009: Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
    $ zip_50035: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50023: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50226: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50021: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50327: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ zip_50324: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
    $ walkout_0: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 2 2 2 ...
    $ walkout_1: Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 1 1 ...
    $ condition_Normal: Factor w/ 2 levels "0","1": 1 2 2 1 1 2 1 1 1 1 ...
    $ condition_Above.Normal: Factor w/ 2 levels "0","1": 2 1 1 2 2 1 2 1 1 2 ...
    $ condition_Below.Normal: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 1 ...
    $ AC_1: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 1 ...
    

    When I try to run the lasso_model line, this is the error I get:

    Error in cbind2(1, newx) %*% nbeta : 
    invalid class 'NA' to  dup_mMatrix_as_dgeMatrix
    

    1 answer  |  6 years ago
  •  1
  •   liori    6 years ago

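    This is only a strong suspicion, not a confirmed diagnosis.

    as.matrix converts the factor columns to strings rather than numbers, and because a matrix can hold only one type, the whole matrix ends up as character. glmnet then does not know what to do with it: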

    > df <- data.frame(a=as.factor(c('0', '1', '2')), b=as.factor(c('0', '0', '1')))
    > df
      a b
    1 0 0
    2 1 0
    3 2 1
    > as.matrix(df)
         a   b  
    [1,] "0" "0"
    [2,] "1" "0"
    [3,] "2" "1"
    

    Try converting them back to numbers explicitly (a bit roundabout, but it should work):

    > as.matrix(data.frame(lapply(df, function(x) as.numeric(as.character(x)))))
         a b
    [1,] 0 0
    [2,] 1 0
    [3,] 2 1
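
    Applied to a data frame like the one in the question, the idea might look like the following sketch. The data here is made up, and train_num, is_fac, x, y and the price column are names introduced only for illustration. It converts just the factor columns and leaves the numeric ones alone. It also pulls the target out of its single-column data frame as a plain numeric vector, which is an assumption about what glmnet prefers for y and not something the error message confirms.

```r
# Hypothetical stand-ins for `train` and `train_target` from the question.
train <- data.frame(
  acres = c(0.186, 2.991, 0.144, 0.218),
  above = c(1754L, 3030L, 1531L, 834L),
  baths = factor(c("2", "3", "2", "0")),
  AC_1  = factor(c("1", "1", "0", "1"))
)
train_target <- data.frame(price = c(150000, 420000, 180000, 95000))

# Convert only the factor columns, going through character so that the
# level labels ("0", "1", ...) are used instead of the internal codes.
is_fac <- vapply(train, is.factor, logical(1))
train_num <- train
train_num[is_fac] <- lapply(
  train_num[is_fac],
  function(col) as.numeric(as.character(col))
)

x <- as.matrix(train_num)   # all columns are numeric now, so is the matrix
y <- train_target[[1]]      # numeric vector instead of a data frame

# For comparison: data.matrix() also returns numbers, but for factors it
# uses the internal integer codes, so a level labelled "0" becomes 1.
codes <- data.matrix(train)

print(is.numeric(x))
print(cbind(value = x[, "baths"], code = codes[, "baths"]))

# With glmnet installed, the call from the question would then become:
# lasso_model <- cv.glmnet(x = x, y = y, alpha = 1)
```

    Converting through as.character keeps counts such as baths on their original scale. If a factor should instead be treated as a set of categories, base R's model.matrix() can build dummy columns for it.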