
R problem with random forest classification using the raster package

  • Benjamin Cintix  · 14 years ago

    I am having a problem with the randomForest and raster packages. First, I create the classifier:

    library(raster)
    library(randomForest)
    
    # Set some user variables
    fn = "image.pix"
    outraster = "classified.pix"
    training_band = 2
    validation_band = 1
    original_classes = c(125,126,136,137,151,152,159,170)
    reclassd_classes = c(122,122,136,137,150,150,150,170)
    
    # Get the training data
    myraster = stack(fn)
    training_class = subset(myraster, training_band)
    
    # Reclass the training data classes as required
    training_class = subs(training_class, data.frame(original_classes,reclassd_classes))
    
    # Find pixels that have training data and prepare the data used to create the classifier
    is_training = Which(training_class != 0, cells=TRUE)
    training_predictors = extract(myraster, is_training)[,3:nlayers(myraster)]
    training_response = as.factor(extract(training_class, is_training))
    remove(is_training)
    
    # Create and save the forest, use odd number of trees to avoid breaking ties at random
    r_tree = randomForest(training_predictors, y=training_response, ntree = 201, keep.forest=TRUE) # Runs out of memory, does not allow more trees than this...
    remove(training_predictors, training_response)
    

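    So far, so good. Looking at the error rate, the confusion matrix, and so on, I can tell that the forest was created correctly. When I try to classify some data, however, I run into trouble with the following, which returns NAs in predictions: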

    # Classify the whole image
    predictor_data = subset(myraster, 3:nlayers(myraster))
    layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
    predictions = predict(predictor_data, r_tree, type='response', progress='text')
    

    and gives these warnings:

    Warning messages:
    1: In `[<-.factor`(`*tmp*`, , value = c(1, 1, 1, 1, 1, 1,  ... :
      invalid factor level, NAs generated
    (keeps going like this)...
    

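    However, calling predict.randomForest directly works fine and returns the expected predictions (this is not a good option for me, because the image is large and I cannot hold the whole matrix in memory):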

    # Classify the whole image and write it to file
    predictor_data = subset(myraster, 3:nlayers(myraster))
    layerNames(predictor_data) = layerNames(myraster)[3:nlayers(myraster)]
    predictor_data = extract(predictor_data, extent(predictor_data))
    predictions = predict(r_tree, newdata=predictor_data)
    

    How can I get this to work directly with the raster version? I know it is possible, as shown in predict{raster}.

    1 answer

  •   Adam Smith    14 years ago

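    You could try nesting predict.randomForest inside the writeRaster function and writing the matrix out as a raster in chunks, as described in the PDF that ships with the raster package. Before that, try the argument "na.rm=TRUE" when calling predict in the raster function. You could also assign a dummy value to the NAs in the predictor raster, then rewrite them as NAs afterwards using functions from the raster package.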
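The chunked approach suggested above might look roughly like the following. This is only a sketch under stated assumptions: it uses a small synthetic three-layer raster in place of image.pix, and the names `predictors`, `rf`, and `out` are illustrative rather than taken from the question. It relies on the raster package's block-writing functions (`blockSize`, `writeStart`, `writeValues`, `writeStop`) and converts the factor predictions back to their numeric class codes before writing, which assumes the class labels are numeric codes as in the question.

```r
library(raster)
library(randomForest)

set.seed(42)

# Synthetic stand-in for the image (assumption): 3 predictor layers, 50 x 50 cells
template <- raster(nrows = 50, ncols = 50)
predictors <- stack(lapply(1:3, function(i) setValues(template, runif(ncell(template)))))
names(predictors) <- c("b1", "b2", "b3")

# Toy training data with numeric class codes as labels (assumption)
vals <- getValues(predictors)
class_codes <- ifelse(vals[, "b1"] + vals[, "b2"] > 1, 150, 122)
train_idx <- sample(nrow(vals), 300)
rf <- randomForest(vals[train_idx, ], y = factor(class_codes[train_idx]), ntree = 51)

# Predict block by block so the full matrix is never held in memory
out <- raster(predictors)                 # empty layer with the same geometry
bs <- blockSize(predictors)               # row ranges to process one at a time
out <- writeStart(out, filename = tempfile(fileext = ".grd"), overwrite = TRUE)
for (i in seq_len(bs$n)) {
  block <- getValues(predictors, row = bs$row[i], nrows = bs$nrows[i])
  pred <- rep(NA_real_, nrow(block))
  ok <- complete.cases(block)             # predict.randomForest rejects NA rows
  if (any(ok)) {
    p <- predict(rf, newdata = block[ok, , drop = FALSE])
    pred[ok] <- as.numeric(as.character(p))   # class codes, not level indices
  }
  out <- writeValues(out, pred, bs$row[i])
}
out <- writeStop(out)
```

Only one block of rows is in memory at any time, so the peak memory use depends on the block size rather than on the size of the image. Cells with missing predictor values are written as NA instead of being passed to the model.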

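    As for the memory problems when fitting RFs, I have run into a lot of memory trouble working with BRTs. They are huge, both on disk and in memory! (Should a model really be more complex than the data?) I have never gotten them to run reliably on 32-bit machines (WinXP or Linux). Sometimes adjusting the Windows memory allocation for the application helps, and moving to Linux helps more, but I get the most out of 64-bit Windows or Linux machines, because they impose a higher (or no) limit on the amount of memory an application can take. Doing this may let you increase the number of trees you can use.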
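Independently of the operating system, it can help to check how much memory a fitted forest actually occupies. The sketch below is not part of the original answer: it uses synthetic data and base R's `object.size`, and it tries randomForest's `maxnodes` argument, which caps the number of terminal nodes per tree, as one possible way to shrink the model. Whether the smaller trees classify well enough is something you would have to check on your own data.

```r
library(randomForest)

set.seed(1)

# Synthetic predictors and numeric class codes (assumption, for illustration only)
x <- matrix(runif(2000 * 5), ncol = 5, dimnames = list(NULL, paste0("b", 1:5)))
y <- factor(ifelse(x[, 1] + x[, 2] > 1, 150, 122))

# Fully grown trees versus trees capped at 16 terminal nodes
rf_full  <- randomForest(x, y, ntree = 101)
rf_small <- randomForest(x, y, ntree = 101, maxnodes = 16)

# Compare the in-memory size of the two fitted objects
print(object.size(rf_full), units = "Kb")
print(object.size(rf_small), units = "Kb")
```

A smaller forest per tree leaves room for more trees within the same memory budget, at the possible cost of accuracy per tree.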