
RMOA package error

  •  1
  • annabednarska  · 9 years ago

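    I have started working with the RMOA package and ran into a problem. The first script, for the Iris dataset, works. The second script, for the UCI Poker dataset, throws an "attempt to apply non-function" error in the predict function. I checked that the dataset is read correctly, and it seems fine. What is wrong here? Thanks in advance for your help.

    This works: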

    ## Hoeffdingtree
    hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
    
    data(iris)
    iris <- factorise(iris)
    irisdatastream <- datastream_dataframe(data=iris)
    
    trainset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
    trainset <- datastream_dataframe(data=trainset)
    
    hdtreetrained <- trainMOA(model = hdt,
                              Species ~ .,
                              data = trainset)
    
    testset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
    
    scores <- predict(hdtreetrained,
                      newdata=testset[, c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")],
                      type="response")
    str(scores)
    table(scores, testset$Species)
    scores <- predict(hdtreetrained, newdata=testset, type="response")
    head(scores)
    

    This does not work:

    ## Hoeffdingtree
    hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
    
    iris <- read.csv("Poker.csv", sep= ",")
    iris <- factorise(iris)
    irisdatastream <- datastream_dataframe(data=iris)
    
    trainset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
    trainset <- datastream_dataframe(data=trainset)
    
    hdtreetrained <- trainMOA(model = hdt,
                              Class ~ .,
                              data = trainset)
    
    testset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
    
    scores <- predict(hdtreetrained,
                      newdata=testset[, c("S1","C1","S2","C2","S3","C3","S4","C4","S5","C5")],
                      type="response")
    str(scores)
    table(scores, testset$Class)
    scores <- predict(hdtreetrained, newdata=testset, type="response")
    head(scores)
    
    1 answer
  •  3
  •   erasmortg    9 years ago

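    You have to change how you convert the target to a factor. Note that, according to the documentation of factorise, it will:

    Convert character strings to factors in the dataset.

    In fact, that line is redundant even for the iris dataset. If you load iris and check its structure (str(iris)), Species is already a factor. The same cannot be said for the poker dataset: read.csv reads Class as a numeric column, and factorise only converts character strings, so it leaves Class untouched. A different approach is needed. As per the comments, factorise will not work here: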

    poker$Class <- as.factor(poker$Class)
    

    This is what you are looking for.

    If, for whatever reason, you would rather not rename the dataset, it would look like this:

    iris$Class <- as.factor(iris$Class) #insert this where your current factorise call is
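
    To check the column types before training, a minimal base-R sketch follows. The sample rows are made up for illustration and only mimic the Poker column layout (S1, C1, ..., Class); csv_text and class_before are hypothetical names, and RMOA itself is not needed to run it.

```r
# Species in the built-in iris data is already a factor
data(iris)
species_is_factor <- is.factor(iris$Species)   # TRUE

# Made-up rows in the Poker column layout (assumption: not real UCI data)
csv_text <- "S1,C1,S2,C2,S3,C3,S4,C4,S5,C5,Class
1,10,1,11,1,13,1,12,1,1,9
2,11,2,13,2,10,2,12,2,1,9
3,12,3,11,3,13,3,10,3,1,9
1,2,2,5,3,9,4,11,1,7,0"
poker <- read.csv(text = csv_text)

# An all-digit column is parsed as integer, not as character or factor
class_before <- class(poker$Class)

# Explicit conversion of the target column, as in the fix above
poker$Class <- as.factor(poker$Class)

# Guard that can be placed right before the training call
stopifnot(is.factor(poker$Class))
print(class_before)
print(levels(poker$Class))
```

    The stopifnot guard fails early if the target column is not a factor, which is easier to diagnose than an error raised later inside predict.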
    

    As for factorise not working as expected, consider the following example:

    poker <- read.csv("Poker.csv", sep= ",")
    all.equal(poker,factorise(poker))
    #[1] TRUE
    #VS
    poker2 <- poker
    poker2$Class <- as.factor(poker2$Class)
    all.equal(poker,poker2)
    #[1] "Component “Class”: Attributes: < target is NULL, current is list >"
    #[2] "Component “Class”: target is numeric, current is factor"   
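
    The same effect can be reproduced without Poker.csv or RMOA. The sketch below uses a hypothetical stand-in, factorise_chars, that converts character columns only, as the documentation quoted above describes. It is not RMOA's implementation, and the Suit column is invented purely to show a character column being converted.

```r
# Hypothetical stand-in for a "character strings to factors" helper.
# Assumption: this mirrors the documented behaviour, not RMOA's actual code.
factorise_chars <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], factor)
  df
}

# Small made-up data frame: one character column, two integer columns
df <- data.frame(S1    = c(1L, 2L, 3L),
                 Suit  = c("hearts", "spades", "hearts"),
                 Class = c(0L, 1L, 0L),
                 stringsAsFactors = FALSE)

out <- factorise_chars(df)

# Only the character column changes type; the integer target is left alone
col_types <- vapply(out, function(col) class(col)[1], character(1))
print(col_types)

# Converting the target explicitly is what makes it a factor
out$Class <- as.factor(out$Class)
```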
    

    Here is the full script (I renamed most/all of the irisX names to pokerX, so keep that in mind):

    library(RMOA)

    hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
    
    poker <- read.csv("Poker.csv", sep= ",")
    poker$Class <- as.factor(poker$Class)
    pokerdatastream <- datastream_dataframe(data=poker)
    
    trainset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
    trainset <- datastream_dataframe(data=trainset)
    
    hdtreetrained <- trainMOA(model = hdt,
                              Class ~ .,
                              data = trainset)
    
    testset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
    
    scores <- predict(hdtreetrained,
                      newdata=testset[, colnames(testset[1:11])],
                      type="response")
    str(scores)
    #chr [1:10] "8" "8" "8" "8" "8" "8" "8" "8" "8" "8"
    #also switched this line as per the comments, even though it's edited in the OP now
    table(scores, testset$Class)
    #      
    #scores 0 1 2 3 4 5 6 7 8 9
    #     8 6 3 0 0 1 0 0 0 0 0
    scores <- predict(hdtreetrained, newdata=testset, type="response")
    head(scores)
    #[1] "8" "8" "8" "8" "8" "8"
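
    Since predict returns a character vector here, the table above has a single row. One possible way to get a square confusion matrix is to give the predictions the same levels as the true classes. The sketch below is base R only; truth and scores are illustrative vectors rebuilt from the counts printed above (six 0s, three 1s, one 4, every prediction "8"), and the row order is an assumption.

```r
# Illustrative vectors rebuilt from the printed table (row order is assumed)
truth  <- factor(c(rep(0, 6), rep(1, 3), 4), levels = 0:9)
scores <- rep("8", 10)

# Give the predictions the same level set as the true classes
scores_f <- factor(scores, levels = levels(truth))

# Square confusion matrix: rows are predictions, columns are true classes
cm <- table(predicted = scores_f, actual = truth)
print(cm)

# Share of points on the diagonal (correct predictions)
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```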