
RMOA package error

Tags: predict, r

annabednarska · Tech Community · 9 years ago

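I have started using the RMOA package and ran into a problem. The first piece of code, using the iris dataset, works. The second, using the UCI Poker dataset, throws an "attempt to apply non-function" error in the predict function. I checked that the dataset is read correctly, and it seems fine. What is wrong here? Thanks in advance for your help.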

This works:

## Hoeffdingtree
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

data(iris)
iris <- factorise(iris)
irisdatastream <- datastream_dataframe(data=iris)

trainset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)

hdtreetrained <- trainMOA(model = hdt,
                          Species ~ .,
                          data = trainset)

testset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))

scores <- predict(hdtreetrained,
                  newdata=testset[, c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")],
                  type="response")
str(scores)
table(scores, testset$Species)
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)

This does not work:

## Hoeffdingtree
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

iris <- read.csv("Poker.csv", sep= ",")
iris <- factorise(iris)
irisdatastream <- datastream_dataframe(data=iris)

trainset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)

hdtreetrained <- trainMOA(model = hdt,
                          Class ~ .,
                          data = trainset)

testset <- irisdatastream$get_points(irisdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))

scores <- predict(hdtreetrained,
                  newdata=testset[, c("S1","C1","S2","C2","S3","C3","S4","C4","S5","C5")],
                  type="response")
str(scores)
table(scores, testset$Class)
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)

1 answer | last activity 9 years ago

erasmortg · 9 years ago

You have to change how you factorise. Note from the documentation of factorise that it will:

Convert strings to factors in a dataset.

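In fact, that line is overkill even for the iris dataset: when you load iris and check its structure (str(iris)), Species is already a factor. The same cannot be said for the poker dataset, whose Class column is read in as numbers, so another approach is needed. As noted in the comments, factorise will not work here. Convert the column explicitly: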

poker$Class <- as.factor(poker$Class)

This is what you are looking for.

If for some reason you do not want to rename the dataset, it should look like this:

iris$Class <- as.factor(iris$Class) #insert this where your current factorise call is

As for why factorise does not work as expected, consider the following example:

poker <- read.csv("Poker.csv", sep= ",")
all.equal(poker,factorise(poker))
#[1] TRUE
#VS
poker2 <- poker
poker2$Class <- as.factor(poker2$Class)
all.equal(poker,poker2)
#[1] "Component ‘Class’: Attributes: < target is NULL, current is list >"
#[2] "Component ‘Class’: target is numeric, current is factor"
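The same behaviour can be reproduced without Poker.csv. Below is a minimal base-R sketch: factorise_strings() is a hypothetical stand-in that assumes, per the documentation quoted above, that factorise only converts string columns, and the rows are made up (S1, C1 and Class borrow the Poker column names; Suit is an invented character column added for contrast).

```r
# Hypothetical stand-in for RMOA's factorise(): assumes, as its documentation
# says, that only string (character) columns are converted to factors.
factorise_strings <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Made-up rows: S1, C1 and Class mimic the integer Poker columns,
# Suit is an invented character column.
poker_toy <- data.frame(
  S1    = c(1L, 2L, 3L),
  C1    = c(10L, 11L, 12L),
  Suit  = c("H", "S", "D"),
  Class = c(0L, 1L, 0L),
  stringsAsFactors = FALSE
)

converted <- factorise_strings(poker_toy)
# Suit becomes a factor, but the numeric Class column is left untouched
print(sapply(converted, class))

# Explicit conversion of the response column, as in the answer
poker_fixed <- poker_toy
poker_fixed$Class <- as.factor(poker_fixed$Class)
print(levels(poker_fixed$Class))
```

The point is that a numeric class label is never a "string", so any string-only factorisation step silently leaves it numeric.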

Compare that with this full script (keep in mind that I renamed most or all objects from irisX to pokerX):

library(RMOA)

hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")

poker <- read.csv("Poker.csv", sep= ",")
poker$Class <- as.factor(poker$Class)
pokerdatastream <- datastream_dataframe(data=poker)

trainset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)

hdtreetrained <- trainMOA(model = hdt,
                          Class ~ .,
                          data = trainset)

testset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))

scores <- predict(hdtreetrained,
                  newdata=testset[, colnames(testset[1:11])],
                  type="response")
str(scores)
#chr [1:10] "8" "8" "8" "8" "8" "8" "8" "8" "8" "8"
#also switched this line as per the comments, even though it's edited in the OP now
table(scores, testset$Class)
#      
#scores 0 1 2 3 4 5 6 7 8 9
#     8 6 3 0 0 1 0 0 0 0 0
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)
#[1] "8" "8" "8" "8" "8" "8"
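Since converting Class with as.factor() is what resolved the predict() error here, a cheap safeguard is to check the response column before calling trainMOA(). This is a minimal base-R sketch: check_factor_target() is a hypothetical helper, not part of RMOA, and the toy data frame is invented.

```r
# Hypothetical helper (not part of RMOA): fail fast with a readable message
# when the response column is missing or is not a factor.
check_factor_target <- function(df, target) {
  if (!target %in% names(df)) {
    stop(sprintf("column '%s' not found", target))
  }
  if (!is.factor(df[[target]])) {
    stop(sprintf("column '%s' is %s, not factor; convert it with as.factor()",
                 target, class(df[[target]])[1]))
  }
  invisible(TRUE)
}

# Made-up data: Class is an integer column, like Class after read.csv above
toy <- data.frame(S1 = c(1L, 2L), Class = c(0L, 1L))
msg <- tryCatch(check_factor_target(toy, "Class"), error = conditionMessage)
print(msg)

# After explicit conversion the check passes silently
toy$Class <- as.factor(toy$Class)
check_factor_target(toy, "Class")
```

In the script above, a call such as check_factor_target(poker, "Class") would go right after the as.factor() line.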