You have to change your factorise call.
Note from the factorise documentation that it will: convert strings to factors in a dataset.
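To see why that matters for a numeric class column, here is a minimal base R sketch of that behavior. factorise_sketch is a hypothetical helper written only for illustration, not RMOA's actual source, and it assumes (as the note above says) that factorise converts character columns and leaves everything else alone. The toy data frame is made up.

```r
# Hypothetical stand-in for factorise (illustration only, not RMOA's code):
# convert every character column to a factor, leave all other columns untouched.
factorise_sketch <- function(df) {
  df[] <- lapply(df, function(col) if (is.character(col)) as.factor(col) else col)
  df
}

# Made-up data: Class holds numeric codes, label holds strings.
toy <- data.frame(S1 = c(1, 2, 3), Class = c(8, 0, 8),
                  label = c("a", "b", "a"), stringsAsFactors = FALSE)
out <- factorise_sketch(toy)
str(out)  # label becomes a factor, Class stays numeric
```

A numeric Class column passes through unchanged, which is exactly why calling factorise on poker has no effect on Class.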
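In fact, even for the iris dataset this line is overkill. When you load iris and check its structure (str(iris)), Species is already a factor. The same cannot be said of the poker dataset, where Class is read as numeric, so a different approach is needed. As per the comments, factorise will not work there: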
poker$Class <- as.factor(poker$Class)
This is what you are looking for.
If for whatever reason you would rather not rename your dataset, it should look like this:
iris$Class <- as.factor(iris$Class) #insert this where your current factorise call is
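For reference, this is what as.factor does to a column of numeric class codes such as the one read.csv produces. The values below are made up for illustration and are not taken from Poker.csv.

```r
# Made-up numeric class codes, like a Class column read by read.csv
Class <- c(8, 0, 1, 8, 9)
is.numeric(Class)            # TRUE: numeric, not a categorical target

# as.factor turns each distinct value into a level (sorted, stored as strings)
Class_f <- as.factor(Class)
levels(Class_f)              # "0" "1" "8" "9"
```

Only values that actually occur become levels, so the factor carries exactly the observed classes.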
As for factorise not working as expected, consider the following example:
poker <- read.csv("Poker.csv", sep= ",")
all.equal(poker,factorise(poker))
#[1] TRUE
#VS
poker2 <- poker
poker2$Class <- as.factor(poker2$Class)
all.equal(poker,poker2)
#[1] "Component “Class”: Attributes: < target is NULL, current is list >"
#[2] "Component “Class”: target is numeric, current is factor"
Compare this with the full script (I renamed most/all objects from irisX to pokerX, so keep that in mind):
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
poker <- read.csv("Poker.csv", sep= ",")
poker$Class <- as.factor(poker$Class)
pokerdatastream <- datastream_dataframe(data=poker)
trainset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
trainset <- datastream_dataframe(data=trainset)
hdtreetrained <- trainMOA(model = hdt,
Class ~ .,
data = trainset)
testset <- pokerdatastream$get_points(pokerdatastream, n = 10, outofpoints = c("stop", "warn", "ignore"))
scores <- predict(hdtreetrained,
newdata=testset[, colnames(testset[1:11])],
type="response")
str(scores)
#chr [1:10] "8" "8" "8" "8" "8" "8" "8" "8" "8" "8"
#also switched this line as per the comments, even though it's edited in the OP now
table(scores, testset$Class)
#
#scores 0 1 2 3 4 5 6 7 8 9
# 8 6 3 0 0 1 0 0 0 0 0
scores <- predict(hdtreetrained, newdata=testset, type="response")
head(scores)
#[1] "8" "8" "8" "8" "8" "8"