You get 0 for all results because {DALEX} identifies the model type as "multiclass" rather than "classification". You can check this with:
knn_exp$model_info$type
#> [1] "multiclass"
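This means the predictions are class probabilities, one column per class (here we only get 1s and 0s because the model is heavily overfitted):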
predicted <- knn_exp$predict_function(knn_exp$model, newdata = df_train)
predicted
#> setosa versicolor virginica
#> [1,] 1 0 0
#> [2,] 1 0 0
#> [3,] 1 0 0
#> [4,] 1 0 0
#> [5,] 1 0 0
#> [6,] 1 0 0
#> ...
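Because each row holds one probability per class, you need an extra step to get a single predicted label per observation. The base-R sketch below uses a small, made-up probability matrix with the same layout as the output above (rows = observations, columns = class levels) and picks the most probable class in each row with max.col(). The values are hypothetical and only illustrate the shape of the data.

```r
# Hypothetical probability matrix shaped like the output of
# knn_exp$predict_function(): one row per observation, one column per class.
predicted <- matrix(
  c(1,    0,   0,
    0,    1,   0,
    0,    0,   1,
    0.25, 0.5, 0.25),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

# Each row of class probabilities sums to 1.
rowSums(predicted)

# Take the column with the highest probability in each row and map it to its
# class name; ties.method = "first" makes the choice deterministic on ties.
predicted_class <- colnames(predicted)[max.col(predicted, ties.method = "first")]
predicted_class
# [1] "setosa"     "versicolor" "virginica"  "versicolor"
```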
When you use loss_accuracy() as the loss function, it is computed as follows:
loss_accuracy
#> function (observed, predicted, na.rm = TRUE)
#> mean(observed == predicted, na.rm = na.rm)
#> <bytecode: 0x159276bb8>
#> <environment: namespace:DALEX>
#> attr(,"loss_name")
#> [1] "Accuracy"
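The body above is an element-wise comparison followed by a mean, so it only gives a meaningful accuracy when both arguments are vectors of class labels of the same length. The sketch below re-implements that body locally (accuracy_like is a hypothetical name, not part of {DALEX}) and applies it to made-up label vectors.

```r
# Local re-implementation of the body printed above, for illustration only:
# the share of positions where observed and predicted are equal.
accuracy_like <- function(observed, predicted, na.rm = TRUE) {
  mean(observed == predicted, na.rm = na.rm)
}

observed <- factor(c("setosa", "versicolor", "virginica", "virginica"))
# Predicted class labels (not probabilities), one per observation.
predicted_labels <- c("setosa", "versicolor", "virginica", "versicolor")

# 3 of the 4 labels match.
accuracy_like(observed, predicted_labels)
# [1] 0.75
```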
If we do the calculation step by step, we can see why this is a problem. First, we define observed as the outcome factor:
observed <- df_train$Species
observed
#> [1] setosa setosa setosa setosa setosa setosa
#> [7] setosa setosa setosa setosa setosa setosa
#> [13] setosa setosa setosa setosa setosa setosa
#> [19] setosa setosa setosa setosa setosa setosa
#> [25] setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa setosa setosa setosa
#> [37] setosa setosa setosa setosa versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor versicolor versicolor versicolor versicolor
#> [55] versicolor versicolor versicolor versicolor versicolor versicolor
#> [61] versicolor versicolor versicolor versicolor versicolor versicolor
#> [67] versicolor versicolor versicolor versicolor versicolor versicolor
#> [73] versicolor versicolor versicolor versicolor versicolor versicolor
#> [79] versicolor versicolor virginica virginica virginica virginica
#> [85] virginica virginica virginica virginica virginica virginica
#> [91] virginica virginica virginica virginica virginica virginica
#> [97] virginica virginica virginica virginica virginica virginica
#> [103] virginica virginica virginica virginica virginica virginica
#> [109] virginica virginica virginica virginica virginica virginica
#> [115] virginica virginica virginica virginica virginica virginica
#> Levels: setosa versicolor virginica
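Since observed is a factor vector and predicted is a numeric matrix, the comparison yields a logical matrix filled with FALSE: the class labels (such as "setosa") are compared with the numbers 0 and 1, so the values are never equal.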
head(observed == predicted)
#> setosa versicolor virginica
#> [1,] FALSE FALSE FALSE
#> [2,] FALSE FALSE FALSE
#> [3,] FALSE FALSE FALSE
#> [4,] FALSE FALSE FALSE
#> [5,] FALSE FALSE FALSE
#> [6,] FALSE FALSE FALSE
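The same effect can be reproduced without a model. In this base-R sketch with made-up data, comparing a length-4 factor with a 4 x 3 probability matrix recycles the factor's labels down each column, so the result takes the matrix's shape and contains no TRUE values.

```r
observed <- factor(c("setosa", "versicolor", "virginica", "virginica"))

# Hypothetical probability matrix: one row per observation, one column per class.
predicted <- matrix(
  c(1,    0,   0,
    0,    1,   0,
    0,    0,   1,
    0.25, 0.5, 0.25),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

# == on a factor compares its labels; the numbers are coerced to strings
# such as "1", "0" and "0.25", so no label ever matches.
comparison <- observed == predicted
dim(comparison)   # 4 x 3, the shape of the matrix
any(comparison)   # FALSE
mean(comparison)  # 0
```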
So when we take the mean, we get the expected 0:
mean(observed == predicted)
#> [1] 0
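One possible workaround, sketched here in base R and not taken from the {DALEX} documentation, is a custom loss with the same (observed, predicted, na.rm) signature and "loss_name" attribute as loss_accuracy that first converts the probability matrix into class labels. The name loss_accuracy_multiclass is hypothetical, and the data below are made up.

```r
# Hypothetical custom loss: turn a probability matrix (one column per class)
# into predicted labels, then compare them with the observed factor.
loss_accuracy_multiclass <- function(observed, predicted, na.rm = TRUE) {
  predicted_class <- colnames(predicted)[max.col(predicted, ties.method = "first")]
  mean(as.character(observed) == predicted_class, na.rm = na.rm)
}
attr(loss_accuracy_multiclass, "loss_name") <- "Accuracy (multiclass)"

observed <- factor(c("setosa", "versicolor", "virginica", "virginica"))
predicted <- matrix(
  c(1,    0,   0,
    0,    1,   0,
    0,    0,   1,
    0.25, 0.5, 0.25),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("setosa", "versicolor", "virginica"))
)

loss_accuracy_multiclass(observed, predicted)
# [1] 0.75
```

This assumes the probability matrix has the class levels as column names, as in the output of knn_exp$predict_function() above. Since accuracy grows as the model improves, while a loss is usually expected to shrink, you may prefer to return 1 minus this value if you pass such a function to model_parts() through its loss_function argument.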