代码之家 › 专栏 › 技术社区 › user8270077

在使用iris数据集的knn中,重加权距离返回与常规距离相同的结果

knn distance scikit-learn python

user8270077 · 技术社区 · 7 年前

我正在试验距离上的权重如何影响kNN算法的性能,对于一个可复制的示例,我正在使用iris数据集。

令我惊讶的是,加权2个预测值比其余2个预测值高100倍,生成了与未加权模型相同的预测。这一违反直觉的发现是什么?

我的代码如下:

X_original = iris['data']
Y = iris['target']

sc = StandardScaler() # Defines the parameters of the Scaler

X = sc.fit_transform(X_original)  # Transforms the original data to standardized data and returns them

from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits = 1, train_size = 0.8, test_size = 0.2)

split = sss.split(X, Y)

s = list(split)

train_index = s[0][0]

test_index = s[0][1]

X_train = X[train_index, ] 

X_test = X[test_index, ] 

Y_train = Y[train_index] 

Y_test = Y[test_index] 

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors = 6)

iris_fit = knn.fit(X_train, Y_train)  # The data can be passed as numpy arrays or pandas dataframes/series.
                                                  # All the data should be numeric
                                                  # There should be no NaNs

predictions_w1 = knn.predict(X_test)

weights = np.array([1, 1, 100, 100])
weights =weights/np.sum(weights)

knn_w = KNeighborsClassifier(n_neighbors = 6, metric='wminkowski', p=2, 
                           metric_params={'w': weights})

iris_fit_w = knn_w.fit(X_train, Y_train)  # The data can be passed as numpy arrays or pandas dataframes/series.
                                                  # All the data should be numeric
                                                  # There should be no NaNs

predictions_w100 = knn_w.predict(X_test)

(predictions_w1 != predictions_w100).sum()
0

1 回复 | 直到 6 年前

Jan K 7 年前

它们并不总是相同的,将随机状态添加到您的列车测试分割中,您将看到不同值的变化。

 StratifiedShuffleSplit(n_splits = 1, train_size = 0.8, test_size = 0.2, random_state=3)

此外,在第3个(花瓣长度)和第4个(花瓣宽度)特征上具有这种极端权重的加权闵可夫斯基距离基本上会给出相同的结果,就像在这2个特征上只使用未加权的闵可夫斯基运行KNN一样。由于它们似乎信息量很大,因此与考虑所有4个特性的情况相比,得到的结果非常相似也就不足为奇了。请参见下面的wiki图片

推荐文章

Google User · Django管理员在`list_display中未显示`creation_date`字段`

4 月前

user29747013 · 如何创建一个新的数据框架,其中包含原始数据框架中列的聚合列?

4 月前

ÎÎÎ½Î· ÎÎ®Î¹Î½Î¿Ï · Python lxml.html语法错误:使用lxml find时XPATH的谓词无效

4 月前

user29715306 · from_users=和chats=电视节目中的差异

4 月前

Redshoe · 当执行numpy.genfromtxt()时,python是否会读取文件的所有行?

5 月前

RASEL MAHMUD · 为什么以及如何在is_even()函数内的IF条件中递归X变量在满足0后递增?[副本]

5 月前

prayner · 更新嵌套字典包含列表中的项

5 月前

Bringo Jr · 我可以在O(n)中解决这个问题吗?

5 月前

Dave · 如何在for循环中修改列表值

5 月前

Shukurullox Komiljonov · 从记录中获得相互和解。使用SQL

5 月前