Your LDA reduces the dataset to a single feature because LDA can produce at most n_components <= min(n_classes - 1, n_features) components.
Here you have two classes => 2 - 1 = 1 feature.
See "LDA for two classes" on Wikipedia.
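To make the limit concrete, here is a minimal sketch of the two-class case. The dataset sizes and random_state are arbitrary illustration values, not taken from the question, and the ValueError branch assumes a recent scikit-learn release (older versions may only warn and clip the value).

```python
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two classes (centers=2) in 20 dimensions; sizes chosen only for illustration.
X, y = make_blobs(n_samples=500, n_features=20, centers=2, random_state=0)

# With n_components left at None, LDA keeps min(n_classes - 1, n_features) = 1 component.
lda = LinearDiscriminantAnalysis()
X_new = lda.fit_transform(X, y)
print(X_new.shape)

# Asking for more than n_classes - 1 components is rejected in recent versions.
try:
    LinearDiscriminantAnalysis(n_components=2).fit(X, y)
except ValueError as err:
    print("ValueError:", err)
```

However many input features there are, two classes leave a single discriminant direction, so the transformed data has one column.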
Change the number of centers to, for example, 200 and you will see the difference:
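from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# 40000 samples, 600 features, 200 classes => up to 199 LDA components
Xx, yy = make_blobs(40000, 600, centers=200, cluster_std=5)
X_train, X_test, y_train, y_test = train_test_split(Xx, yy, test_size=0.3)
model = LinearDiscriminantAnalysis(n_components=100)
model.fit(X_train, y_train)
X_train_new = model.transform(X_train)
print(X_train_new.shape)  # (28000, 100)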
Otherwise, use PCA or SVD:
from sklearn.decomposition import TruncatedSVD
svd = TruncatedSVD(n_components=100)
X_train_new = svd.fit_transform(X_train)
print(svd.explained_variance_ratio_.sum())  # aim for > 0.90; raise n_components if it is lower
print(X_train_new.shape)  # (28000, 100)
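For the PCA option mentioned above, a minimal sketch might look like the following. It assumes you would rather fix a target explained-variance ratio than a component count. The small dataset and the names X_demo and pca are hypothetical and chosen only so the example runs quickly.

```python
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Small illustrative dataset (not the one from the question).
X_demo, _ = make_blobs(n_samples=2000, n_features=50, centers=5,
                       cluster_std=1.0, random_state=0)

# A float in (0, 1) asks PCA to keep just enough components to explain
# that fraction of the variance; svd_solver="full" supports this mode.
pca = PCA(n_components=0.90, svd_solver="full")
X_demo_new = pca.fit_transform(X_demo)

print(X_demo_new.shape)                     # (2000, number of kept components)
print(pca.n_components_)                    # how many components were needed
print(pca.explained_variance_ratio_.sum())  # above the 0.90 target
```

Unlike LDA, PCA and SVD ignore the labels, so the number of components is limited only by the number of features and samples, not by the number of classes.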