I am trying to do binary classification with keras on the breast cancer dataset available in scikit-learn:
# load dataset
from sklearn import datasets
cancer = datasets.load_breast_cancer()
cancer.data
# dataset into pd.dataframe
import pandas as pd
donnee = pd.concat([pd.DataFrame(data = cancer.data, columns = cancer.feature_names),
                    pd.DataFrame(data = cancer.target, columns = ["target"])
                   ], axis = 1)
# train/test split
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(donnee.loc[:, donnee.columns != "target"], donnee.target, test_size = 0.25, random_state = 1)
I am trying to follow the keras tutorial here:
https://keras.io/#getting-started-30-seconds-to-keras
The problem is that I always get the same loss value (6.1316862406430541) and the same accuracy (0.61538461830232527), because the prediction is always 1.
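As a sanity check, both reported numbers are consistent with a model that outputs 1 for every test sample. The sketch below is a NumPy reconstruction under stated assumptions, not keras code. It assumes the test split holds 143 samples of which 88 are positive (inferred from the reported accuracy, since 88/143 is about 0.6154), and that predicted probabilities are clipped to [1e-7, 1 - 1e-7] in float32 before the logarithm, which is how keras' binary cross-entropy appears to behave. The helper name constant_prediction_metrics is hypothetical.

```python
import numpy as np

def constant_prediction_metrics(y_true, p=1.0, eps=1e-7):
    """Loss and accuracy when every sample gets the same predicted probability p."""
    y = np.asarray(y_true, dtype=np.float32)
    # Assumption: probabilities are clipped in float32, as keras seems to do.
    lo = np.float32(eps)
    hi = np.float32(1.0) - np.float32(eps)
    probs = np.clip(np.full_like(y, p), lo, hi)
    # Mean binary cross-entropy over all samples.
    loss = -np.mean(y * np.log(probs) + (1 - y) * np.log(1 - probs))
    # Accuracy with the usual 0.5 decision threshold.
    acc = np.mean((probs > 0.5) == (y == 1))
    return float(loss), float(acc)

# Assumed class counts for the test split: 88 positives, 55 negatives.
y_assumed = np.array([1] * 88 + [0] * 55)
loss, acc = constant_prediction_metrics(y_assumed)
print(loss, acc)  # roughly 6.13 and 0.615
```

If these assumptions hold, the stuck loss and accuracy are simply the majority-class baseline, which suggests the output sigmoid is saturated rather than that evaluation is broken.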
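I am not sure whether this is caused by an error in the code:
- Is X_train wrong?
- Or is it epochs and/or batch_size?
- If I am not mistaken, all-1 predictions are possible when the layers have no bias, and I do not know yet how they are initialized.
- But maybe it is something else. Maybe a single layer is too few? (If so, I wonder why the keras tutorial uses only one layer...)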
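One way to check the first point is to look at the shape, class balance, and value range of the inputs before training. The following is a minimal sketch: summarize_inputs is a hypothetical helper, and the demo array holds made-up values rather than the real dataset. Whether a wide value range is the actual cause of the constant prediction is only a guess here.

```python
import numpy as np

def summarize_inputs(X, y):
    """Return basic shape, class-balance and value-range diagnostics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y).ravel()
    return {
        "n_samples": X.shape[0],
        "n_features": X.shape[1],
        "rows_match": X.shape[0] == y.shape[0],     # one label per row?
        "positive_share": float(np.mean(y == 1)),   # fraction of class 1
        "feature_min": float(X.min()),              # smallest value overall
        "feature_max": float(X.max()),              # largest value overall
        "has_nan": bool(np.isnan(X).any()),
    }

# Toy stand-in data (hypothetical values, not the real dataset).
X_demo = np.array([[0.1, 1200.0], [0.3, 450.0], [0.2, 800.0]])
y_demo = np.array([1, 0, 1])
report = summarize_inputs(X_demo, y_demo)
print(report)
```

With the variables from this post it would be called as summarize_inputs(X_train, y_train). If n_features is 30 and rows_match is true, the shapes fit input_dim=30, and a very large gap between feature_min and feature_max would point toward rescaling the features as something to try.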
Here is my code, in case you have any ideas:
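import keras
from keras.models import Sequential
from keras.layers import Dense

# one hidden layer of 64 ReLU units, then a single sigmoid output for the binary target
model = Sequential()
model.add(Dense(units=64, activation='relu', input_dim=30))
model.add(Dense(units=1, activation='sigmoid'))
model.summary()
model.compile(loss = keras.losses.binary_crossentropy,
              optimizer = 'rmsprop',
              metrics=['accuracy']
              )
# to_numpy() replaces as_matrix(), which newer pandas versions no longer provide;
# reshape(-1, 1) avoids hard-coding the 426 training rows
model.fit(X_train.to_numpy(), y_train.to_numpy().reshape(-1, 1), epochs=5, batch_size=32)
# returns [loss, accuracy] on the held-out test set
loss_and_metrics = model.evaluate(X_test.to_numpy(), y_test.to_numpy(), batch_size=128)
loss_and_metrics
# predicted probabilities for the test set
classes = model.predict(X_test.to_numpy(), batch_size=128)
classes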