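I am currently building a prototype system that includes a prediction task. When I predict on several rows, as in the table below, my code works fine and the results are perfect.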
Several rows from the dataset
However, when I predict on a single row from my dataset, as in the table below,
One row data
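I get this error: ValueError: Found array with 0 sample(s) (shape=(0, 8)) while a minimum of 1 is required. This means I cannot make a prediction based on just one row, which is the main point of my work.
Here is my code: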
import numpy as np
import pandas as pd
from django.contrib import messages
from django.http import HttpResponseRedirect
from django.shortcuts import render
from django.urls import reverse
from sklearn import metrics, tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneOut

# Assumption: `loo` is not defined in the original snippet; a LeaveOneOut splitter is presumed.
loo = LeaveOneOut()


def upload_file(request):
    template = 'upload_file.html'
    if request.method == 'GET':
        return render(request, template)
    CSV_file = request.FILES['csv_file']
    if not CSV_file.name.endswith('.csv'):
        messages.error(request, 'This is not a CSV file')
        return HttpResponseRedirect(reverse('add_pull_requests'))
    train = pd.read_csv(CSV_file)
    features_col = ['Comments', 'LC_added', 'LC_deleted', 'Commits', 'Changed_files', 'Evaluation_time', 'First_status', 'Reputation']
    class_label = ['Label']
    X = train[features_col]  # This also test
    y = train[class_label]
    random_state = 0
    # test_size=request.GET.get('test_size')
    # Only the last split produced by the loop is kept and used below.
    for train_index, test_index in loo.split(X):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    # X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state, test_size=test_size)
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print('Up to here is ok')
    try:
        Accuracy = "{0:.2f}%".format(accuracy_score(y_test, y_pred) * 100)
        Precision = "{0:.2f}%".format(metrics.precision_score(y_test, y_pred) * 100)
        Recall = "{0:.2f}%".format(metrics.recall_score(y_test, y_pred) * 100)
        F1_meseaure = "{0:.2f}%".format(2*metrics.precision_score(y_test, y_pred)*metrics.recall_score(y_test, y_pred)/(metrics.precision_score(y_test, y_pred)+metrics.recall_score(y_test, y_pred))*100)
    except ZeroDivisionError:
        print("Error: dividing by zero")
        F1_meseaure = 'nan%'
    print("Accuracy:", Accuracy)
    print("Precision:", Precision)
    print("Recall:", Recall)
    print("F1-measure: ", F1_meseaure)
    importances_feautres = pd.DataFrame({'features': features_col, 'importance': np.round(clf.feature_importances_, 3)})
    importances_feautres = importances_feautres.sort_values('importance', ascending=False).set_index('features')
    print(importances_feautres.shape)
    importances_feautres = [ls[0] for ls in importances_feautres.values.tolist()]
    classification_report = {'accuracy': Accuracy, 'pricision': Precision, 'recall': Recall, 'f1_score': F1_meseaure}
    importance_features = {'importances_feautre': importances_feautres}
    # Note: `new_data` is not defined anywhere in this snippet.
    data = {
        'new_data': new_data,
        'classification_report': classification_report,
        'importance_feature': importance_features,
        'features': features_col,
    }
    return render(request, template, data)
The error comes from the following lines:
for train_index, test_index in loo.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
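To illustrate why a one-row input can end up with an empty training set, here is a minimal pure-Python sketch of leave-one-out index splitting. The helper `leave_one_out_indices` is hypothetical and written only for this illustration; it is not scikit-learn's implementation, and depending on the scikit-learn version, `LeaveOneOut` may instead reject a one-sample input up front with a different message.

```python
def leave_one_out_indices(n_samples):
    """Yield (train_indices, test_indices) pairs, holding out one row at a time.

    Hypothetical helper that mimics the idea of leave-one-out splitting.
    """
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]


# With three rows, every split keeps two rows for training.
three_rows = list(leave_one_out_indices(3))
print(three_rows)  # [([1, 2], [0]), ([0, 2], [1]), ([0, 1], [2])]

# With a single row, the only row is held out and nothing is left to train on.
one_row = list(leave_one_out_indices(1))
print(one_row)  # [([], [0])]
```

An empty list of training indices selects zero rows from an 8-column frame, which matches the shape=(0, 8) reported in the error.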
如果我用下面的行替换这些行,我会得到相同的错误:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=random_state, test_size=test_size)
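Both approaches split the uploaded rows into a training part and a test part, so a single row cannot fill both. One possible direction, sketched below under stated assumptions, is to fit the classifier on separate labelled data and call `predict` on the single row without splitting it. The training values and the new row here are made up purely so the sketch runs; only the column names come from the code above.

```python
import pandas as pd
from sklearn import tree

features_col = ['Comments', 'LC_added', 'LC_deleted', 'Commits',
                'Changed_files', 'Evaluation_time', 'First_status', 'Reputation']

# Made-up labelled rows (assumption: real training data would come from a labelled CSV).
train = pd.DataFrame(
    [[1, 10, 2, 1, 1, 5, 0, 20],
     [4, 200, 50, 6, 9, 40, 1, 3],
     [0, 5, 1, 1, 1, 2, 0, 35],
     [7, 350, 90, 8, 12, 60, 1, 1]],
    columns=features_col,
)
train['Label'] = [1, 0, 1, 0]

# Fit once on all labelled rows; no train/test split is involved here.
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(train[features_col], train['Label'])

# A single unlabelled row (made-up values) kept as a 1 x 8 DataFrame.
one_row = pd.DataFrame([[2, 15, 3, 1, 2, 6, 0, 25]], columns=features_col)
y_pred = clf.predict(one_row)
print(one_row.shape, y_pred.shape)  # (1, 8) (1,)
```

Note that accuracy, precision, and recall need true labels for the predicted rows, so they would have to be computed separately on labelled data rather than on the single row.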