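I set up the training data (X) and the test data (test_data_process) so that they have the same columns in the same order, as shown below.
But when I run
predictions = my_model.predict(test_data_process)
it raises the following error:
expected f22, f25, f0, f34, f32, f5, f20, f3, f33, f15, f24, f31, f28, f9, f8, f19, f14, f18, f17, f2, f13, f4, f27, f16, f1, f29, f11, f26, f10, f7, f21, f30, f23, f6, f12 in input data
So it complains that the training data (X) does not have these fields, even though it does.
How can I fix this?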
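from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Target (assumed; the line defining y is not shown in the question)
y = data['SalePrice']
# Numeric features only, without the Id column
X = data.select_dtypes(exclude=['object']).drop(columns=['Id'])
# Combine year and month of sale into a single feature
X['YrMoSold'] = X['YrSold'] * 12 + X['MoSold']
X = X.drop(columns=['YrSold', 'MoSold', 'SalePrice'])
X = X.fillna(0.0000001)
# .values passes plain NumPy arrays, so the model stores no column names
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)
my_model = XGBRegressor(n_estimators=100, learning_rate=0.05, booster='gbtree')
my_model.fit(train_X, train_y, early_stopping_rounds=5,
             eval_set=[(val_X, val_y)], verbose=False)

# Apply the same preprocessing to the test set
test_data_process = test_data.select_dtypes(exclude=['object']).drop(columns=['Id'])
test_data_process['YrMoSold'] = test_data_process['YrSold'] * 12 + test_data_process['MoSold']
test_data_process = test_data_process.drop(columns=['YrSold', 'MoSold'])
test_data_process = test_data_process.fillna(0.0000001)
# Same columns, in the same order, as X
test_data_process = test_data_process[X.columns]
predictions = my_model.predict(test_data_process)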
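A likely explanation: because the model was fit on X.values, XGBoost only knows the features by position, under default names f0 to f34 (so f22 is the 23rd column of X). Predicting on a DataFrame sends the real column names instead, and those do not match the stored f-names. A minimal sketch of one workaround, assuming pandas and NumPy and a hypothetical helper named align_for_predict, is to pass the test set as a plain array in the training column order:

```python
import numpy as np
import pandas as pd


def align_for_predict(test_df: pd.DataFrame, train_columns) -> np.ndarray:
    """Return test features as a NumPy array in the training column order.

    Hypothetical helper. Assumption: the model was fit on X.values, i.e. an
    array without column names, so features are matched by position only.
    """
    # reindex puts the columns in training order; a column missing from the
    # test set is created and filled with the placeholder used during training.
    aligned = test_df.reindex(columns=list(train_columns), fill_value=0.0000001)
    # to_numpy drops the column names, matching how the model was trained.
    return aligned.to_numpy()


# Hypothetical usage with the names from the question:
# predictions = my_model.predict(align_for_predict(test_data_process, X.columns))
```

Alternatively, fitting on the DataFrames themselves (train_test_split(X, y, test_size=0.2) without .values) keeps the real column names in the model, so predicting on test_data_process[X.columns] would then match by name.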