ValueError: Found input variables with inconsistent numbers of samples: [915, 229] in a simple machine learning example

artificial-intelligence machine-learning pandas python

ketchup · 1 year ago

import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df=pd.read_csv('https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv')

X = df.drop('logS', axis=1)
y = df['logS']

X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

lr = LinearRegression()
lr.fit(X_train, y_train)

y_lr_train_pred = lr.predict(X_train)
y_lr_test_pred = lr.predict(X_test)

lr_train_mse = mean_squared_error(y_train, y_lr_train_pred)
lr_train_r2 = r2_score(y_train, y_lr_train_pred)
lr_test_mse = mean_squared_error(y_test, y_lr_test_pred)
lr_test_r2 = r2_score(y_test, y_lr_test_pred)

print(lr_train_mse)

lr_results = pd.DataFrame(['Linear regression',lr_train_mse, lr_train_r2, lr_test_mse, lr_test_r2]).transpose()
lr_results.columns = ['Method','Training MSE','Training R2','Test MSE','Test R2']

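I have some code that tries to predict logS values. The code is not mine; it comes from a guide, where it runs without errors. So what is the problem here?

Full error message: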
Traceback (most recent call last):
  File "D:\Python\AI\test\main.py", line 14, in <module>
    lr.fit(X_train, y_train)
  File "D:\Python\AI\test\venv\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "D:\Python\AI\test\venv\lib\site-packages\sklearn\linear_model\_base.py", line 678, in fit
    X, y = self._validate_data(
  File "D:\Python\AI\test\venv\lib\site-packages\sklearn\base.py", line 622, in _validate_data
    X, y = check_X_y(X, y, **check_params)
  File "D:\Python\AI\test\venv\lib\site-packages\sklearn\utils\validation.py", line 1164, in check_X_y
    check_consistent_length(X, y)
  File "D:\Python\AI\test\venv\lib\site-packages\sklearn\utils\validation.py", line 407, in check_consistent_length
    raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [915, 229]

When I changed the data file, the values in the ValueError changed too. When I changed the test_size value, the values in the ValueError also changed. For test_size=0.4:

ValueError: Found input variables with inconsistent numbers of samples: [686, 458]

1 answer | 1 year ago

Isa-Ali · 1 year ago

The problem is in how you unpack the result of `train_test_split`.

You need to swap `y_train` and `X_test` on the left-hand side of the assignment. See below:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

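`train_test_split` returns a train/test pair for each array you pass in, in the order the arrays were given. For `(X, y)` the result is therefore `X_train, X_test, y_train, y_test`: the feature splits ('X') come first, then the target splits ('y'). With the original order, the name `y_train` was bound to the test split of `X` (229 rows) while `X_train` held 915 rows, which is exactly the `[915, 229]` reported in the error.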
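The numbers in both error messages can be reproduced with a little arithmetic. The following is a standard-library-only sketch, not scikit-learn itself: `split_sizes` and `assert_same_length` are hypothetical helpers, the total of 1144 rows is inferred from the error message (915 + 229), and rounding the test set size up is an assumption that happens to match both reported errors.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test set size is rounded up, which matches the
    numbers in the error messages quoted in the question.
    """
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


def assert_same_length(features, target):
    """Fail early with a readable message if the row counts differ."""
    if len(features) != len(target):
        raise ValueError(
            f"features have {len(features)} rows but target has {len(target)}"
        )


n_rows = 915 + 229  # total row count implied by the first error message

print(split_sizes(n_rows, 0.2))  # (915, 229)
print(split_sizes(n_rows, 0.4))  # (686, 458)

# Stand-ins for the four pieces, in the order they are returned for (X, y):
# X_train, X_test, y_train, y_test.
n_train, n_test = split_sizes(n_rows, 0.2)
parts = ([0] * n_train, [0] * n_test, [0] * n_train, [0] * n_test)

# The order used in the question binds y_train to the test split of X.
X_train, y_train, X_test, y_test = parts
try:
    assert_same_length(X_train, y_train)
except ValueError as err:
    print(err)  # features have 915 rows but target has 229

# The corrected order pairs each feature split with its own target split.
X_train, X_test, y_train, y_test = parts
assert_same_length(X_train, y_train)
assert_same_length(X_test, y_test)
```

In the real script, a check like `assert_same_length(X_train, y_train)` placed right after the split would work unchanged on a DataFrame and a Series, since both support `len()`, and it would surface a swapped unpacking before `lr.fit` is reached.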
