
ValueError: Found input variables with inconsistent numbers of samples: [915, 229] in a simple machine learning example

  •  0
  • ketchup  · Tech Community  · 1 year ago
    import pandas as pd
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    
    df=pd.read_csv('https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv')
    
    X = df.drop('logS', axis=1)
    y = df['logS']
    
    X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    
    y_lr_train_pred = lr.predict(X_train)
    y_lr_test_pred = lr.predict(X_test)
    
    lr_train_mse = mean_squared_error(y_train, y_lr_train_pred)
    lr_train_r2 = r2_score(y_train, y_lr_train_pred)
    lr_test_mse = mean_squared_error(y_test, y_lr_test_pred)
    lr_test_r2 = r2_score(y_test, y_lr_test_pred)
    
    print(lr_train_mse)
    
    lr_results = pd.DataFrame(['Linear regression',lr_train_mse, lr_train_r2, lr_test_mse, lr_test_r2]).transpose()
    lr_results.columns = ['Method','Training MSE','Training R2','Test MSE','Test R2']
    

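    I have some code that tries to predict logS values. The code is not mine; it comes from a guide, where it runs without errors. So where is the problem?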

    full error message here:
    Traceback (most recent call last):
      File "D:\Python\AI\test\main.py", line 14, in <module>
        lr.fit(X_train, y_train)
      File "D:\Python\AI\test\venv\lib\site-packages\sklearn\base.py", line 1152, in wrapper
        return fit_method(estimator, *args, **kwargs)
      File "D:\Python\AI\test\venv\lib\site-packages\sklearn\linear_model\_base.py", line 678, in fit
        X, y = self._validate_data(
      File "D:\Python\AI\test\venv\lib\site-packages\sklearn\base.py", line 622, in _validate_data
        X, y = check_X_y(X, y, **check_params)
      File "D:\Python\AI\test\venv\lib\site-packages\sklearn\utils\validation.py", line 1164, in check_X_y
        check_consistent_length(X, y)
      File "D:\Python\AI\test\venv\lib\site-packages\sklearn\utils\validation.py", line 407, in check_consistent_length
        raise ValueError(
    ValueError: Found input variables with inconsistent numbers of samples: [915, 229]
    

    I changed the data file, and the values in the ValueError changed. I also changed the test_size value, and the values in the ValueError changed again. For test_size=0.4:

    ValueError: Found input variables with inconsistent numbers of samples: [686, 458]
    

    :/

    1 answer  |  as of 1 year ago
  •  0
  •   Isa-Ali    1 year ago

    The problem is in how you unpack the result of 'train_test_split'.

    You should swap 'y_train' and 'X_test' on the left-hand side of the assignment. See below:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    

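    The correct order is the training and test sets of the features ('X') first, followed by the training and test sets of the target variable ('y'). With the original order, the name 'y_train' was bound to the 229-row test features, so 'lr.fit(X_train, y_train)' received 915 samples for X and 229 for y, which is exactly what the ValueError reports.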
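
    To see the return order for yourself, here is a minimal sketch. It uses synthetic arrays rather than the real CSV. The 1144 rows are inferred from the error message (915 + 229), and the 4 feature columns are an arbitrary assumption made only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: 1144 rows, 4 feature columns).
X = np.arange(1144 * 4).reshape(1144, 4)
y = np.arange(1144)

# train_test_split returns the train/test pair for each input, in input order:
# X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

print(X_train.shape, y_train.shape)  # both have 915 rows
print(X_test.shape, y_test.shape)    # both have 229 rows

# The unpacking from the question binds the second return value (the test
# features) to the name y_train, which is what produces the length mismatch.
X_train_bad, y_train_bad, X_test_bad, y_test_bad = train_test_split(
    X, y, test_size=0.2, random_state=1
)
print(len(X_train_bad), len(y_train_bad))  # 915 and 229, as in the ValueError
```

    A quick shape check like this right after the split catches a wrong unpacking order before it reaches 'fit'.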