
How to generate a Kaggle submission CSV file with specific entries [duplicate]

  •  0
  • Levent Ozbek  · Tech Community  · 7 years ago

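    I'm a beginner in machine learning and I'm trying to learn by working through Kaggle's Titanic problem. I've finished my code and got an accuracy score of 0.78, but now I need to generate a CSV file with 418 entries plus a header row, and I don't know how to do that.

    Here is an example of what I'm supposed to produce: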

    PassengerId,Survived
    892,0
    893,1
    894,0
    etc.
    

    The data comes from my test_predictions:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    
    # Paths to the training and test datasets
    train_path = "C:\\Users\\Omar\\Downloads\\Titanic Data\\train.csv"
    test_path = "C:\\Users\\Omar\\Downloads\\Titanic Data\\test.csv"
    
    # Read both datasets into DataFrames
    train_data = pd.read_csv(train_path)
    test_data = pd.read_csv(test_path)
    
    # Encode gender (male/female) as 0/1 with one explicit mapping, so that
    # the training and test data are guaranteed to use the same codes
    sex_codes = {'male': 0, 'female': 1}
    train_data['Sex'] = train_data['Sex'].map(sex_codes)
    test_data['Sex'] = test_data['Sex'].map(sex_codes)
    
    # Replace missing values in both datasets with 0
    train_data.fillna(0.0, inplace=True)
    test_data.fillna(0.0, inplace=True)
    
    # Features used for training
    columns_of_interest = ['Pclass', 'Sex', 'Age']
    
    # Drop rows that still contain NaN (a no-op here, since missing values
    # were already filled with 0 above)
    filtered_titanic_data = train_data.dropna(axis=0)
    
    # Feature matrix: the predictor columns
    x = filtered_titanic_data[columns_of_interest]
    
    # Target vector: whether each passenger survived
    y = filtered_titanic_data.Survived
    
    # Hold out part of the training data as a validation set
    train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=0)
    
    # Create the decision tree classifier
    titanic_model = DecisionTreeClassifier()
    
    # Fit the model on the training split
    titanic_model.fit(train_x, train_y)
    
    # Predict survival for the validation split
    val_predictions = titanic_model.predict(val_x)
    
    # Select the same feature columns from the test dataset
    test_x = test_data[columns_of_interest]
    
    # Predict survival for the test dataset
    test_predictions = titanic_model.predict(test_x)
    
    # Print the validation predictions
    print(val_predictions)
    
    # Print the validation accuracy
    print(accuracy_score(val_y, val_predictions))
    
    # Print the test predictions
    print(test_predictions)
    
    1 answer  |  as of 7 years ago
        1
  •  4
  •   petezurich rdelmar    7 years ago

    How about this:

    submission = pd.DataFrame({ 'PassengerId': test_data.PassengerId.values, 'Survived': test_predictions })
    submission.to_csv("my_submission.csv", index=False)
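
    To try the idea without the Kaggle files, here is a minimal self-contained sketch. The three-row test_data and test_predictions below are made-up stand-ins for the real 418-row objects from the question, and the CSV is written to an in-memory buffer instead of a file so the result can be inspected. It assumes the ID column is spelled PassengerId, as in the expected header.

```python
import io

import pandas as pd

# Hypothetical stand-ins for test_data and test_predictions from the question.
test_data = pd.DataFrame({"PassengerId": [892, 893, 894]})
test_predictions = [0, 1, 0]

# Pair each passenger ID with its predicted label.
submission = pd.DataFrame({
    "PassengerId": test_data["PassengerId"].values,
    "Survived": test_predictions,
})

# index=False keeps pandas from writing the row index as an extra first column.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Sanity checks before uploading: header, row count, and 0/1 labels only.
lines = csv_text.splitlines()
assert lines[0] == "PassengerId,Survived"
assert len(lines) == len(test_data) + 1
assert set(submission["Survived"]) <= {0, 1}
```

    With the real data, the same row-count check should come out to 418 entries plus the header row.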