
Different output than the source code shows (machine learning) (Python)

  •  2
  • Depleted Money  · Tech Community  · 1 year ago

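    I am currently trying out a small image machine learning project. I found someone's Kaggle code and tried to reproduce it from scratch. However, I have already run into a problem before even reaching the main part.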

    I am sure the issue is something local to my setup, but I cannot tell what it is.

    My code:

    #Import Libraries
    
    #Data processing modules
    import pandas as pd 
    import numpy as np 
    import matplotlib.pyplot as plt
    import cv2
    #File directory modules
    import glob as gb
    import os
    #Training and testing (machine learning) modules
    import tensorflow as tf 
    import keras
    
    #Importing the images into the code
    
    trainDataset = 'melanoma_cancer_dataset/train'
    testDataset = 'melanoma_cancer_dataset/test'
    predictionDataset = 'melanoma_cancer_dataset/skinTest'
    
    #creating empty lists for the images to fall into for processing
    training_List = []
    testing_list = []
    #making a classification dictionary for the two keys, benign and malignant
    #used for inserting into the images
    diction = {'benign' : 0, 'malignant' : 1}
    
    #Read through the folder's length contents
    for folder in os.listdir(trainDataset):
        data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
        print(f'{len(data)} in folder {folder}')
        #read the images, resize them in a uniform order, and store them in the empty lists
        for data in data:
            image = cv2.imread(data)
            imageList = cv2.resize(image(120,120))
            training_List.append(list(imageList))
    

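    The notebook output shows 0 images in each folder. I am puzzled about what is going on here and would love some answers. Thanks in advance. I am also running this in my own VS Code.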

    Here is a screenshot of my files:

    [Screenshot: file tree and notebooks]

    1 answer  |  as of 1 year ago
  •  2
  •   Niusoski    1 year ago

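    Based on your folder structure and the code you provided, the problem is that the folder paths do not end with a slash. Your code concatenates the path and the folder name directly, so without a trailing slash on the base path the two run together (for example `melanoma_cancer_dataset/trainbenign/*.jpg`). That path does not exist, so glob matches nothing and returns an empty list, which is why you see 0 images.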

    Update the paths as follows:

    trainDataset = 'melanoma_cancer_dataset/train/'
    testDataset = 'melanoma_cancer_dataset/test/'
    predictionDataset = 'melanoma_cancer_dataset/skinTest/'
    

    This is what your code is doing:

    for folder in os.listdir(trainDataset):
        data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
    

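    It goes to the trainDataset path and uses os.listdir() to list the folders there (named malignant and benign). The base path, the folder name, and the file pattern are then concatenated to build the final image path pattern: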

    data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
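
    To make the concatenation problem concrete, here is a minimal sketch that builds the pattern three ways without touching the file system. The variable names are illustrative only, and os.path.join is shown as an alternative rather than something the original code uses.

```python
import os

train_no_slash = 'melanoma_cancer_dataset/train'
train_with_slash = 'melanoma_cancer_dataset/train/'
folder = 'benign'

# Plain concatenation glues the folder name onto the last path component.
broken_pattern = train_no_slash + folder + '/*.jpg'

# With the trailing slash, the same concatenation yields a valid pattern.
fixed_pattern = train_with_slash + folder + '/*.jpg'

# os.path.join inserts the separator itself, so the base path
# works with or without a trailing slash.
joined_pattern = os.path.join(train_no_slash, folder, '*.jpg')

print(broken_pattern)  # melanoma_cancer_dataset/trainbenign/*.jpg
print(fixed_pattern)   # melanoma_cancer_dataset/train/benign/*.jpg
print(joined_pattern)  # same as fixed_pattern, using the OS path separator
```

    Printing the pattern right before the glob call is a quick way to spot this kind of mistake in a notebook.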
    

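    In addition, there is a small error in this line: the comma between image and the size tuple is missing, so image is called as if it were a function: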

    imageList = cv2.resize(image(120,120))
    

    It should be:

    cv2.resize(image, (120, 120))
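
    As a rough illustration of why the original line fails, the sketch below uses a plain NumPy array as a stand-in for what cv2.imread returns (the array shape is an assumption for the demo). The expression image(120,120) is evaluated first, and arrays are not callable, so Python raises a TypeError before cv2.resize ever runs.

```python
import numpy as np

# Stand-in for an image loaded by cv2.imread: height 200, width 300, 3 channels.
image = np.zeros((200, 300, 3), dtype=np.uint8)

error_name = None
try:
    # This is what `cv2.resize(image(120,120))` evaluates first.
    image(120, 120)
except TypeError as exc:
    error_name = type(exc).__name__
    print(f'{error_name}: {exc}')
```

    With the comma in place, cv2.resize receives the array and the target size as two separate arguments; the size tuple is given as (width, height).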
    

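    Also, the way you append to training_List may not be what you want. list(imageList) turns the image array into a Python list of row arrays before appending it; if you want to keep the image array structure, append imageList directly.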
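
    The difference between the two ways of appending can be seen in a small sketch. The zero-filled arrays are placeholders standing in for the output of cv2.resize, and the list names are made up for the example.

```python
import numpy as np

# Two placeholder "resized images": 120 x 120 pixels, 3 colour channels.
resized = [np.zeros((120, 120, 3), dtype=np.uint8) for _ in range(2)]

as_lists = []
as_arrays = []
for img in resized:
    as_lists.append(list(img))  # a list of 120 row arrays, each of shape (120, 3)
    as_arrays.append(img)       # the array itself, shape (120, 120, 3)

print(type(as_lists[0]).__name__, len(as_lists[0]), as_lists[0][0].shape)
print(type(as_arrays[0]).__name__, as_arrays[0].shape)

# NumPy can build one batch array from either list, but appending the
# arrays directly skips the intermediate Python lists.
batch = np.array(as_arrays)
print(batch.shape)
```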

    Full updated code:

    # Data processing modules
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import cv2
    # File directory modules
    import glob as gb
    import os
    # Training and testing (machine learning) modules
    import tensorflow as tf
    import keras
    
    # Directories
    trainDataset = 'melanoma_cancer_dataset/train/'
    testDataset = 'melanoma_cancer_dataset/test/'
    predictionDataset = 'melanoma_cancer_dataset/skinTest/'
    
    # Empty list for the images
    training_List = []
    testing_list = []
    
    # Classification dictionary
    diction = {'benign': 0, 'malignant': 1}
    
    # Read through the folder's contents
    for folder in os.listdir(trainDataset):
        # The trailing slash on trainDataset makes this concatenation a valid path
        data = gb.glob(pathname=str(trainDataset + folder + '/*.jpg'))
        print(f'{len(data)} in folder {folder}')
        # Read the images, resize them, and store them in the list
        for file_path in data:
            image = cv2.imread(file_path)
            # cv2.imread returns None when a file cannot be read; skip those
            if image is None:
                continue
            # Corrected the resize function call
            imageList = cv2.resize(image, (120, 120))
            # Append the image array directly
            training_List.append(imageList)
    
    print(f'Total images in training set: {len(training_List)}')
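
    The code above defines diction but never uses it, so the loaded images carry no labels. One possible way to pair each file with its class is sketched below. The helper name collect_labelled_paths is hypothetical and not part of the original answer; it only gathers paths and labels, leaving the cv2 loading and resizing to the loop shown above.

```python
import glob
import os


def collect_labelled_paths(root, label_map):
    """Return (path, label) pairs for every .jpg under root/<class_name>/."""
    pairs = []
    for folder in sorted(os.listdir(root)):
        if folder not in label_map:
            continue  # skip stray files and folders without a known label
        # os.path.join adds the separators, so root needs no trailing slash
        pattern = os.path.join(root, folder, '*.jpg')
        for file_path in sorted(glob.glob(pattern)):
            pairs.append((file_path, label_map[folder]))
    return pairs
```

    Under these assumptions, calling collect_labelled_paths('melanoma_cancer_dataset/train', diction) would return a list whose length can be compared against the per-folder counts printed earlier.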