Python recursive folder read

  •  161
  • Brock Woolf  · Tech Community  · 16 years ago

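    I have a C++/Obj-C background and I am just discovering Python (I have been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.

    The problem is that the code I have written only works one folder deep. I can see why in the code (see #hardcoded path), but I don't know how to move forward in Python since I am brand new to it.

    Python code: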

    import os
    import sys
    
    rootdir = sys.argv[1]
    
    for root, subFolders, files in os.walk(rootdir):
    
        for folder in subFolders:
            outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
            folderOut = open( outfileName, 'w' )
            print "outfileName is " + outfileName
    
            for file in files:
                filePath = rootdir + '/' + file
                f = open( filePath, 'r' )
                toWrite = f.read()
                print "Writing '" + toWrite + "' to" + filePath
                folderOut.write( toWrite )
                f.close()
    
            folderOut.close()
    
    8 answers  |  last activity 7 years ago
        1
  •  277
  •   AndiDog    11 years ago

    Make sure you understand the three values yielded by os.walk:

    for root, subdirs, files in os.walk(rootdir):
    

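    They have the following meaning:

    • root: the path that is currently being "walked"
    • subdirs: the entries in root that are directories
    • files: the entries in root (not in subdirs) that are not directories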
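
    A minimal sketch of what those three values look like in practice, assuming a throwaway tree built with tempfile. The helper names describe_walk and make_demo_tree, and the file names a.txt and sub/b.txt, are made up for illustration:

```python
import os
import tempfile

def describe_walk(top):
    """Return sorted (root relative to top, sorted subdirs, sorted files) tuples."""
    result = []
    for root, subdirs, files in os.walk(top):
        rel = os.path.relpath(root, top)
        result.append((rel, sorted(subdirs), sorted(files)))
    return sorted(result)

def make_demo_tree(top):
    # Hypothetical layout: top/a.txt and top/sub/b.txt
    os.makedirs(os.path.join(top, 'sub'))
    for path in (os.path.join(top, 'a.txt'), os.path.join(top, 'sub', 'b.txt')):
        with open(path, 'w') as f:
            f.write('demo')

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        make_demo_tree(top)
        # One tuple per visited directory: the top directory first, then 'sub'
        for rel, subdirs, files in describe_walk(top):
            print(rel, subdirs, files)
```

    Each directory in the tree shows up exactly once as root, which is why the file loop does not need to be nested inside a loop over subdirs.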

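    And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join the currently "walked" folder, not the topmost folder, so it has to be filePath = os.path.join(root, file). By the way, "file" is a builtin in Python 2, so it is usually not used as a variable name.

    Another problem is your loops, which should look like this, for example: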

    import os
    import sys
    
    walk_dir = sys.argv[1]
    
    print('walk_dir = ' + walk_dir)
    
    # If your current working directory may change during script execution, it's recommended to
    # immediately convert program arguments to an absolute path. Then the variable root below will
    # be an absolute path as well. Example:
    # walk_dir = os.path.abspath(walk_dir)
    print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))
    
    for root, subdirs, files in os.walk(walk_dir):
        print('--\nroot = ' + root)
        list_file_path = os.path.join(root, 'my-directory-list.txt')
        print('list_file_path = ' + list_file_path)
    
        with open(list_file_path, 'wb') as list_file:
            for subdir in subdirs:
                print('\t- subdirectory ' + subdir)
    
            for filename in files:
                file_path = os.path.join(root, filename)
    
                print('\t- file %s (full path: %s)' % (filename, file_path))
    
                with open(file_path, 'rb') as f:
                    f_content = f.read()
                    list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                    list_file.write(f_content)
                    list_file.write(b'\n')
    

    In case you didn't know, the with statement for files is a shorthand:

    with open('filename', 'rb') as f:
        dosomething()
    
    # is effectively the same as
    
    f = open('filename', 'rb')
    try:
        dosomething()
    finally:
        f.close()
    
        2
  •  41
  •   davidak    7 years ago

    If you are using Python 3.5 or above, you can get this done in one line.

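    import glob
    import os
    
    # os.path.join inserts the separator, so root_dir needs no trailing slash
    for filename in glob.iglob(os.path.join(root_dir, '**', '*.txt'), recursive=True):
        print(filename)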
    

    As mentioned in the documentation:

    If recursive is true, the pattern "**" will match any files and zero or more directories and subdirectories.
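
    To illustrate the "zero or more directories" part, here is a small sketch under assumed names: find_txt_files is a hypothetical helper, and the temporary tree (top.txt, a/b/deep.txt, a/skip.log) exists only for the demonstration. It uses glob.glob instead of glob.iglob so the result can be sorted.

```python
import glob
import os
import tempfile

def find_txt_files(root_dir):
    """Return sorted paths of all .txt files under root_dir, at any depth."""
    pattern = os.path.join(root_dir, '**', '*.txt')
    return sorted(glob.glob(pattern, recursive=True))

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root_dir:
        # Hypothetical layout: one file at the top, one two levels down,
        # and one file with a different extension that should be skipped.
        os.makedirs(os.path.join(root_dir, 'a', 'b'))
        for parts in (('top.txt',), ('a', 'b', 'deep.txt'), ('a', 'skip.log')):
            with open(os.path.join(root_dir, *parts), 'w') as f:
                f.write('demo')
        for path in find_txt_files(root_dir):
            print(os.path.relpath(path, root_dir))
```

    The file directly inside root_dir is matched as well, because "**" may stand for no directory at all.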

    If you want every file, you can use

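    import glob
    import os
    
    # Same pattern as above, but '*' matches every name instead of only *.txt
    for filename in glob.iglob(os.path.join(root_dir, '**', '*'), recursive=True):
        print(filename)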
    
        3
  •  33
  •   the Tin Man    12 years ago

    Agree with Dave Webb: os.walk will yield an item for each directory in the tree. In fact, you don't have to care about subFolders at all.

    Code like this should work:

    import os
    import sys
    
    rootdir = sys.argv[1]
    
    for folder, subs, files in os.walk(rootdir):
        with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
            for filename in files:
                with open(os.path.join(folder, filename), 'r') as src:
                    dest.write(src.read())
    
        4
  •  3
  •   the Tin Man    12 years ago

    Use os.path.join() to construct your paths. It's neater:

    import os
    import sys
    
    rootdir = sys.argv[1]
    for root, subFolders, files in os.walk(rootdir):
        for folder in subFolders:
            outfileName = os.path.join(root, folder, "py-outfile.txt")
            print("outfileName is " + outfileName)
            # with closes each file even if reading or writing fails
            with open(outfileName, 'w') as folderOut:
                for filename in files:
                    filePath = os.path.join(root, filename)
                    with open(filePath) as f:
                        toWrite = f.read()
                    print("Writing '" + toWrite + "' to " + filePath)
                    folderOut.write(toWrite)
    
        5
  •  1
  •   Diego    8 years ago

    Try this:

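    import os
    import sys
    
    # Assumption: the directory to walk is passed as the first argument
    path = sys.argv[1]
    
    for root, subdirs, files in os.walk(path):
        for name in os.listdir(root):
            filePath = os.path.join(root, name)
    
            if os.path.isdir(filePath):
                continue  # directories are visited by os.walk itself
    
            with open(filePath, 'r') as f:
                pass  # Do Stuff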
    
        6
  •  0
  •   the Tin Man    12 years ago

    os.walk does a recursive walk by default. For each directory, starting from the root, it yields a 3-tuple (dirpath, dirnames, filenames).

    from os import walk
    from os.path import splitext, join
    
    def select_files(root, files):
        """
        simple logic here to filter out interesting files
        .py files in this example
        """
    
        selected_files = []
    
        for file in files:
            #do concatenation here to get full path 
            full_path = join(root, file)
            ext = splitext(file)[1]
    
            if ext == ".py":
                selected_files.append(full_path)
    
        return selected_files
    
    def build_recursive_dir_tree(path):
        """
        path    -    where to begin folder scan
        """
        selected_files = []
    
        for root, dirs, files in walk(path):
            selected_files += select_files(root, files)
    
        return selected_files
    
        7
  •  0
  •   the Tin Man    12 years ago

    I think the problem is that you're not processing the output of os.walk correctly.

    Firstly, change:

    filePath = rootdir + '/' + file
    

    to:

    filePath = root + '/' + file
    

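    rootdir is your fixed starting directory; root is a directory returned by os.walk.

    Secondly, you don't need to indent your file processing loop, as it makes no sense to run it once per subdirectory. You'll get root set to each subdirectory in turn. You don't need to process the subdirectories by hand unless you want to do something with the directories themselves.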
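
    Put together, the two changes could look like the sketch below. It assumes the goal is one output file per directory, as in the question's script; the function name concat_per_directory, the constant OUTFILE_NAME, the use of os.path.join in place of root + '/', and the check that skips an output file left over from an earlier run are illustrative additions, not part of the original code.

```python
import os
import tempfile

OUTFILE_NAME = 'py-outfile.txt'  # file name taken from the question's script

def concat_per_directory(rootdir):
    """In every directory under rootdir, write the contents of its files to OUTFILE_NAME."""
    for root, subFolders, files in os.walk(rootdir):
        # root is the directory currently being visited, not the fixed rootdir
        out_path = os.path.join(root, OUTFILE_NAME)
        with open(out_path, 'w') as folderOut:
            for name in sorted(files):
                if name == OUTFILE_NAME:
                    continue  # assumption: ignore output left over from an earlier run
                with open(os.path.join(root, name), 'r') as f:
                    folderOut.write(f.read())

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as demo_root:
        # Hypothetical layout: demo_root/a.txt and demo_root/sub/b.txt
        os.makedirs(os.path.join(demo_root, 'sub'))
        with open(os.path.join(demo_root, 'a.txt'), 'w') as f:
            f.write('A')
        with open(os.path.join(demo_root, 'sub', 'b.txt'), 'w') as f:
            f.write('B')
        concat_per_directory(demo_root)
        with open(os.path.join(demo_root, 'sub', OUTFILE_NAME)) as f:
            print(repr(f.read()))
```

    For the original use case the call would be concat_per_directory(sys.argv[1]), with import sys added.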

        8
  •  0
  •   Scott Smith    7 years ago

    If you want a flat list of all paths under a given directory (like find . in the shell):

    import os
    
    paths = [
        os.path.join(parent, name)
        for (parent, subdirs, files) in os.walk(YOUR_DIRECTORY)
        for name in files + subdirs
    ]
    

    To include only the full paths of files under the base directory, leave out + subdirs.
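
    Both variants can be wrapped in one small function. This is only a sketch: list_paths and its include_dirs flag are hypothetical names, and the temporary tree is there just to have something to list.

```python
import os
import tempfile

def list_paths(top, include_dirs=True):
    """Return sorted full paths under top; directories are listed only if include_dirs is true."""
    return sorted(
        os.path.join(parent, name)
        for parent, subdirs, files in os.walk(top)
        for name in (files + subdirs if include_dirs else files)
    )

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as top:
        # Hypothetical layout: top/a.txt and top/sub/b.txt
        os.makedirs(os.path.join(top, 'sub'))
        for path in (os.path.join(top, 'a.txt'), os.path.join(top, 'sub', 'b.txt')):
            with open(path, 'w') as f:
                f.write('demo')
        print([os.path.relpath(p, top) for p in list_paths(top)])
        print([os.path.relpath(p, top) for p in list_paths(top, include_dirs=False)])
```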