
Python: traversing subdirectories to find file pairs

  • proximacentauri  · 7 years ago

    I have a deeply nested subfolder structure, like this:

    a/b/file1.txt
    a/b/file1.doc
    a/b/file2.txt
    a/b/file2.doc
    a/c/file3.txt
    a/c/file3.doc
    a/c/d/file4.txt
    a/c/d/file4.doc
    

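    I want to pair each .txt file with the .doc file of the same name. The best approach I have come up with so far is the following, which does not look efficient: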

    import os

    pairs = []
    for root, dirs, files in os.walk(path):
        file_list = files
        file_list_copy = file_list.copy()
        # for each file in file_list of type .txt
        for filename in file_list:
            stem, ext = os.path.splitext(filename)
            # find the .doc of the same name in file_list_copy
            if ext == ".txt" and stem + ".doc" in file_list_copy:
                # add the two to a tuple and append it to the list
                pairs.append((os.path.join(root, filename),
                              os.path.join(root, stem + ".doc")))
    
    1 answer
  •   proximacentauri    7 years ago

    Probably not the most efficient, but it works:

    find /path-to-files-root/ -type f -name '*.txt' -exec mv -i {} /new-path-to-files/txt/ \;
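
    For readers who would rather stay in Python, the move step could be sketched roughly as below. This is not what the answer ran: move_by_extension is a hypothetical helper, and because mv -i prompts before overwriting, the sketch assumes that skipping name clashes and reporting them is an acceptable non-interactive substitute.

```python
import shutil
from pathlib import Path


def move_by_extension(src_root, dest_dir, suffix):
    """Move every file under src_root whose name ends in suffix into dest_dir.

    Files whose name already exists in dest_dir are left in place and
    returned, a non-interactive stand-in for the prompt of `mv -i`.
    """
    src_root = Path(src_root).resolve()
    dest_dir = Path(dest_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    skipped = []
    # Build the full listing first so that moving files cannot disturb the walk.
    candidates = [p for p in src_root.rglob("*" + suffix) if p.is_file()]
    for src in candidates:
        if dest_dir in src.parents:
            continue  # already inside the destination folder
        target = dest_dir / src.name
        if target.exists():
            skipped.append(src)  # name clash: do not overwrite
            continue
        shutil.move(str(src), str(target))
    return skipped
```

    It would be called once per extension, for example move_by_extension("/path-to-files-root/", "/new-path-to-files/txt/", ".txt"), and again with ".doc" and a separate destination folder of your choosing.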
    

    Then I ran this Python script to pair the files:

    import fnmatch
    import os
    from os.path import isfile, join

    def get_all_files(path, pattern):
        # see https://stackoverflow.com/questions/17282887/getting-files-with-same-name-irrespective-of-their-extension
        # collect the names of all files under path that match pattern
        datafiles = []
        for root, dirs, files in os.walk(path):
            for file in fnmatch.filter(files, pattern):
                datafiles.append(file)
        return datafiles

    # txt_path and doc_path are assumed to be the folders the files were moved into
    txt_files = [f for f in os.listdir(txt_path) if isfile(join(txt_path, f))]
    for txt_file in txt_files:
        filename = os.path.splitext(txt_file)[0]
        # look for exactly one .doc file with the same base name
        matches = get_all_files(doc_path, '{0}.doc'.format(filename))
        if len(matches) == 1:
            doc_file = matches[0]
            # do something with txt_file and doc_file
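
    The script above walks the .doc folder once per .txt file. If the files can stay where they are, a single pass that groups names by folder and base name may be enough. The following is only a sketch under that assumption: find_pairs is a hypothetical helper, and it assumes that both halves of a pair sit in the same folder, as in the listing in the question.

```python
import os
from collections import defaultdict


def find_pairs(root_dir, exts=(".txt", ".doc")):
    """Return sorted (txt_path, doc_path) tuples for files that share
    a folder and a base name."""
    first_ext, second_ext = exts
    # (folder, base name) -> {extension: full path}
    by_stem = defaultdict(dict)
    for folder, _dirs, files in os.walk(root_dir):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext in exts:
                by_stem[(folder, stem)][ext] = os.path.join(folder, name)
    # Keep only the entries for which both extensions were seen.
    return sorted(
        (found[first_ext], found[second_ext])
        for found in by_stem.values()
        if first_ext in found and second_ext in found
    )
```

    Usage would look like: for txt_file, doc_file in find_pairs("a"): followed by whatever processing is needed. Files without a partner are simply left out of the result.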
    