
Timing different methods of traversing directories in Linux [closed]

  • YorSubs  · 7 months ago

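    I want to find the most efficient way to recursively count subdirectories and files, and came up with the tests below. Some seem to work, but the results are inconsistent.

    • How can I resolve the inconsistency where every method reports a different number of directories and a different number of files (each test should produce exactly the same counts as the others)?
    • How can I fix the broken tests shown in the output below?
    • Is there a better/faster/more efficient technique than those below?

    I suppose this post straddles the line between StackOverflow and SuperUser, but it is really about scripting, so I think this is the right place.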

    #!/bin/bash
    
    # Default to home directory if no argument is provided
    dir="${1:-$HOME}"
    
    echo "Analyzing directories and files in: $dir"
    echo
    
    # Function to time and run a command, and print the count
    time_command() {
        local description="$1"
        local command="$2"
        echo "$description"
        echo "Running: $command"
        start_time=$(date +%s.%N)
        result=$(eval "$command")
        end_time=$(date +%s.%N)
        duration=$(echo "$end_time - $start_time" | bc)
        echo "Count: $result"
        echo "Time: $duration seconds"
    }
    
    # Methods to count directories
    dir_methods=(
        "Directory Method 1 (find): find '$dir' -type d | wc -l"
        "Directory Method 2 (tree): tree -d '$dir' | tail -n 1 | awk '{print \$3}'"
        "Directory Method 3 (du): echo 'deprecated: usually around double length of ''find'' command'"
        "Directory Method 4 (ls): ls -lR '$dir' | grep '^d' | wc -l"
        "Directory Method 5 (bash loop): count=0; for d in \$(find '$dir' -type d); do count=\$((count + 1)); done; echo \$count"
        "Directory Method 6 (perl): perl -MFile::Find -le 'find(sub { \$File::Find::dir =~ /\\/ and \$n++ }, \"$dir\"); print \$n'"
        "Directory Method 7 (python): python3 -c 'import os; print(sum([len(dirs) for _, dirs, _ in os.walk(\"$dir\")]))'"
    )
    
    # Methods to count files
    file_methods=(
        "File Method 1 (find): find '$dir' -type f | wc -l"
        "File Method 2 (tree): tree -fi '$dir' | grep -E '^[├└─] ' | wc -l"
        "File Method 3 (ls): ls -lR '$dir' | grep -v '^d' | wc -l"
        "File Method 4 (bash loop): count=0; for f in \$(find '$dir' -type f); do count=\$((count + 1)); done; echo \$count"
        "File Method 5 (perl): perl -MFile::Find -le 'find(sub { -f and \$n++ }, \"$dir\"); print \$n'"
        "File Method 6 (python): python3 -c 'import os; print(sum([len(files) for _, _, files in os.walk(\"$dir\")]))'"
    )
    
    # Run and time each directory counting method
    echo "Counting directories..."
    echo
    for method in "${dir_methods[@]}"; do
        description="${method%%:*}"
        command="${method#*: }"
        if [[ "$description" == *"(du)"* ]]; then
            echo "$description"
            echo "Running: $command"
            eval "$command"
        else
            time_command "$description" "$command"
        fi
        echo
    done
    
    # Run and time each file counting method
    echo "Counting files..."
    echo
    for method in "${file_methods[@]}"; do
        description="${method%%:*}"
        command="${method#*: }"
        time_command "$description" "$command"
        echo
    done
    

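    Below is the output of the script above. As you can see, the number of directories and files found differs in every case (!), and some tests are clearly broken, so it would be good to know how to fix them.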

    Analyzing directories and files in: /home/boss
    
    Counting directories...
    
    Directory Method 1 (find)
    Running: find '/home/boss' -type d | wc -l
    Count: 598844
    Time: 11.949245266 seconds
    
    Directory Method 2 (tree)
    Running: tree -d '/home/boss' | tail -n 1 | awk '{print $3}'
    Count:
    Time: 2.776698115 seconds
    
    Directory Method 3 (du)
    Running: echo 'deprecated: usually around double length of ''find'' command'
    deprecated: usually around double length of find command
    
    Directory Method 4 (ls)
    Running: ls -lR '/home/boss' | grep '^d' | wc -l
    Count: 64799
    Time: 6.522804741 seconds
    
    Directory Method 5 (bash loop)
    Running: count=0; for d in $(find '/home/boss' -type d); do count=$((count + 1)); done; echo $count
    Count: 604654
    Time: 14.693009738 seconds
    
    Directory Method 6 (perl)
    Running: perl -MFile::Find -le 'find(sub { $File::Find::dir =~ /\/ and $n++ }, "/home/boss"); print $n'
    String found where operator expected (Missing semicolon on previous line?) at -e line 1, at end of line
    Unknown regexp modifier "/h" at -e line 1, at end of line
    Unknown regexp modifier "/e" at -e line 1, at end of line
    Can't find string terminator '"' anywhere before EOF at -e line 1.
    Count:
    Time: .019156779 seconds
    
    Directory Method 7 (python)
    Running: python3 -c 'import os; print(sum([len(dirs) for _, dirs, _ in os.walk("/home/boss")]))'
    Count: 599971
    Time: 15.013263266 seconds
    
    Counting files...
    
    File Method 1 (find)
    Running: find '/home/boss' -type f | wc -l
    Count: 5184830
    Time: 13.066028457 seconds
    
    File Method 2 (tree)
    Running: tree -fi '/home/boss' | grep -E '^[├└─] ' | wc -l
    Count: 0
    Time: 8.431054237 seconds
    
    File Method 3 (ls)
    Running: ls -lR '/home/boss' | grep -v '^d' | wc -l
    Count: 767236
    Time: 6.593778380 seconds
    
    File Method 4 (bash loop)
    Running: count=0; for f in $(find '/home/boss' -type f); do count=$((count + 1)); done; echo $count
    Count: 5196437
    Time: 40.861512698 seconds
    
    File Method 5 (perl)
    Running: perl -MFile::Find -le 'find(sub { -f and $n++ }, "/home/boss"); print $n'
    Count: 5186461
    Time: 54.353541730 seconds
    
    File Method 6 (python)
    Running: python3 -c 'import os; print(sum([len(files) for _, _, files in os.walk("/home/boss")]))'
    Count: 5187084
    Time: 14.910791357 seconds
    
    2 answers

  • choroba  · 7 months ago

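    I removed the ls methods as unreliable (ls outputs not only the files in a directory, but also directory names and totals, none of which should be counted as directories or files).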
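
    To see the effect on a small scale, here is a minimal sketch (the function names and the temporary tree are made up for illustration) that compares the question's ls pipeline with find on a directory holding exactly two files:

```shell
# Hypothetical helpers, not part of the original script.
# The question's ls-based file count: every line not starting with "d".
ls_file_count() {
    ls -lR "$1" | grep -v '^d' | wc -l
}

# find-based file count for comparison.
find_file_count() {
    find "$1" -type f | wc -l
}

# Build a throwaway tree: one subdirectory and two regular files.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a" "$demo/sub/b"

echo "ls:   $(ls_file_count "$demo")"
echo "find: $(find_file_count "$demo")"
```

    The ls figure comes out larger than 2 because the directory header lines, the "total" lines and the blank separator lines all pass the grep -v '^d' filter.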

    I changed the Perl methods to use the postprocess function, which runs only when leaving a directory, so no file type test is needed.

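    I also fixed the tree methods: at least on my system, -a is needed to include file names starting with a dot. You can use awk for both files and directories; there is no need to count lines.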
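
    As a rough illustration of the awk approach, the sketch below parses the summary line that tree prints last, which has the form "N directories, M files". The helper names are hypothetical, and the sample text stands in for real tree output:

```shell
# Hypothetical helpers: read tree output on stdin and pick a field
# from its final summary line, e.g. "12 directories, 34 files".
summary_dirs() {
    tail -n 1 | awk '{print $1}'
}

summary_files() {
    tail -n 1 | awk '{print $3}'
}

# Stand-in for real tree output, so the sketch runs without tree installed.
sample='.
12 directories, 34 files'

printf '%s\n' "$sample" | summary_dirs    # prints 12
printf '%s\n' "$sample" | summary_files   # prints 34
```

    With the real tool this would be used as tree -a "$dir" | summary_files. When tree is run with -d, the summary line holds only the directory count, so the third field is empty, which is consistent with the blank count reported by Directory Method 2 in the question.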

    # Methods to count directories
    dir_methods=(
        "Directory Method 1 (find): find '$dir' -type d | wc -l"
        "Directory Method 2 (tree): tree -afi '$dir' | tail -n 1 | awk '{print \$1}'"
        "Directory Method 5 (bash loop): count=0; for d in \$(find '$dir' -type d); do count=\$((count + 1)); done; echo \$count"
        "Directory Method 6 (perl): perl -MFile::Find -le 'find({wanted => sub {}, postprocess => sub { ++\$n }}, \"$dir\"); print \$n'"
        "Directory Method 7 (python): python3 -c 'import os; print(sum([len(dirs) for _, dirs, _ in os.walk(\"$dir\")]))'"
    )
    
    # Methods to count files
    file_methods=(
        "File Method 1 (find): find '$dir' -type f | wc -l"
        "File Method 2 (tree): tree -a '$dir' | tail -n1 | awk '{print \$3}'"
        "File Method 4 (bash loop): count=0; for f in \$(find '$dir' -type f); do count=\$((count + 1)); done; echo \$count"
        "File Method 5 (perl): perl -MFile::Find -le 'find({wanted => sub { ++\$n },postprocess => sub {--\$n}}, \"$dir\"); print \$n'"
        "File Method 6 (python): python3 -c 'import os; print(sum([len(files) for _, _, files in os.walk(\"$dir\")]))'"
    )
    

    Still, the results are not the same: when counting directories, python and tree don't count the top-level directory.
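
    One way to make the find count comparable, sketched here with hypothetical function names and assuming a find that supports -mindepth (GNU and BSD find do), is to skip the starting directory itself:

```shell
# Plain find counts the starting directory too.
dirs_with_top() {
    find "$1" -type d | wc -l
}

# -mindepth 1 skips the starting directory, which the answer says
# python and tree do not count.
dirs_below_top() {
    find "$1" -mindepth 1 -type d | wc -l
}

# Throwaway tree with three directories below the top level.
demo=$(mktemp -d)
mkdir -p "$demo/a/b" "$demo/c"

dirs_with_top "$demo"    # 4: the top directory plus a, a/b and c
dirs_below_top "$demo"   # 3: a, a/b and c
```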

    If a file or directory name contains whitespace, the "bash loop" method counts each word separately, so it is wrong.

    If a file or directory name contains a newline, even the find method is wrong. You can fix it by not printing the names at all:

        "Directory Method 1 (find): find '$dir' -type d -printf '\\n'  | wc -l"
    

    The same applies to files. You can fix the "bash loop" in the same way.
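
    A minimal sketch of both fixes, assuming GNU find (for -printf) and using made-up function names. For the loop, a bare newline would be discarded by word splitting, so this sketch prints a one-character placeholder per match instead:

```shell
# Count matches by printing one fixed line per entry instead of its name,
# so spaces or newlines inside names cannot distort the count.
count_dirs() {
    find "$1" -type d -printf '\n' | wc -l
}

count_files() {
    find "$1" -type f -printf '\n' | wc -l
}

# Loop variant: iterate over one placeholder word ("x") per directory.
count_dirs_loop() {
    local count=0 d
    for d in $(find "$1" -type d -printf 'x\n'); do
        count=$((count + 1))
    done
    echo "$count"
}
```

    Called as count_dirs "$dir", each helper prints a single number, and the loop variant should agree with the pipeline variant even when names contain spaces or newlines.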

  • Daweo  · 7 months ago
    Method 4 (ls)(...)wc -l
    ...
    Method 3 (ls)(...)wc -l
    

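    Keep in mind that parsing ls output is considered a bad idea: for example, you are assuming names never contain newlines, yet Unix systems do allow them. See ParsingLs for more details.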

    print(sum([len(dirs) for _, dirs, _ in os.walk(\"$dir\")]))
    

    You don't need to build a list for sum; you can pass a generator expression directly, that is, you can write

    print(sum(len(dirs) for _, dirs, _ in os.walk(\"$dir\")))
    

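    This might speed things up, since there is no list-creation step, but you would need to test for yourself whether the difference is noticeable.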