How do I delete images in a folder based on conditions in their file names?

  • Andres Mora  · 5 months ago

    I need to delete JPG and jpg files according to the conditions below.

    A folder contains many JPG and jpg files. Each file is named like this: 172.30.165.212_20241231_132125.JPG, where 172.30.165.212 is the IP address, 20241231 is the date in YYYYMMDD format, and 132125 is the time in HHMMSS format.

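    The deletion conditions are:
    1- The script should always keep the most recent file for each IP address, based on the date/time in the file name, no matter how old that date/time is.
    2- However, since each IP address can have multiple files, the script should delete all other files whose date/time in the file name is more than 2 hours older than the current time.
    3- Never look at the file's modification date/time, only at the date/time in the name.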

    I tried the following, but without success: no files get deleted.

    #!/bin/bash
    # Define target directory
    TARGET_DIR="/mnt/moe/results"
    # Log start time
    echo "[$(date)] Starting cleanup process in $TARGET_DIR"
    
    # Function to process files for a single IP
    process_ip_files() {
        local ip_prefix=$1
        local ip_files
        # Find files matching the IP
        ip_files=$(find "$TARGET_DIR" -type f -iname "${ip_prefix}_*" | sort)
    
        # Skip if no files
        if [[ -z "$ip_files" ]]; then
            echo "No files found for IP: $ip_prefix"
            return
        fi
    
        echo "Processing IP: $ip_prefix"
    
        # Variables to track files and the most recent file
        local most_recent_file=""
        local most_recent_time=0
        local files_to_delete=()
    
        # Get current time in seconds since epoch
        current_time=$(date +%s)
    
        # Iterate over files to determine the most recent and deletion criteria
        while IFS= read -r file; do
            echo "Processing file: $file"
    
            # Remove the path and get just the file name
            base_file=$(basename "$file")
            echo "Base file name: $base_file"
    
            # Split file name into components
            IFS='_' read -r ip date time ext <<< "$base_file"
            
            # Validate the expected number of fields and format
            if [[ -z "$ip" || -z "$date" || -z "$time" || "$ext" != "JPG" && "$ext" != "jpg" ]]; then
                echo "  Skipping file (does not match expected format): $file"
                continue
            fi
    
            # Check the timestamp format (YYYYMMDD HHMMSS)
            if ! [[ "$date" =~ ^[0-9]{8}$ ]] || ! [[ "$time" =~ ^[0-9]{6}$ ]]; then
                echo "  Skipping file (invalid timestamp format): $file"
                continue
            fi
    
            # Convert to seconds since epoch
            timestamp="$date $time"
            file_time=$(date -d "$timestamp" +%s)
    
            echo "  File: $file"
            echo "    Timestamp: $timestamp"
            echo "    File time (epoch): $file_time"
            echo "    Current time (epoch): $current_time"
    
            # Check if this file is the most recent one for the IP
            if (( file_time > most_recent_time )); then
                # If we already have a most recent file, we add it to the delete list
                if [[ -n "$most_recent_file" ]]; then
                    files_to_delete+=("$most_recent_file")
                fi
                most_recent_file="$file"
                most_recent_time="$file_time"
            else
                # Check if the file is older than 2 hours (7200 seconds)
                if (( current_time - file_time > 7200 )); then
                    echo "    Marking for deletion: $file"
                    files_to_delete+=("$file")
                fi
            fi
        done <<< "$ip_files"
    
        # Display the most recent file for this IP
        echo "Most recent file for IP $ip_prefix: $most_recent_file"
    
        # Deleting files not the most recent one
        if [[ ${#files_to_delete[@]} -gt 0 ]]; then
            echo "Files marked for deletion for IP $ip_prefix:"
            for file in "${files_to_delete[@]}"; do
                echo "  - $file"
            done
    
            for file in "${files_to_delete[@]}"; do
                if [[ "$file" != "$most_recent_file" ]]; then
                    echo "Deleting file: $file"
                    rm -v "$file"
                fi
            done
        else
            echo "No files to delete for IP $ip_prefix."
        fi
    }
    
    # Process unique IP addresses
    find "$TARGET_DIR" -type f \( -iname "*.jpg" -o -iname "*.JPG" \) -printf "%f\n" | \
        awk -F'_' '{print $1}' | sort -u | while read -r ip; do
        process_ip_files "$ip"
    done
    
    # Log completion
    echo "[$(date)] Cleanup process finished."
    

    For example, I have the following files, and the current date/time is 20241231 13:30

    172.30.165.212_20241231_132125.JPG  
    172.30.165.212_20241231_122125.JPG  
    172.30.165.212_20241231_112125.JPG  
    172.30.165.212_20241231_102125.JPG  
    172.30.165.212_20241231_092125.JPG  
    172.30.165.213_20241231_062125.JPG  
    172.30.165.213_20241231_032125.JPG  
    172.30.165.213_20241231_012125.JPG  
    

    The script should delete

    172.30.165.212_20241231_112125.JPG (older than 2 hours)  
    172.30.165.212_20241231_102125.JPG (older than 2 hours)  
    172.30.165.212_20241231_092125.JPG (older than 2 hours)  
    172.30.165.213_20241231_032125.JPG (older than 2 hours)  
    172.30.165.213_20241231_012125.JPG (older than 2 hours)  
    

    The script should keep

    172.30.165.212_20241231_132125.JPG  (younger than 2 hours)  
    172.30.165.212_20241231_122125.JPG  (younger than 2 hours)  
    172.30.165.213_20241231_062125.JPG  (older than 2 hours but most recent from this ip address)  
    
    1 answer
  •   markp-fuso    5 months ago

    Rather than trying to critique a 100+ line script, I propose the following alternative:

    $ cat delfiles
    #!/bin/bash
    
    TARGET_DIR="/mnt/moe/results"                                # OP's directory; update accordingly (eg, TARGET_DIR='.' in my case)
    
    unset prev_ip
    
    printf -v now "%(%s)T" -1                                    # get current time in epoch format (-1 = now; explicit for older bash versions)
    
    now=1735673400                                               # hardcoded to OP's 'current date' of '2024-12-31 13:30:00';
                                                                 # otherwise comment/remove this line for normal operations
    
    (( now-=7200 ))                                              # subtract 2 hours
    
    while IFS= read -r fname
    do
        IFS='_' read -r ip dt tm ext <<< "${fname}"
    
        [[ "${ip}" != "${prev_ip}" ]] && {                       # if new ip then this is the latest file for said ip so ...
            prev_ip="${ip}"                                      # save the new ip and ...
            continue                                             # skip to next file (ie, keep this file)
        }
    
        epoch=$(date -d "${dt:0:4}-${dt:4:2}-${dt:6:2} ${tm:0:2}:${tm:2:2}:${tm:4:2}" '+%s')
    
        (( epoch < now )) && echo rm "${TARGET_DIR}/${fname}"    # if file's epoch is more than 2 hrs old then remove the file;
                                                                 # NOTE: remove the 'echo' to perform the actual deletion
    
    done < <(find "${TARGET_DIR}" -type f -iname '*.jpg' -printf "%f\n" | sort -rV)
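
It may help to see how the `IFS='_' read` split treats the OP's file names. There are only two underscores, so the extension stays attached to the third field and `ext` comes back empty. The substring expansions in the script above only use the first six characters of `tm`, so the trailing `.JPG` is harmless there. A test such as `"$ext" != "JPG"` on the fourth field, however, would reject every file. The sketch below is illustrative only; `split_name` is a hypothetical helper and not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a file name the same way the loop above does
# and print each field in brackets so that empty fields are visible.
split_name() {
    local ip dt tm ext
    IFS='_' read -r ip dt tm ext <<< "$1"
    printf 'ip=[%s] dt=[%s] tm=[%s] ext=[%s]\n' "$ip" "$dt" "$tm" "$ext"
    # Only the first six characters of tm are needed for HH:MM:SS.
    printf 'time=[%s:%s:%s]\n' "${tm:0:2}" "${tm:2:2}" "${tm:4:2}"
}

split_name '172.30.165.212_20241231_132125.JPG'
```

The first line of output shows `tm=[132125.JPG]` and `ext=[]`, which is why slicing `tm` by position works while a check on `ext` does not.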
    

    Notes:

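    • we sort the find results with -rV: r everse order using a V ersion sort of the ip + date/time stamp
    • OP can add additional checks (e.g., file name matches the given format) and informational messages as needed
    • OP can reduce the number of $(date -d ... '+%s') calls with something like this answer's coproc solution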
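
As one possible form of the format check mentioned in the notes, a regular expression can validate the name and capture the fields in a single step. This is a sketch under assumptions: the helper name `parse_name` is made up here, the IP part is only loosely checked (four dot-separated groups of 1 to 3 digits, no range check), and the digits are not verified to form a real calendar date or time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed and print "ip YYYYMMDD HHMMSS" when the name
# matches <ip>_<YYYYMMDD>_<HHMMSS>.jpg (extension in any case); fail otherwise.
parse_name() {
    local re='^([0-9]{1,3}(\.[0-9]{1,3}){3})_([0-9]{8})_([0-9]{6})\.[Jj][Pp][Gg]$'
    [[ $1 =~ $re ]] || return 1
    # Group 1 is the whole IP, group 2 its last octet, groups 3 and 4 date and time.
    printf '%s %s %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
}

for name in '172.30.165.212_20241231_132125.JPG' 'notes.txt' '172.30.165.212_2024_1321.jpg'; do
    if fields=$(parse_name "$name"); then
        echo "ok:   $name -> $fields"
    else
        echo "skip: $name"
    fi
done
```

Inside the main loop, a failed `parse_name` call would be a natural place to `continue` with an informational message.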

    Running this against the OP's set of files generates:

    $ ./delfiles
    rm ./172.30.165.213_20241231_032125.JPG
    rm ./172.30.165.213_20241231_012125.JPG
    rm ./172.30.165.212_20241231_112125.JPG
    rm ./172.30.165.212_20241231_102125.JPG
    rm ./172.30.165.212_20241231_092125.JPG
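
One way to avoid a `date` call per file (a different technique from the coproc approach mentioned in the notes) is to compare the timestamps as plain numbers: `YYYYMMDDHHMMSS` values order the same way as the times they represent, so the cutoff only has to be formatted once. The sketch below assumes that all names follow the OP's format and carry local time; `list_deletions` and its argument are hypothetical names. A comparison that spans a daylight-saving change can be off by an hour. The function only prints candidates and removes nothing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read file names on stdin and print those that are
# neither the newest for their IP nor newer than the cutoff.
# $1 = cutoff as YYYYMMDDHHMMSS
list_deletions() {
    local cutoff=$1 prev_ip='' fname ip rest stamp
    # A reverse byte-order sort keeps each IP's files together, newest first,
    # because the timestamp part of the name is fixed-width.
    LC_ALL=C sort -r | while IFS= read -r fname; do
        ip=${fname%%_*}                 # text before the first underscore
        rest=${fname#*_}                # YYYYMMDD_HHMMSS.ext
        stamp=${rest:0:8}${rest:9:6}    # YYYYMMDDHHMMSS
        if [[ $ip != "$prev_ip" ]]; then
            prev_ip=$ip                 # first (newest) file for this IP: keep it
            continue
        fi
        if (( stamp < cutoff )); then
            printf '%s\n' "$fname"
        fi
    done
}

# In normal use with GNU date: cutoff=$(date -d '2 hours ago' +%Y%m%d%H%M%S)
cutoff=20241231113000   # two hours before the OP's example time of 13:30
printf '%s\n' \
    172.30.165.212_20241231_132125.JPG \
    172.30.165.212_20241231_122125.JPG \
    172.30.165.212_20241231_112125.JPG \
    172.30.165.212_20241231_102125.JPG \
    172.30.165.212_20241231_092125.JPG \
    172.30.165.213_20241231_062125.JPG \
    172.30.165.213_20241231_032125.JPG \
    172.30.165.213_20241231_012125.JPG | list_deletions "$cutoff"
```

For the OP's sample set this prints the same five names as the run above. Piping `find ... -printf "%f\n"` into the function and passing each printed name to `rm` would complete the cleanup.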