代码之家 › 专栏 › 技术社区 › Hill

awk/sed:如果任何字段与模式匹配,则替换所有字段

replace sed awk unix

Hill · 技术社区 · 7 年前

我有一个以制表符分隔的文件,其中至少有16列(可能更多),其中第一列是唯一标识符;和>10000行(示例中仅显示6x6),如下所示:

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

如果其中一个值已经是“-9”,我需要将VAR1-5的所有值更改为“-9”

因此,期望的输出是:

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

到目前为止,我已经尝试在awk中这样做:

awk -F'\t' '
BEGIN{OFS="\t"}
{for(i=2;i<=NF;i++){if ($i=="-9"){for(j=2;j<=NF;j++){$j="-9"};continue}}};1
' < file1.tab

这是可行的,但应用到实际数据集时速度非常慢。有没有更快的方法?也许是结合了 grep 和 sed ?

5 回复 | 直到 7 年前

tripleee 7 年前

这里有一个变体,它不会对列数进行硬编码。

awk -F '\t' '/(^|\t)-9(\t|$)/ {
    printf $1; for(i=2; i<=NF; ++i) printf "\t-9"; printf "\n"
    next }
  1' file1 file2

这里的主要优化是Awk一次扫描整行并立即在regex上触发,而不需要遍历所有字段,除非它已经知道有匹配项。

因为我们知道我们将丢弃除第一个字段以外的所有字段,所以没有必要让Awk替换这些字段,以便它可以打印它们。只需生成我们想要打印的输出并继续,而不必接触Awk的线的内部表示。虽然这是一个非常小的性能改进,但这也应该购买一些周期。

RavinderSingh13 Nikita Bakshi 7 年前

下列的 awk 我已经用你们提供的样品对它进行了测试。

awk 'FNR==1{print;next} /(^|\t)-9(\t|$)/{print $1,"-9   -9    -9    -9    -9";next} 1' OFS="    "   Input_file

如果OP在Input\u文件中有5个以上的字段,那么下面可能会有所帮助,逻辑与triple sir的解决方案相同,在这里,我遍历字段,但不考虑打印 -9 我正在将字段的值指定给 -9 .

awk 'FNR==1{print;next} /(^|\t)-9(\t|$)/{for(i=2;i<=NF;i++){$i=-9};} 1' OFS="\t\t"   Input_file

输出如下。

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

说明: 现在也在为上述代码添加解释。

awk '
FNR==1{                ##Checking condition here if line number is 1 then do following:
  print;               ##Printing the current line then which will be very first line of Input_file.
  next                 ##next is awk out of the box keyword which will skip all further statements for program.
}
/(^|\t)-9(\t|$)/{        ##Checking here if -9 is coming in a line either with spaces or without spaces, if yes then do following:
  print $1,"-9   -9    -9    -9    -9";  ##printing the first field of current line along with 5 -9 values as per OPs request to do so.
  next                 ##next will skip all further statements.
}
1                      ##awk works on method of condition then action, so I am making condition TRUE here by mentioning 1 here and not mentioning action here so by default print of the current line will happen.
' OFS="    " Input_file   ##Setting OFS(output field separator) value to spaces and mentioning the Input_file name here.

MiniMax 7 年前

sed -r '/-9/s/[^ ]+/-9/2g' input.txt

输出

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9    -9     -9     -9    -9
5    5    5     5     5     5
6    -9    -9    -9     -9     -9

Akshay Hegde 7 年前

更多使用方法 GNU awk

一个衬里:

awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++)$0=gensub (/[^[:blank:]]+/,-9,i)}1' infile

更好的可读性:

awk '/(^|[ \t]+)-9([ \t]+|$)/{
       for(i=2; i<=NF; i++)
            $0=gensub (/[^[:blank:]]+/,-9,i)
     }1
    ' infile

试验结果:

输入:

$ cat infile
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

输出:

(因为 - 间距已移动)

$ awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++)$0 = gensub (/[^[:blank:]]+/, -9 , i)}1' infile  
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9    -9     -9     -9    -9
5    5    5     5     5     5
6    -9    -9    -9     -9     -9

如果您想让输出看起来更好,可以尝试以下方法:(不推荐)

awk '/(^|[ \t]+)-9([ \t]+|$)/{for(i=2; i<=NF; i++){ if($i==-9)continue; $0 = gensub (/[^[:blank:]]+/, "\b-9" , i)}}1' infile  
ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4   -9   -9    -9     -9   -9
5    5    5     5     5     5
6   -9    -9   -9    -9    -9

更具可读性的上述版本:

awk '/(^|[ \t]+)-9([ \t]+|$)/{
          for(i=2; i<=NF; i++)
          { 
            if($i==-9)continue; 
            $0 = gensub(/[^[:blank:]]+/, "\b-9" , i)
          }
     }1
    ' infile

Vicky 7 年前

awk 'BEGIN{IFS=OFS="    "}/-9/{for(i=2;i<=NF;i++){$i=-9}}1' filename