
Capturing text between two keywords in Perl

perl

capser · Tech community · 6 years ago

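I am trying to read the text between two keywords, but it doesn't really work. I only want to read in the question and the answer and then print them out. Instead, the script just keeps printing the block over and over in a very large loop.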

#!/usr/bin/perl
use strict ;
use warnings;
my $question ;
my $answer ;

while(my $line = <>){
chomp $line ;

if ($line =~ /questionstart(.*)questionend/) {
    $question = $1 ; }
elsif ($line  =~ /answerstart(.*)answerend/) {
    $answer = $1 ; }

my $flashblock = <<"FLASH" ;
<!-- BEGIN -->
<p class="question">
  $question
</p>
<p class="answer">
   $answer
</p>
<!-- END -->
FLASH
print $flashblock ;
}

Here is a sample of the file:

questionstart

hellphellohellohello


questionend

answerstart

hellohellohello

answerend

3 replies | up to 6 years ago

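Dave Cross · 6 years ago

As others have pointed out, a multi-line regex will never work when you read the input file one line at a time.

This is a perfect use for Perl's "flip-flop" operator ( .. ):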

#!/usr/bin/perl

use strict;
use warnings;

my ($question, $answer);

while (<DATA>) {
  if (/questionstart/ .. /questionend/ and ! /question(start|end)/) {
    $question .= $_;
  }

  if (/answerstart/ .. /answerend/ and ! /answer(start|end)/) {
    $answer .= $_;
  }

  # If we're at the end of an answer, do all the stuff
  if (/answerend/) {
    q_and_a($question, $answer);

    # reset text variables
    $question = $answer = '';
  }
}

sub q_and_a {
  # Use the arguments passed in rather than the file-level variables
  my ($q, $ans) = @_;

  print <<"FLASH";
<!-- BEGIN -->
<p class="question">
  $q
</p>
<p class="answer">
   $ans
</p>
</p>
<!-- END -->
FLASH
}

__DATA__
questionstart

hellphellohellohello


questionend

answerstart

hellohellohello

answerend

Update: moved the display into a subroutine to make the main loop cleaner.
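As a variation on the marker-excluding tests above, the value returned by the flip-flop can be used directly. In scalar context it returns a sequence number while the range is active, and the last number in a range has "E0" appended. The sketch below is only illustrative: the in-memory @lines input and the variable names are assumptions, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, held in memory so the sketch is self-contained.
my @lines = map { "$_\n" } qw(questionstart first second questionend);

my $question = '';
for (@lines) {
    # The flip-flop yields 1, 2, 3, ... while the range is active and an
    # empty string otherwise; the closing line's number ends in "E0".
    if (my $n = /questionstart/ .. /questionend/) {
        next if $n == 1;        # the line that opened the range
        next if $n =~ /E0$/;    # the line that closed the range
        $question .= $_;
    }
}

print $question;
```

This skips the marker lines without a second regex, at the cost of being a little less obvious to readers who do not know the flip-flop's return value.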

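zdim · 6 years ago

Because the file is read line by line, a pattern that spans multiple lines can never match.

A basic way to solve this is to set flags for the question and answer regions. Since you have very clear markers for entering and leaving those regions, the code is quite simple: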

use warnings;
use strict;
use feature 'say';

my ($question, $answer);
my ($in_Q, $in_A);

while (my $line = <>) {
    next if $line =~ /^\s*$/;

    if    ($line =~ /^\s*questionstart/) { $in_Q = 1; next }   
    elsif ($line =~ /^\s*questionend/)   { $in_Q = 0; next }   
    elsif ($line =~ /^\s*answerstart/)   { $in_A = 1; next }   
    elsif ($line =~ /^\s*answerend/)     { $in_A = 0; next }       

    if    ($in_Q) { $question .= $line }
    elsif ($in_A) { $answer   .= $line }
}

say "Question: $question";
say "Answer: $answer";

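(I condensed the if-elsif statements only for brevity and emphasis.)

This code makes a few reasonable assumptions about the input file. It requires each marker to start a line (possibly after whitespace) but allows more text to follow it. If you want to ensure that a marker is the only thing on its line, add a $ anchor at the end of the regex (again preceded by \s* ).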
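A minimal sketch of the stricter matching described above, assuming the four marker names from the sample file. The helper name marker_on_line is hypothetical and only serves the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the marker name if the line holds nothing
# but a marker (optionally surrounded by whitespace), otherwise undef.
sub marker_on_line {
    my ($line) = @_;
    return $line =~ /^\s*(questionstart|questionend|answerstart|answerend)\s*$/
        ? $1
        : undef;
}

for my $line ("questionstart\n", "  answerend  \n", "questionstart of the day\n") {
    (my $shown = $line) =~ s/\n\z//;          # copy without the newline, for display
    my $marker = marker_on_line($line);
    print "[$shown] => ", (defined $marker ? $marker : 'not a marker'), "\n";
}
```

With the anchor in place, the third line is treated as ordinary text instead of opening a question region.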

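The code above assumes that the input holds a single Q/A pair. If that changes to multiple pairs, move the printing into the loop so that it happens as soon as the end of an answer is reached, that is, in elsif ($line =~ /^\s*answerend/) { .. }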
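One way the multi-pair variant might look is sketched below. The in-memory @lines input and the @pairs array are assumptions added so the example runs on its own and its result can be inspected; the flag logic follows the code in this answer.

```perl
use strict;
use warnings;

# Hypothetical input with two Q/A pairs, kept in memory for the sketch.
my @lines = map { "$_\n" } qw(
    questionstart q1 questionend answerstart a1 answerend
    questionstart q2 questionend answerstart a2 answerend
);

my ($question, $answer) = ('', '');
my ($in_Q, $in_A)       = (0, 0);
my @pairs;   # collected only so the result is easy to inspect

for my $line (@lines) {
    next if $line =~ /^\s*$/;

    if    ($line =~ /^\s*questionstart/) { $in_Q = 1; next }
    elsif ($line =~ /^\s*questionend/)   { $in_Q = 0; next }
    elsif ($line =~ /^\s*answerstart/)   { $in_A = 1; next }
    elsif ($line =~ /^\s*answerend/) {
        $in_A = 0;
        # A complete pair has been read: emit it, then reset the buffers.
        print "Question: $question", "Answer: $answer";
        push @pairs, [$question, $answer];
        ($question, $answer) = ('', '');
        next;
    }

    if    ($in_Q) { $question .= $line }
    elsif ($in_A) { $answer   .= $line }
}
```

Resetting both buffers right after printing keeps text from one pair from leaking into the next.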

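The printing code in the question is fine, so I don't repeat it here. If there is a chance that you will print in formats other than HTML, clean the strings of leading and trailing whitespace, runs of multiple spaces, and newlines.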
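The cleanup mentioned above could be done with two substitutions, as in this sketch. The helper name squeeze is hypothetical, and collapsing newlines into single spaces is one assumed interpretation of "cleaning" the string.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace (newlines included)
# into a single space, then strip the leading and trailing space.
sub squeeze {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

print squeeze("\n  hello   hello\nhello \n\n"), "\n";   # prints "hello hello hello"
```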

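The repeated tests on the same variable may prompt a search for a case-type construct, which in Perl is switch . However, that is still an experimental feature, and it behaves in a way that is "hard to describe precisely" (the docs' own words!). It may also involve smartmatch, which is hard to describe, is widely understood to be broken in its current form, and is sure to change. So I recommend sticking with cascading if-elsif statements (in this approach).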

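ggorlen, Hoàng Huy Khánh · 6 years ago

Your approach reads the file line by line, but your regex tries to capture multiple lines between the start and end of a question or answer. No single line in the file will ever match such a multi-line regex, so $question and $answer stay uninitialized, and the block (along with the warnings) is printed for every line in the file.

It makes sense to read the whole text file into a string, split it into question and answer chunks, and trim the contents if needed: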

#!/usr/bin/perl
use strict;
use warnings;

open my $fh, '<', 'file.txt' or die "Can't open file $!";
# Non-capturing group: a capturing group would make split return the markers too
my @qa = grep { /\w/ } split /^(?:questionstart|answerstart|questionend|answerend)$/m, do { local $/; <$fh> };
s/^\s+|\s+$//g foreach @qa;

my $flashblock = << "FLASH";
<!-- BEGIN -->
<p class="question">
    $qa[0]
</p>
<p class="answer">
    $qa[1]
</p>
<!-- END -->
FLASH

print $flashblock;

Output:

<!-- BEGIN -->
<p class="question">
    hellphellohellohello
</p>
<p class="answer">
    hellohellohello
</p>
<!-- END -->

If a file contains multiple Q/A pairs, you can loop over the @qa array and print the pairs, or put them into a hash and use them as needed.
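Both options might look like the sketch below. It assumes @qa holds alternating question and answer strings; the sample data and the names @rest and %answer_for are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @qa, assuming it holds alternating
# question and answer strings.
my @qa = ('What is 1+1?', '2', 'What is 2+2?', '4');

# Option 1: walk the list two items at a time. splice consumes the
# array, so work on a copy to leave @qa intact.
my @rest = @qa;
while (my ($q, $ans) = splice @rest, 0, 2) {
    print "Q: $q\nA: $ans\n";
}

# Option 2: an even-length list of alternating keys and values converts
# directly into a hash keyed by question (duplicate questions collapse).
my %answer_for = @qa;
print $answer_for{'What is 2+2?'}, "\n";   # prints 4
```

The hash loses the original order of the pairs, so the loop is the better fit when order matters.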