代码之家  ›  专栏  ›  技术社区  ›  kempinski

在分隔符之间随机化文本

  •  6
  • kempinski  · 技术社区  · 9 年前

    我有这个简单的输入

    I have {red;green;orange} fruit and cup of {tea;coffee;juice}
    

    我使用Perl识别两个外部大括号分隔符之间的模式 { } ,并使用内部分隔符随机化内部字段 ; .

    我得到这个输出

    I have green fruit and cup of coffee
    

    这是我的工作Perl脚本

    perl -plE 's!\{(.*?)\}!@x=split/;/,$1;$x[rand@x]!ge' <<< 'I have {red;green;orange} fruit and cup of {tea;coffee;juice}'
    

    我的任务是处理此输入格式

    I have { {red;green;orange} fruit ; cup of {tea;coffee;juice} } and {nice;fresh} {sandwich;burger}.
    

    正如我所理解的,脚本应该跳过外部右括号 { ... } 在第一个文本部分中,其内部有带有左括号和右括号的文本:

    { {red;green;orange} fruit ; cup of {tea;coffee;juice} }
    

    它应该选择一个随机的部分,像这样

    {red;green;orange} fruit
    

    cup of {tea;coffee;juice}
    

    然后再深入:

    green fruit
    

    处理完所有文本后,结果可能是以下任何一种

    I have red fruit and fresh burger.
    I have cup of tea and nice sandwich
    I have green fruit and nice burger.
    I have cup of coffee and fresh burger.
    

    脚本也应该解析和随机化下一个文本。例如

    This {beautiful;perfect} {image;photography}, captured with the { {NASA;ESA} Hubble Telescope ; {NASA;ESA} Hubble Space Telescope} }, is the {largest;sharpest} image ever taken of the Andromeda galaxy { {— otherwise known as M31;— known as M31}; [empty here] }.
    This is a cropped version of the full image and has 1.5 billion pixels. { You would need more than {600;700;800} HD television screens to display the whole image. ; If you want to display the whole image, you need to download more than {1;2} Tb. traffic and use 800 HD displays }
    

    示例输出可以是

    This beautiful image, captured with the NASA Hubble Telescope, is the
    sharpest image ever taken of the Andromeda galaxy — otherwise known as
    M31.
    This is a cropped version of the full image and has 1.5 billion
    pixels. You would need more than 700 HD television screens to display
    the whole image.
    
    3 回复  |  直到 9 年前
        1
  •  3
  •   William Pursell    9 年前

    不贪婪是一个好主意,但并不能完全奏效。您可以添加循环:

    perl -plE 'while(s!\{([^{}]*)\}!@x=split/;/,$1;$x[rand@x]!ge){}'
    

    请注意,您的示例输入有不匹配的大括号,因此这似乎输出了一个假“}”

        2
  •  2
  •   glenn jackman    9 年前

    不错的挑战。你需要做的是找到一组没有内部支架的支架,然后从里面随机挑选一个项目。你需要在全球范围内做到这一点。这将仅取代“1级”支架。您需要遍历字符串,直到找不到更多匹配项为止。

    use v5.18;
    use strict;
    use warnings;
    
    sub rand_sentence {
        my $copy = shift;
        1 while $copy =~ s{ \{ ([^{}]+) \} } 
                          { my @words = split /;/, $1; $words[rand @words] }xsge;
        return $copy;
    }
    
    my $str = 'I have { {red;green;orange} fruit ; cup of {tea;coffee;juice} } and {nice;fresh} {sandwich;burger}.';
    say rand_sentence($str);
    say '';
    
    $str = <<'END';
    This {beautiful;perfect} {image;photography}, captured with the { {NASA;ESA}
    Hubble Telescope ; {NASA;ESA} Hubble Space Telescope }, is the
    {largest;sharpest} image ever taken of the Andromeda galaxy { {— otherwise
    known as M31;— known as M31}; [empty here] }. This is a cropped version of the
    full image and has 1.5 billion pixels. { You would need more than {600;700;800}
    HD television screens to display the whole image. ; If you want to display the
    whole image, you need to download more than {1;2} Tb.  traffic and use 800 HD
    displays }
    END
    
    say rand_sentence($str);
    

    样本输出

    I have  orange fruit  and fresh sandwich.
    
    This beautiful photography, captured with the  ESA Hubble Space Telescope , is the
    largest image ever taken of the Andromeda galaxy  — otherwise
    known as M31. This is a cropped version of the
    full image and has 1.5 billion pixels.  If you want to display the
    whole image, you need to download more than 1 Tb.  traffic and use 800 HD
    displays
    
        3
  •  0
  •   Kaz    9 年前

    TXR 解决方案有很多方法可以解决这个问题。

    假设我们正在从标准输入读取数据。我们如何读取记录中的数据,这些记录不是由通常的换行符分隔的,而是由括号选择模式分隔的?我们通过在标准输入流上创建一个记录适配器对象来实现这一点。第三个论点 record-adapter 函数是一个布尔值,表示我们希望保留终止分隔符(与记录分隔正则表达式匹配的部分)。

    因此,如果数据看起来像这样 foo bar {bra;ces} xyzzy {a;b;c} d\n 它变成了这些记录: foo bar {bra;ces} , xyzzy {a;b;c} d\n .

    然后,我们使用提取语言处理这些记录,就像它们是文本行一样。它们分为两种模式:以大括号模式结束的线和不以大括号结束的线。后者只是附和。前者按随机支架替换的要求进行处理。

    我们还初始化 *random-state* 使得PRNG被播种以在每次运行中产生不同的伪随机序列。如果 make-random-state 如果没有参数,它将创建一个随机状态对象,该对象由进程ID和系统时间等系统参数初始化:

    @(do (set *random-state* (make-random-state)))
    @(next @(record-adapter #/{[\w;]+}/ *stdin* t))
    @(repeat)
    @  (cases)
    @*text{@switch}
    @    (do (put-string `@text@(first (shuffle (split-str switch ";")))`))
    @  (or)
    @text
    @    (do (put-string text))
    @  (end)
    @(end)
    

    试运行:

    $ cat data
    I have {red;green;orange} fruit and cup of {tea;coffee;juice}.
    $ txr rndchoose.txr < data
    I have red fruit and cup of tea.
    $ txr rndchoose.txr < data
    I have orange fruit and cup of tea.
    $ txr rndchoose.txr < data
    I have green fruit and cup of coffee.