
Ruby CGI.unescapeHTML producing strange characters

  • John F. Miller  · 16 years ago

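    I have backed up a bunch of Markdown-formatted comments into an XML document. This of course means I need to unescape them. When I try to use CGI.unescapeHTML, it adds some strange characters to the Markdown that do not render well in all browsers.

    Specifically, it replaces two spaces with "\302\240", but not consistently. How can I stop this behavior?

    For example: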

    require 'cgi'

    s = "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.  This is a good thing!  But I'm also seeing people who are not using regular expressions to their fullest.  Here are some quick regex tips to keep you features readable:

* `(?:a|an)` -- using a this construct you can group things wihout actually matching them.  I'm seeing a lot of steps that have unused params because someone needed a group but didn't know how to avoid capturing it&#x000A"
    CGI.unescapeHTML s
    # => "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.\302\240 This is a good thing!\302\240 But I'm..."
    
    1 reply
  • Reactormonk  · 13 years ago

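    Those are non-breaking spaces: "\302\240" is the octal form of the bytes 0xC2 0xA0, the UTF-8 encoding of U+00A0. Read up on Wikipedia: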

    In computer-based text processing and digital typesetting, a
    non-breaking space, also known as a no-break space or
    non-breakable space (NBSP), is a variant of the space character
    that prevents an automatic line break (line wrap) at its position.
    In certain formats (such as HTML), it also prevents the
    “collapsing” of multiple consecutive whitespace characters into a
    single space. The non-breaking space is also known as a hard space
    or fixed space. In Unicode, it is encoded as U+00A0 no-break space
    (HTML: &#160; &nbsp;).
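
    If plain spaces are what you want, one option is to replace the non-breaking spaces after unescaping. The following is only a minimal sketch: the helper name `unescape_plain_spaces` and the sample string are made up for illustration, and it assumes the text is UTF-8 and that the backup contains numeric entities such as `&#160;`.

```ruby
require 'cgi'

# U+00A0 NO-BREAK SPACE; its UTF-8 bytes are 0xC2 0xA0, i.e. "\302\240" in octal.
NBSP = "\u00A0"

# Hypothetical helper (not part of CGI): unescape HTML entities, then
# turn every non-breaking space into an ordinary space.
def unescape_plain_spaces(text)
  CGI.unescapeHTML(text).gsub(NBSP, ' ')
end

# Made-up sample input standing in for the real XML backup.
sample = 'usage.&#160; This is a good thing!&#xA0; But I&#39;m'

raw   = CGI.unescapeHTML(sample)
clean = unescape_plain_spaces(sample)

octal_bytes = NBSP.bytes.map { |b| format('%o', b) }
p octal_bytes              # ["302", "240"]
puts raw.include?(NBSP)    # true
puts clean.include?(NBSP)  # false
```

    Keep in mind that this changes how the text renders: as the quote above says, browsers collapse runs of ordinary spaces, so the double space after a sentence will show up as a single one.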
    