
Truncate a string without ending in the middle of a word

truncate python

Jack · 16 years ago

I'm looking for a way to truncate a string in Python that won't cut the string off in the middle of a word.

For example:

Original:          "This is really awesome."
"Dumb" truncate:   "This is real..."
"Smart" truncate:  "This is really..."

I'm looking for a way to accomplish the "smart" truncation shown above.

7 answers | latest 7 years ago

Adam 16 years ago

I wrote a solution for this in a recent project and compressed most of it down to something a little smaller.

def smart_truncate(content, length=100, suffix='...'):
    if len(content) <= length:
        return content
    else:
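        # Cut at length+1 so a word ending exactly at `length` is not dropped,
        # split on spaces, and discard the last (possibly partial) word
        return ' '.join(content[:length+1].split(' ')[0:-1]) + suffix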

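The if statement checks whether the content is already shorter than the cutoff point. If it isn't, the function truncates it to the desired length (plus one character), splits on spaces, removes the last element (so that you don't cut off a word), and then joins the pieces back together, appending the '...'.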
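One edge case in the approach above: if the first `length+1` characters contain no space (a single very long word), every piece is dropped and only the suffix is returned. A minimal sketch of a fallback, assuming a hard cut at `length` is acceptable in that case (the fallback is an addition, not part of the answer):

```python
def smart_truncate(content, length=100, suffix='...'):
    """Adam's split/join approach, plus an assumed hard-cut fallback used
    when no space occurs within the first length+1 characters."""
    if len(content) <= length:
        return content
    # Keep one extra character so a space right at `length` marks the
    # preceding word as complete; the last piece may be a partial word.
    words = content[:length + 1].split(' ')
    kept = ' '.join(words[:-1])
    if not kept:
        # No usable word boundary: fall back to a hard cut at `length`.
        kept = content[:length]
    return kept + suffix

print(smart_truncate("This is really awesome.", 15))  # This is really...
print(smart_truncate("Supercalifragilistic", 5))      # Super...
```

As in the original, `length` does not include the suffix, so the result can be up to `length + len(suffix)` characters long.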

bobince 16 years ago

Here's a slightly better version of the last line in Adam's solution:

return content[:length].rsplit(' ', 1)[0]+suffix

(This is slightly more efficient, and it returns a more sensible result when the truncated part of the string contains no space.)
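A self-contained version of the function with bobince's last line, plus a few calls that show its behaviour (the example strings are only illustrative):

```python
def smart_truncate(content, length=100, suffix='...'):
    if len(content) <= length:
        return content
    # rsplit(' ', 1) splits at most once, at the last space, so only the
    # trailing (possibly partial) word is discarded. If the slice has no
    # space at all, [0] is the whole slice, i.e. a hard cut at `length`.
    return content[:length].rsplit(' ', 1)[0] + suffix

print(smart_truncate("This is really awesome.", 16))  # This is really...
print(smart_truncate("Supercalifragilistic", 5))      # Super...
print(smart_truncate("This is really awesome.", 14))  # This is...
```

The last call shows one difference from Adam's version: because the slice stops at `length` and does not include the following space, a word that ends exactly at `length` ("really" here) is treated as partial and dropped. Adam's `content[:length+1]` keeps it.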

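Brian 16 years ago

There are a few subtleties that may or may not be issues for you, such as the handling of tabs (e.g. if you display them as 8 spaces but treat them internally as 1 character), handling the various flavours of breaking and non-breaking whitespace, or allowing breaks on hyphens. If any of this matters, you may want to look at the textwrap module. E.g.: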

import textwrap

def truncate(text, max_size):
    if len(text) <= max_size:
        return text
    # Wrap to max_size-3 so the "..." still fits within max_size
    return textwrap.wrap(text, max_size-3)[0] + "..."

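The default behaviour for words longer than max_size is to break them (making max_size a hard limit). You can switch to the soft limit used by some of the other solutions here by passing break_long_words=False to wrap(), in which case it returns the whole word. If you want this behaviour, change the last line to: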

    lines = textwrap.wrap(text, max_size-3, break_long_words=False)
    return lines[0] + ("..." if len(lines)>1 else "")

There are a few other options, such as expand_tabs, that may be of interest depending on the exact behaviour you want.
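A sketch that puts the two variants above side by side. The `soft` parameter is hypothetical, added here only to switch between the hard limit and `break_long_words=False`; the ellipsis is appended only when `wrap()` produced more than one line, as in the second variant:

```python
import textwrap

def truncate(text, max_size, soft=False):
    """Truncate `text` on a word boundary.

    `soft` is a hypothetical flag for this sketch: False keeps max_size as a
    hard limit (over-long words are broken), True passes
    break_long_words=False so an over-long first word is returned whole.
    """
    if len(text) <= max_size:
        return text
    # Reserve 3 characters for the "..." suffix.
    lines = textwrap.wrap(text, max_size - 3, break_long_words=not soft)
    if not lines:  # whitespace-only input: wrap() returns an empty list
        return ""
    # Append the ellipsis only if wrap() had to push text onto a second line.
    return lines[0] + ("..." if len(lines) > 1 else "")

print(truncate("This is really awesome.", 20))                               # This is really...
print(truncate("Supercalifragilisticexpialidocious is long", 10))            # Superca...
print(truncate("Supercalifragilisticexpialidocious is long", 10, soft=True)) # Supercalifragilisticexpialidocious...
```

In hard mode the result never exceeds `max_size`; in soft mode it can, but no word is ever split.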

Markus Jarderot 16 years ago

import re

def smart_truncate1(text, max_length=100, suffix='...'):
    """Returns a string of at most `max_length` characters, cutting
    only at word-boundaries. If the string was truncated, `suffix`
    will be appended.
    """

    if len(text) > max_length:
        pattern = r'^(.{0,%d}\S)\s.*' % (max_length-len(suffix)-1)
        return re.sub(pattern, r'\1' + suffix, text)
    else:
        return text

或

def smart_truncate2(text, min_length=100, suffix='...'):
    """If the `text` is more than `min_length` characters long,
    it will be cut at the next word-boundary and `suffix` will
    be appended.
    """

    pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
    return re.sub(pattern, r'\1' + suffix, text)

或

def smart_truncate3(text, length=100, suffix='...'):
    """Truncates `text`, on a word boundary, as close to
    the target length as it can.
    """

    slen = len(suffix)
    pattern = r'^(.{0,%d}\S)\s+\S+' % (length-slen-1)
    if len(text) > length:
        match = re.match(pattern, text)
        if match:
            length0 = match.end(0)
            length1 = match.end(1)
            if abs(length0+slen-length) < abs(length1+slen-length):
                return match.group(0) + suffix
            else:
                return match.group(1) + suffix
    return text
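To compare the three functions, here is a self-contained sketch that repeats them (docstrings replaced by short comments on what each pattern does) and runs them on one sentence with the same length of 26; the sentence is just an example:

```python
import re

def smart_truncate1(text, max_length=100, suffix='...'):
    # Greedy: longest prefix of at most max_length - len(suffix) characters
    # that ends on a non-space character followed by whitespace.
    if len(text) > max_length:
        pattern = r'^(.{0,%d}\S)\s.*' % (max_length - len(suffix) - 1)
        return re.sub(pattern, r'\1' + suffix, text)
    return text

def smart_truncate2(text, min_length=100, suffix='...'):
    # Lazy: shortest prefix of at least min_length characters that ends a word.
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length - 1)
    return re.sub(pattern, r'\1' + suffix, text)

def smart_truncate3(text, length=100, suffix='...'):
    # Finds the last word boundary within the budget and the one after the
    # next word, then keeps whichever lands closer to `length` once the
    # suffix is added (ties favour the shorter cut).
    slen = len(suffix)
    pattern = r'^(.{0,%d}\S)\s+\S+' % (length - slen - 1)
    if len(text) > length:
        match = re.match(pattern, text)
        if match:
            length0 = match.end(0)
            length1 = match.end(1)
            if abs(length0 + slen - length) < abs(length1 + slen - length):
                return match.group(0) + suffix
            return match.group(1) + suffix
    return text

text = "The quick brown fox jumps over the lazy dog"
for fn in (smart_truncate1, smart_truncate2, smart_truncate3):
    print(fn.__name__, repr(fn(text, 26)))
# smart_truncate1 'The quick brown fox...'
# smart_truncate2 'The quick brown fox jumps over...'
# smart_truncate3 'The quick brown fox jumps...'
```

So `smart_truncate1` treats the length as a maximum, `smart_truncate2` as a minimum, and `smart_truncate3` as a target that may be overshot. Since `.` and `\S` do not match a newline (no `re.DOTALL` here), the cut point is always found on the first line, so multi-line input may need extra handling.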

Anthony 11 years ago

>>> import textwrap
>>> textwrap.wrap('The quick brown fox jumps over the lazy dog', 12)
['The quick', 'brown fox', 'jumps over', 'the lazy dog']

Just take the first element and you're done...
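Anthony's one-liner as a small function; the name `first_line` is only for illustration. `wrap()` returns an empty list for empty or whitespace-only input, so indexing `[0]` directly would raise an `IndexError`; the sketch guards against that and, like the answer, adds no ellipsis:

```python
import textwrap

def first_line(text, width):
    # textwrap.wrap returns a list of lines no longer than `width`; the first
    # one is the text truncated at a word boundary.
    lines = textwrap.wrap(text, width)
    # Empty or whitespace-only input yields [], so guard before indexing.
    return lines[0] if lines else ''

print(first_line('The quick brown fox jumps over the lazy dog', 12))  # The quick
print(repr(first_line('   ', 12)))                                     # ''
```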

Vebjorn Ljosa 16 years ago

def smart_truncate(s, width):
    if len(s) <= width:
        return s  # already short enough; also avoids an IndexError on s[width]
    if s[width].isspace():
        return s[0:width]
    else:
        return s[0:width].rsplit(None, 1)[0]

Test:

>>> smart_truncate('The quick brown fox jumped over the lazy dog.', 23) + "..."
'The quick brown fox...'
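A variant of the function above that appends the suffix itself, only when something was actually cut off. The `suffix` parameter and the extra checks are additions, not part of the answer: the length check avoids indexing past the end, a cut right after whitespace is treated as a word boundary, and a chunk with no whitespace falls back to a hard cut:

```python
def smart_truncate(s, width, suffix='...'):
    """Vebjorn's approach, appending `suffix` only when text was removed.
    The `suffix` parameter is an addition, not part of the original answer."""
    if len(s) <= width:
        return s  # nothing to cut; also avoids an IndexError on s[width]
    head = s[:width]
    if s[width].isspace() or head[-1:].isspace():
        # The cut falls on a word boundary, so `head` ends with whole words.
        return head.rstrip() + suffix
    # Otherwise drop the partial last word: rsplit(None, 1) splits once, on
    # the last run of whitespace.
    parts = head.rsplit(None, 1)
    if len(parts) < 2:
        return head + suffix  # no word boundary inside `head`: hard cut
    return parts[0] + suffix

print(smart_truncate('The quick brown fox jumped over the lazy dog.', 23))  # The quick brown fox...
print(smart_truncate('short', 23))                                          # short
```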

marcanuy 7 years ago

From Python 3.4+ you can use textwrap.shorten. With the OP's example:

>>> import textwrap
>>> original = "This is really awesome."
>>> textwrap.shorten(original, width=20, placeholder="...")
'This is really...'

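textwrap.shorten(text, width, **kwargs)

Collapse and truncate the given text to fit in the given width.

First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width.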
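A few calls that illustrate the collapsing step and the default placeholder, which is `' [...]'` with a leading space; the strings are only examples:

```python
import textwrap

# Runs of whitespace are collapsed first, so this fits in 12 characters.
print(textwrap.shorten("Hello  world!", width=12))   # Hello world!
# Too long: words are dropped from the end; the default placeholder is ' [...]'.
print(textwrap.shorten("Hello  world!", width=11))   # Hello [...]
# A custom placeholder (here without a leading space) also counts toward width.
print(textwrap.shorten("This is really awesome.", width=20, placeholder="..."))  # This is really...
```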