
How to display all words that contain these characters?

  •  6
  • xRobot  · Tech Community  · 14 years ago

    I have a text file, and I want to display all the words that contain both the z and x characters.

    7 answers  |  as of 14 years ago
        1
  •  12
  •   Wooble    14 years ago

    If you don't want to have two problems:

    for word in open('myfile.txt').read().split():
        if 'x' in word and 'z' in word:
            print word
    
        2
  •  8
  •   Tim Pietzcker    14 years ago

    import re
    for word in re.findall(r"\w+", mystring):
        if 'x' in word and 'z' in word:
            print word
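
A side note on how this differs from the plain `split()` approach: `split()` breaks only on whitespace, so punctuation stays attached to each word, while `\w+` drops it (and also splits hyphenated words apart). A minimal sketch in Python 3 syntax; the sample string and the helper names `words_by_split` and `words_by_regex` are made up for illustration:

```python
import re

def words_by_split(text):
    """Whitespace-delimited tokens containing both 'x' and 'z'."""
    return [w for w in text.split() if 'x' in w and 'z' in w]

def words_by_regex(text):
    """Runs of word characters containing both 'x' and 'z'."""
    return [w for w in re.findall(r"\w+", text) if 'x' in w and 'z' in w]

sample = "zx, x-z (xyz) foo"  # hypothetical sample text
print(words_by_split(sample))   # ['zx,', 'x-z', '(xyz)']
print(words_by_regex(sample))   # ['zx', 'xyz']
```

Which behaviour is wanted depends on whether "x-z" should count as one word or two, so neither tokenizer is simply the right one.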
    
        3
  •  3
  •   Steven Rumbalski    14 years ago
    >>> import re
    >>> pattern = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')
    >>> document = '''Here is some data that needs
    ... to be searched for words that contain both z
    ... and x.  Blah xz zx blah jal akle asdke asdxskz
    ... zlkxlk blah bleh foo bar'''
    >>> print pattern.findall(document)
    ['xz', 'zx', 'asdxskz', 'zlkxlk']
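
One thing to watch with patterns like this: in an ordinary (non-raw) Python string literal, `'\b'` is the backspace character, not the regex word boundary, so the pattern should be written as a raw string. A small sketch in Python 3 syntax, with a shortened pattern and sample text chosen purely for illustration:

```python
import re

text = "Blah xz zx blah asdxskz"  # shortened sample, for illustration only

# Non-raw literal: '\b' becomes a backspace character (a single char),
# so the pattern only matches text that literally contains backspaces.
broken = re.compile('\b(xz|zx)\b')
# Raw literal: backslash-b reaches the regex engine as a word boundary.
fixed = re.compile(r'\b(xz|zx)\b')

print(len('\b'), len(r'\b'))   # 1 2
print(broken.findall(text))    # []
print(fixed.findall(text))     # ['xz', 'zx']
```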
    
        4
  •  3
  •   Community CDub    8 years ago

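    I just want to point out how heavy-handed these regular expressions are compared with the simple string methods-based solution provided by Wooble.

    Let's do some timings, shall we?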

    #!/usr/bin/env python
    # -*- coding: UTF-8 -*-
    
    import timeit
    import re
    import sys
    
    WORD_RE_COMPILED = re.compile(r'\w+')
    Z_RE_COMPILED = re.compile(r'(\b\w*z\w*\b)')
    XZ_RE_COMPILED = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')
    
    ##########################
    # Tim Pietzcker's solution
    # https://stackoverflow.com/questions/3962846/how-to-display-all-words-that-contain-these-characters/3962876#3962876
    #
    def xz_re_word_find(text):
        for word in re.findall(r'\w+', text):
            if 'x' in word and 'z' in word:
                print word
    
    
    # Tim's solution, compiled
    def xz_re_word_compiled_find(text):
        pattern = re.compile(r'\w+')
        for word in pattern.findall(text):
            if 'x' in word and 'z' in word:
                print word
    
    
    # Tim's solution, with the RE pre-compiled so compilation doesn't get
    # included in the search time
    def xz_re_word_precompiled_find(text):
        for word in WORD_RE_COMPILED.findall(text):
            if 'x' in word and 'z' in word:
                print word
    
    
    ################################
    # Steven Rumbalski's solution #1
    # (provided in the comment)
    # https://stackoverflow.com/questions/3962846/how-to-display-all-words-that-contain-these-characters/3963285#3963285
    def xz_re_z_find(text):
        for word in re.findall(r'(\b\w*z\w*\b)', text):
            if 'x' in word:
                print word
    
    
    # Steven's solution #1 compiled
    def xz_re_z_compiled_find(text):
        pattern = re.compile(r'(\b\w*z\w*\b)')
        for word in pattern.findall(text):
            if 'x' in word:
                print word
    
    
    # Steven's solution #1 with the RE pre-compiled
    def xz_re_z_precompiled_find(text):
        for word in Z_RE_COMPILED.findall(text):
            if 'x' in word:
                print word
    
    
    ################################
    # Steven Rumbalski's solution #2
    # https://stackoverflow.com/questions/3962846/how-to-display-all-words-that-contain-these-characters/3962934#3962934
    def xz_re_xz_find(text):
        for word in re.findall(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b', text):
            print word
    
    
    # Steven's solution #2 compiled
    def xz_re_xz_compiled_find(text):
        pattern = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')
        for word in pattern.findall(text):
            print word
    
    
    # Steven's solution #2 pre-compiled
    def xz_re_xz_precompiled_find(text):
        for word in XZ_RE_COMPILED.findall(text):
            print word
    
    
    #################################
    # Wooble's simple string solution
    def xz_str_find(text):
        for word in text.split():
            if 'x' in word and 'z' in word:
                print word
    
    
    functions = [
            'xz_re_word_find',
            'xz_re_word_compiled_find',
            'xz_re_word_precompiled_find',
            'xz_re_z_find',
            'xz_re_z_compiled_find',
            'xz_re_z_precompiled_find',
            'xz_re_xz_find',
            'xz_re_xz_compiled_find',
            'xz_re_xz_precompiled_find',
            'xz_str_find'
    ]
    
    import_stuff = functions + [
            'text',
            'WORD_RE_COMPILED',
            'Z_RE_COMPILED',
            'XZ_RE_COMPILED'
    ]
    
    
    if __name__ == '__main__':
    
        text = open(sys.argv[1]).read()
        timings = {}
        setup = 'from __main__ import ' + ','.join(import_stuff)
        for func in functions:
            statement = func + '(text)'
            timer = timeit.Timer(statement, setup)
            min_time = min(timer.repeat(3, 10))
            timings[func] = min_time
    
    
        for func in functions:
            print func + ":", timings[func], "seconds"
    

    Running this script on the plaintext copy of Moby Dick from Project Gutenberg, on Python 2.6, I get the following timings:

    xz_re_word_find: 1.21829485893 seconds
    xz_re_word_compiled_find: 1.42398715019 seconds
    xz_re_word_precompiled_find: 1.40110301971 seconds
    xz_re_z_find: 0.680151939392 seconds
    xz_re_z_compiled_find: 0.673038005829 seconds
    xz_re_z_precompiled_find: 0.673489093781 seconds
    xz_re_xz_find: 1.11700701714 seconds
    xz_re_xz_compiled_find: 1.12773990631 seconds
    xz_re_xz_precompiled_find: 1.13285303116 seconds
    xz_str_find: 0.590088844299 seconds
    

    On Python 3.1 (after using 2to3 to fix the print statements), I get the following timings:

    xz_re_word_find: 2.36110496521 seconds
    xz_re_word_compiled_find: 2.34727501869 seconds
    xz_re_word_precompiled_find: 2.32607793808 seconds
    xz_re_z_find: 1.32204890251 seconds
    xz_re_z_compiled_find: 1.34104800224 seconds
    xz_re_z_precompiled_find: 1.34424304962 seconds
    xz_re_xz_find: 2.33851099014 seconds
    xz_re_xz_compiled_find: 2.29653286934 seconds
    xz_re_xz_precompiled_find: 2.32416701317 seconds
    xz_str_find: 0.656699895859 seconds
    

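    We can see that the regex-based functions tend to take about twice as long to run as the string methods-based function on Python 2.6, and over three times as long on Python 3 (the xz_re_z_* variants come closer in both cases). The time difference is trivial for one-off parsing (nobody will miss those milliseconds), but for cases where the function must be called many times, the string methods-based approach is both simpler and faster.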
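
Note that every function in the script above prints its matches, so the measurements include output time as well as search time. A rough sketch of a variant that returns lists instead, written for Python 3; the function names, the synthetic sample text, and the repeat counts are assumptions made for illustration and are not part of the original benchmark, and the absolute numbers will vary by machine and Python version:

```python
import re
import timeit

WORD_RE = re.compile(r'\w+')
XZ_RE = re.compile(r'\b(\w*z\w*x\w*|\w*x\w*z\w*)\b')

def find_str(text):
    # String methods only: split on whitespace, then test membership.
    return [w for w in text.split() if 'x' in w and 'z' in w]

def find_re_word(text):
    # Regex tokenization, followed by string membership tests.
    return [w for w in WORD_RE.findall(text) if 'x' in w and 'z' in w]

def find_re_xz(text):
    # Everything done inside a single regex.
    return XZ_RE.findall(text)

# Synthetic, punctuation-free stand-in for a real book file.
text = "call me ishmael some years ago xz zx asdxskz zlkxlk " * 2000

if __name__ == '__main__':
    for func in (find_str, find_re_word, find_re_xz):
        best = min(timeit.repeat(lambda: func(text), repeat=3, number=10))
        print(func.__name__ + ":", best, "seconds")
```

On punctuation-free input like this all three functions return the same list, which makes the timings comparable; on real prose the `split()` version keeps attached punctuation, so the results differ slightly.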

        5
  •  2
  •   Tony Veijalainen    14 years ago

    I don't know how this generator performs, but for me this is the way to do it:

    from __future__ import print_function
    import string
    
    bookfile = '11.txt' # Alice in Wonderland
    hunted = 'az' # in your case xz but there is none of those in this book
    
    with open(bookfile) as thebook:
        # read text of book and split from white space
        print('\n'.join(set(word.lower().strip(string.punctuation)
                        for word in thebook.read().split()
                        if all(c in word.lower() for c in hunted))))
    """ Output:
    zealand
    crazy
    grazed
    lizard's
    organized
    lazy
    zigzag
    lizard
    lazily
    gazing
    """
    

        6
  •  0
  •   Brad Mace Mike King    14 years ago

    Sounds like a job for Regular Expressions. Read up on them and try it out. If you run into problems, update your question and we can help you with the specifics.

        7
  •  0
  •   Paweł Nadolski    14 years ago
    >>> import re
    >>> print re.findall(r'(\w*x\w*z\w*|\w*z\w*x\w*)', 'axbzc azb axb abc axzb')
    ['axbzc', 'axzb']