
How can I process this text file and parse out what I need?

  •  2
  • foobarfuzzbizz  · 15 years ago

    I'm trying to parse the output of Python's doctest module and store it in an HTML file.

    I have output similar to this:

    **********************************************************************
    File "example.py", line 16, in __main__.factorial
    Failed example:
        [factorial(n) for n in range(6)]
    Expected:
        [0, 1, 2, 6, 24, 120]
    Got:
        [1, 1, 2, 6, 24, 120]
    **********************************************************************
    File "example.py", line 20, in __main__.factorial
    Failed example:
        factorial(30)
    Expected:
        25252859812191058636308480000000L
    Got:
        265252859812191058636308480000000L
    **********************************************************************
    1 items had failures:
       2 of   8 in __main__.factorial
    ***Test Failed*** 2 failures.
    

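    Each failure is preceded by a line of asterisks, which separates the test failures from one another.

    What I'd like to do is extract the filename and method that failed, along with the expected and actual results. Then I'd like to create an HTML document from this (or store it in a text file and then do a second round of parsing).

    How can I do this using just Python, or some combination of UNIX shell utilities?

    EDIT: I wrote the following shell script, which matches each block the way I want, but I'm not sure how to redirect each sed match to its own file.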

    python example.py | sed -n '/.*/,/^\**$/p' > `mktemp error.XXX`
    
    4 answers  |  latest 15 years ago
        1
  •  1
  •   Roberto Bonvallet    15 years ago

    Here is a quick and dirty script that parses the output into tuples containing the relevant information:

    import sys
    import re
    
    stars_re = re.compile('^[*]+$', re.MULTILINE)
    file_line_re = re.compile(r'^File "(.*?)", line (\d*), in (.*)$')
    
    doctest_output = sys.stdin.read()
    chunks = stars_re.split(doctest_output)[1:-1]
    
    for chunk in chunks:
        chunk_lines = chunk.strip().splitlines()
        m = file_line_re.match(chunk_lines[0])
    
        file, line, module = m.groups()
        failed_example = chunk_lines[2].strip()
        expected = chunk_lines[4].strip()
        got = chunk_lines[6].strip()
    
        print (file, line, module, failed_example, expected, got)
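
    To get from these tuples to the HTML document the question asks for, one option is to escape each field and emit a table. This is only a minimal sketch and not part of the script above: `failures_to_html` and the column headings are made-up names, and the sample row is copied from the doctest output in the question.

```python
import html


def failures_to_html(failures):
    """Render (file, line, module, example, expected, got) tuples as an HTML table."""
    headings = ("File", "Line", "Test", "Failed example", "Expected", "Got")
    rows = ["<tr>" + "".join("<th>%s</th>" % h for h in headings) + "</tr>"]
    for failure in failures:
        # Escape every field so that characters like < and & cannot break the markup.
        cells = "".join(
            "<td><code>%s</code></td>" % html.escape(str(value)) for value in failure
        )
        rows.append("<tr>" + cells + "</tr>")
    return "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>"


# One row in the shape produced by the parsing loop above.
sample = [
    ("example.py", "16", "__main__.factorial",
     "[factorial(n) for n in range(6)]",
     "[0, 1, 2, 6, 24, 120]", "[1, 1, 2, 6, 24, 120]"),
]
page = failures_to_html(sample)
```

    In the script above, the tuples would be appended to a list inside the loop instead of being printed, and the returned string written to a file.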
    
        2
  •  4
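  •   Ned Batchelder    15 years ago

    You could write a Python program to pick this output apart, but a better approach may be to modify doctest so that it produces the report you want in the first place. From the documentation for doctest.DocTestRunner: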

                                      ... the display output
    can be also customized by subclassing DocTestRunner, and
    overriding the methods `report_start`, `report_success`,
    `report_unexpected_exception`, and `report_failure`.
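
    As a rough illustration of that approach, the sketch below subclasses `DocTestRunner` and overrides `report_failure` so that failures are collected as dictionaries instead of being printed. `CollectingRunner`, `failures_seen`, and the `SAMPLE` docstring are hypothetical names used only for this example, and the line-number arithmetic is an assumption about how doctest computes the number it shows in its own failure header.

```python
import doctest


class CollectingRunner(doctest.DocTestRunner):
    """Hypothetical runner that records failures instead of printing them."""

    def __init__(self, *args, **kwargs):
        doctest.DocTestRunner.__init__(self, *args, **kwargs)
        # Avoid the name `failures`, which DocTestRunner uses for its own counter.
        self.failures_seen = []

    def report_failure(self, out, test, example, got):
        # Assumption: doctest reports test.lineno + example.lineno + 1.
        lineno = None
        if test.lineno is not None:
            lineno = test.lineno + example.lineno + 1
        self.failures_seen.append({
            "file": test.filename,
            "line": lineno,
            "name": test.name,
            "example": example.source.strip(),
            "expected": example.want.strip(),
            "got": got.strip(),
        })


# A deliberately failing example, parsed directly from a string.
SAMPLE = """
>>> 1 + 1
3
"""
test = doctest.DocTestParser().get_doctest(SAMPLE, {}, "sample", "sample.txt", 0)
runner = CollectingRunner(verbose=False)
runner.run(test)
```

    After the run, `runner.failures_seen` holds one dictionary per failed example, which can then be fed to whatever HTML template is wanted. For a real module, `doctest.DocTestFinder().find(module)` would supply the tests instead of the hand-built one here.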
    
        3
  •  1
  •   David Raznick    15 years ago

    I wrote a quick parser in pyparsing to do this.

    from pyparsing import *
    
    str = """
    **********************************************************************
    File "example.py", line 16, in __main__.factorial
    Failed example:
        [factorial(n) for n in range(6)]
    Expected:
        [0, 1, 2, 6, 24, 120]
    Got:
        [1, 1, 2, 6, 24, 120]
    **********************************************************************
    File "example.py", line 20, in __main__.factorial
    Failed example:
        factorial(30)
    Expected:
        25252859812191058636308480000000L
    Got:
        265252859812191058636308480000000L
    **********************************************************************
    """
    
    quote = Literal('"').suppress()
    comma = Literal(',').suppress()
    in_ = Keyword('in').suppress()
    block = OneOrMore("**").suppress() + \
            Keyword("File").suppress() + \
            quote + Word(alphanums + ".") + quote + \
            comma + Keyword("line").suppress() + Word(nums) + comma + \
            in_ + Word(alphanums + "._") + \
            LineStart() + restOfLine.suppress() + \
            LineStart() + restOfLine + \
            LineStart() + restOfLine.suppress() + \
            LineStart() + restOfLine + \
            LineStart() + restOfLine.suppress() + \
            LineStart() + restOfLine  
    
    all = OneOrMore(Group(block))
    
    result = all.parseString(str)
    
    for section in result:
        print(section)
    

    which gives:

    ['example.py', '16', '__main__.factorial', '    [factorial(n) for n in range(6)]', '    [0, 1, 2, 6, 24, 120]', '    [1, 1, 2, 6, 24, 120]']
    ['example.py', '20', '__main__.factorial', '    factorial(30)', '    25252859812191058636308480000000L', '    265252859812191058636308480000000L']
    
        4
  •  0
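  •   Sean    15 years ago

    This is probably one of the least elegant Python scripts I've ever written, but it should give you a framework for doing what you want without resorting to UNIX utilities and separate scripts to create the HTML. It's untested, but it should only need minor tweaking to work.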

    import os

    # Create a list of all files in the current directory.
    dirList = os.listdir('.')

    # Make sure the output directory exists before writing to it.
    if not os.path.isdir('processed'):
        os.makedirs('processed')

    for thisFile in dirList:
        # Ignore anything that isn't a .txt file.
        if not thisFile.endswith(".txt"):
            continue

        # Read in the text, then split it into a list of lines.
        with open(thisFile, 'r') as infile:
            yourList = infile.read().split('\n')

        compiledText = ''

        for i in yourList:
            # Clunky way of deciding whether or not the current line
            # should be included in compiledText.
            if i.startswith("*****"):
                compiledText += "\n\n--- New Report ---\n"
            elif i.startswith(("File", "Fail", "Expe", "Got", " ")):
                compiledText += i + '\n'

        # Insert your HTML template below.
        htmlText = '<html>...\n <body> \n ' + compiledText + '</body>... </html>'

        # Write out to file.
        with open('processed/' + thisFile + '.html', 'w') as outfile:
            outfile.write(htmlText)