
How to print the X lines before a line that satisfies an if statement

  •  1
  • Symbal  · 7 years ago

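    I'm very new to Python and have only bits and pieces of knowledge picked up from numerous web pages.

    That said, I'm trying to search a file (about 10k lines) for a set of "filter"-like conditions I've written, and I'd like it to print each line that meets the conditions, along with the line X lines before it.

    I've created the following script to open said file, iterate over it line by line, and print the lines that meet the filter conditions to an output file, but I'm stumped on how to incorporate the "X lines before" part into my current script.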

    import os
    
    output_file = 'Output.txt'
    filename = 'BigFile.txt'                 
    
    numLines = 0
    numWords = 0
    numChrs = 0
    numMes = 0
    
    f1 = open(output_file, 'w')
    print 'Output File has been Opened'
    
    with open(filename, 'r') as file:
       for line in file:
          wordsList = line.split()
          numLines += 1
          numWords += len(wordsList)
          numChrs += len(line)
    
          if "X" in line and "Y" not in line and "Z" in line:
              numMes += 1
              print >>f1, line
              print 'Object found and Catalogued in Output.txt'                          
    
    print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
    print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
    
    print "There are a total of %i things in this file" % (numMes)
    print >>f1, "There are a total of %i things in this file" % (numMes)
    
    f1.close()
    
    print 'Output Files have been Closed'
    

    My first guess was line.enumeration, but I don't think I can just write something like lines - 5 to print the line that came 5 before lines:

    lines = f1.enumeration()
    if "blah blah" in line and "so so" not in line:
        print >>f1, lines
        print >>f1, [lines - 5]
    

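    The best part is yet to come, though, because I then have to take the Output.txt file and compare it against another file to output the criteria that match in both files... but one step at a time, right?

    Also, feel free to add a note on "proper" technique... I'm sure this script could be written better, so by all means tell me what I'm doing wrong.

    Thanks in advance for your help!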


    Update: Thanks to the help below, the fix has been implemented successfully:

    import os
    
    output_file = 'Output.txt'
    filename = 'BigFile.txt'                 
    
    numLines = 0
    numWords = 0
    numChrs = 0
    
    numMulMes = 0
    
    last5 = []
    
    f1 = open(output_file, 'w')
    print 'Output Files have been Opened'
    
    with open(filename, 'r') as file:
        for line in file:
            wordsList = line.split()
            numLines += 1
            numWords += len(wordsList)
            numChrs += len(line)
            last5[:] = last5[-5:]+[line] 
            if "X" in line and "Y" not in line and "Z" not in line:
                del last5[1:5]           ###the missing piece of the puzzle!
                numMulMes += 1
                print >>f1, last5
                print 'Object found and Catalogued in Output.txt'
    
    print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
    print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
    
    print "There are a total of %i messages in this file" % (numMulMes)
    print >>f1, "There are a total of %i messages in this file" % (numMulMes)
    
    f1.close()
    
    print 'Output Files have been Closed'
    

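    I had been trying to modify the output file through a separate script, and for the longest time I was fighting with str vs. list operations and errors. On a whim I decided to go back to the original script and throw it in there, and voilà.

    Thank you for pushing me in the right direction; from there the answer was easy to find!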

    4 answers  |  as of 7 years ago
        1
  •  4
  •   Patrick Artner    7 years ago

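    You have solved most of the problem yourself (counting words, lines, line numbers, etc.). All that is left is to remember the last n lines as you go through the file.

    Example: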

    t = """zero line
    one line
    two line
    three line
    four line 
    five line 
    six line
    seven line 
    eight line
    """ 
    
    last5 = [] # memory cell
    for l in t.split("\n"):  # similar to your for line in file: 
        last5[:] = last5[-4:]+[l] # keep last 4 and add current line, inplace list mod 
    
        if "six" in l:
            print last5
    

    You can also look at deque and give it a maximum length (it needs an import):

    from collections import deque
    
    last5 = deque(maxlen=5)
    for l in t.split("\n"): 
        last5.append(l) # will automatically only keep 5 (maxlen)
    
        if "six" in l:
            print last5
    

    Output:

     # list version
     ['two line', 'three line', 'four line ', 'five line ', 'six line'] 
    
     # deque version
     deque(['two line', 'three line', 'four line ', 'five line ', 'six line'], maxlen=5) 
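
    To tie this back to the script in the question, here is a minimal sketch of the same deque idea wrapped in a generator. It is written for Python 3 (the thread itself uses Python 2 print statements), and the names lines_with_context and is_hit, as well as the sample text, are made up for illustration. The filter keeps the placeholder "X"/"Y"/"Z" condition from the question.

```python
from collections import deque
import io


def lines_with_context(lines, matches, before=5):
    """Yield (context, line) for every line where matches(line) is true.

    context is a list of up to `before` lines that preceded the match.
    """
    recent = deque(maxlen=before)  # holds only the previous `before` lines
    for line in lines:
        if matches(line):
            yield list(recent), line
        recent.append(line)  # appended after the check, so context excludes the match


def is_hit(line):
    # Same shape as the filter in the question; "X", "Y", "Z" are placeholders.
    return "X" in line and "Y" not in line and "Z" in line


# io.StringIO stands in for open('BigFile.txt'); any iterable of lines works.
sample = io.StringIO("a\nb\nc\nX and Z here\nd\nX Y Z\n")
results = list(lines_with_context(sample, is_hit, before=2))
for context, line in results:
    print("".join(context) + line, end="")
```

    Because the buffer is only read, never modified, when a match is found, two matches that sit close together still each get their full context.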
    
        2
  •  2
  •   rth    7 years ago

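    This is the same solution that @Patrick Artner suggested, but it uses a ring buffer. It may (or may not, I have not checked) run faster on big files. The idea is simple: create a list of the required size (the number of lines you need to keep) and a counter cnt for the current write position. For each line, increase cnt by 1 and take it modulo the buffer size, so cnt cycles through the list. For example, with a list of size 5, cnt = (cnt+1)%5 gives 0 1 2 3 4 0 1 2 and so on. After each update, cnt points at the oldest entry in the list, which is the one the next line overwrites. Here is an example implementation.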

    t = """zero line
    six line - surprize 
    one line
    two line
    three line
    four line 
    five line 
    six line
    seven line 
    eight line
    """ 
    
    
    last5 = [None,None,None,None,None]  # ring buffer, None = slot not filled yet
    cnt = 0                             # slot the current line is written to
    for l in t.split("\n"):
      last5[cnt]=l                      # overwrite the oldest entry
      if 'six' in l:
        # walk the buffer from the oldest entry to the current line
        print last5[(cnt+1)%5]
        print last5[(cnt+2)%5]
        print last5[(cnt+3)%5]
        print last5[(cnt+4)%5]
        print last5[(cnt+0)%5]
        print
      cnt = (cnt+1)%5
    

    The output is straightforward:

    None
    None
    None
    zero line
    six line - surprize 
    
    two line
    three line
    four line 
    five line 
    six line
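
    The five hard-coded print statements can be folded into a loop so that the buffer size becomes a parameter. The following is only a sketch in Python 3 (the code above is Python 2), and the function name ring_buffer_context is invented for this example. It collects the results in a list instead of printing them.

```python
def ring_buffer_context(lines, needle, size=5):
    """Return, for each line containing needle, the last `size` lines seen
    (oldest first, current line last), using a fixed-size ring buffer."""
    buf = [None] * size   # None marks slots that have not been filled yet
    cnt = 0               # index of the slot the current line is written to
    found = []
    for line in lines:
        buf[cnt] = line
        if needle in line:
            # Slot cnt+1 holds the oldest entry; walk forward and wrap around.
            found.append([buf[(cnt + 1 + k) % size] for k in range(size)])
        cnt = (cnt + 1) % size
    return found


demo = ["zero", "one", "two", "three", "four", "five", "six", "seven"]
print(ring_buffer_context(demo, "six", size=3))
```

    As in the output above, a match that occurs before the buffer has filled up is padded with None.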
    

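    Note: if you are reading from a very big file, the strings you need to keep are very long (gene sequences, for example), and your condition is not triggered often, then be smart and do not keep the strings in memory. Keep a list of the file positions where the most recent strings begin, and re-read them from the file when needed. Here is an example of how to make this very fast...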

    from numpy import random as rnd
    
    print "Creating the file ...."
    DNA=["G","C","T","A"]
    with open("bigdatafile","w") as fd:
        for i in xrange(5000):
            fd.write("".join([ DNA[rnd.randint(4)] for x in xrange(2000)])+"\n")
    print "DONE"
    print
    print "SEARCHING GGGGGGGGGGG"
    last5, cnt = [0,0,0,0,0], 1
    with open("bigdatafile","r") as fd:
        for i,l in enumerate(fd.readlines()):
            last5[cnt] = last5[(cnt+4)%5]+len(l)
            if "GGGGGGGGGGG" in l:
                print "FIND!"
                fd.seek(last5[(cnt+1)%5])
                print fd.read(last5[cnt]-last5[(cnt+1)%5])
            cnt = (cnt+1)%5
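
    For comparison, here is a hedged Python 3 sketch of the same "remember offsets, not strings" idea. It is not a port of the code above: it opens the file in binary mode so that len(line) is an exact byte count, reads one line at a time with readline instead of readlines, and uses a deque for the offsets. The function name find_with_offsets and the small temporary file are assumptions made for this example.

```python
import os
import tempfile
from collections import deque


def find_with_offsets(path, needle, before=3):
    """Return the matching line plus up to `before` preceding lines for each
    hit, remembering only byte offsets while scanning."""
    hits = []
    starts = deque(maxlen=before + 1)  # start offsets of the most recent lines
    with open(path, "rb") as fd:       # binary mode: offsets are exact byte counts
        pos = 0
        # Reading with readline keeps the file position easy to reason about.
        for line in iter(fd.readline, b""):
            starts.append(pos)
            pos += len(line)
            if needle in line:
                fd.seek(starts[0])                 # jump back to the oldest kept line
                hits.append(fd.read(pos - starts[0]))
                fd.seek(pos)                       # resume scanning after the match
    return hits


# Tiny stand-in for the big data file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"l0\nl1\nl2\nl3\nHIT\nl5\n")
demo_hits = find_with_offsets(tmp.name, b"HIT", before=2)
os.remove(tmp.name)
print(demo_hits)
```

    Only one line and a handful of integers are held in memory at any time; the context is re-read from disk only when a match occurs.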
    
        3
  •  0
  •   Gene Burinsky    7 years ago

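    I collect the output in a dictionary instead of writing it to a file. Once the whole file has been processed, the summary dictionary is dumped as json. This uses Artner's test file.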

    import os
    import json
    
    output_file = 'Output.txt'
    filename = 'BigFile.txt'                 
    
    #initiate output container
    outDict = {}
    for fields in ['numLines', 'numWords', 'numChrs', 'numMes']:
        outDict[fields] = 0
    
    outDict['lineNum'] = []     # line numbers (1-based) of matching lines
    outDict['lineList'] = []    # content of matching lines
    
    with open(filename, 'r') as file:
        for line in file:
          wordsList = line.split()  # split on any run of whitespace
          outDict['numLines'] += 1
          outDict['numWords'] += len(wordsList)
          outDict['numChrs'] += len(line)
    
          #find items in the line
          if "t" in line:
              outDict['numMes'] += 1
              #save line number
              outDict['lineNum'].append(outDict['numLines']) 
              #save line content
              outDict['lineList'].append(line)
    
    #record output          
    with open(output_file, 'w') as f1:
        f1.write(json.dumps(outDict))    
    
    ##print lines of desire
    #x number of lines before
    x=5    
    with open(filename, 'r') as file:
        for i, line in enumerate(file, 1):  # count from 1 to match the stored line numbers
            #iterate over line numbers for which condition is met
            for j in range(0,len(outDict['lineNum'])):
                #if line number is between found line num and line num minus x, print
                if (outDict['lineNum'][j]-x) <= i <= outDict['lineNum'][j]:
                    print(line)
    
        4
  •  0
  •   pault Tanjin    7 years ago

    Since I mentioned it in the comments, here is how to do this on a *nix machine using grep's context line control feature.

    First, suppose you have the following text file test.txt:

    zero line
    one line
    two line
    three line
    four line 
    five line 
    six line
    seven line 
    eight line
    

    If you want N lines before each match, you can use the -B option. For example, for the 5 lines before "six":

    $ grep -B 5 six test.txt 
    one line
    two line
    three line
    four line 
    five line 
    six line
    

    There is also the -A option, which gives you N lines after the match, and -C, which gives you N lines both before and after it.
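
    If grep is not available, roughly the same before/after behaviour can be approximated in Python. This is only a sketch in Python 3: grep_context is a made-up helper name, it does a plain substring test rather than a regular expression match, and it does not print the "--" separators that grep puts between non-adjacent groups.

```python
from collections import deque


def grep_context(lines, needle, before=0, after=0):
    """Return lines containing needle plus `before` lines ahead of each match
    and `after` lines following it, without emitting any line twice."""
    out = []
    recent = deque(maxlen=before)  # candidate "before" lines not yet emitted
    remaining_after = 0            # how many trailing lines are still owed
    for line in lines:
        if needle in line:
            out.extend(recent)     # flush the pending context
            recent.clear()
            out.append(line)
            remaining_after = after
        elif remaining_after > 0:
            out.append(line)
            remaining_after -= 1
        else:
            recent.append(line)
    return out


text = ["zero line", "one line", "two line", "three line", "four line",
        "five line", "six line", "seven line", "eight line"]
for line in grep_context(text, "six", before=5):
    print(line)
```

    Passing the same number for before and after corresponds to -C.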