Here is your code, modified. It now produces the desired output.

    with open("sample.txt") as infile:
        # each non-blank line holds three whitespace-separated columns:
        # header, sequence, disorder string
        matrix = [line.split() for line in infile if line.strip()]

    header_list = [row[0] for row in matrix]
    seq_list = [row[1] for row in matrix]
    disorder_list = [row[2] for row in matrix]

    # 'with' closes the output file automatically, even if an error occurs
    with open('new_sample.txt', 'a') as f:
        for header, seq, disorder in zip(header_list, seq_list, disorder_list):
            # sequence length
            sequence_length = len(seq)
            # total number of missing coordinates (residues marked 'X')
            num_missing = disorder.count('X')
            # range of the missing coordinates (0-based; find/rfind return -1 if there is no 'X')
            first_X_pos = disorder.find('X')
            last_X_pos = disorder.rfind('X')
            range_missing = '-'.join([str(first_X_pos), str(last_X_pos)])
            reformat_seq = " ".join([header, str(num_missing), range_missing,
                                     str(sequence_length), seq])
            # add the newline separately so no trailing space is written
            f.write(reformat_seq + '\n')

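To check the per-line logic without any files, the loop body can be pulled into a small function. This is a minimal sketch; `reformat_record` is a hypothetical name, and the example record is made up, assuming the same three-column input (header, sequence, disorder string with `X` for missing residues).

```python
def reformat_record(header: str, seq: str, disorder: str) -> str:
    """Build one output line from the three input columns (hypothetical helper)."""
    num_missing = disorder.count('X')   # residues with missing coordinates
    first_x = disorder.find('X')        # 0-based index of first 'X', or -1 if none
    last_x = disorder.rfind('X')        # 0-based index of last 'X', or -1 if none
    range_missing = f"{first_x}-{last_x}"
    return " ".join([header, str(num_missing), range_missing, str(len(seq)), seq])


# Example record with made-up values
print(reformat_record(">p1", "MKVLA", "--XX-"))  # >p1 2 2-3 5 MKVLA
```

Note that a record with no `X` at all yields the range `-1--1`; if your desired output needs something else in that case, check `num_missing == 0` first.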
More tips:
Don't forget Python's string methods; they solve many of these problems for you. This
documentation
is very good.
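As a quick illustration, here are the string methods the code above relies on, applied to one made-up input line (the values are assumptions, not taken from your file):

```python
line = "seq1 MKVLAAGIV ---XX-X--"
header, seq, disorder = line.split()  # split on any run of whitespace

print(header)                 # seq1
print(disorder.count('X'))    # 3  (number of 'X' characters)
print(disorder.find('X'))     # 3  (index of the first 'X')
print(disorder.rfind('X'))    # 6  (index of the last 'X')
print("MKV".find('X'))        # -1 (substring not found)
print("-".join(["3", "6"]))   # 3-6
```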
If you search for how to do only the second or third part of your question on its own, you will find answers elsewhere.