我有这样的文本文件:>ENST00000511961.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370661.3|RNF14-003|RNF14|278
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKEETLAYLNIVSPFELKIGSQKKVQRRTAQASPNTELDFGGAAGSDVDQEEIVDERAVQDVESLSNLIQEILDFDQAQQIKCFNSKLFLCSICFCEKLGSECMYFLECRHVYCKACLKDYFEIQIRDGQVQCLNCPEPKCPSVATPGQ
>ENST00000506822.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370662.1|RNF14-004|GAPDH|132
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKE
>ENST00000513019.1|ENSG00000013561.13|OTTHUMG00000129660.5|OTTHUMT00000370663.1|RNF14-005|ACTB|99
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLS
>ENST00000356143.1|ENSG00000013561.13|OTTHUMG00000129660.5|-|RNF14-202|HELLE|474
MSSEDREAQEDELLALASIYDGDEFRKAESVQGGETRIYLDLPQNFKIFVSGNSNECLQNSGFEYTICFLPPLVLNFELPPDYPSSSPPSFTLSGKWLSPTQLSALCKHLDNLWEEHRGSVVLFAWMQFLKEETLAYLNIVSPFELKIGSQKKVQRRTAQASPNTELDFGGAAGSDVDQEEIVDERAVQDVESLSNLIQEILDFDQAQQIKCFNSKLFLCSICFCEKLGSECMYFLECRHVYCKACLKDYFEIQIRDGQVQCLNCPEPKCPSVATPGQVKELVEAELFARYDRLLLQSSLDLMADVVYCPRPCCQLPVMQEPGCTMGICSSCNFAFCTLCRLTYHGVSPCKVTAEKLMDLRNEYLQADEANKRLLDQRYGKRVIQKAL
像这个:from itertools import groupby
with open('infile.txt') as f:
groups = groupby(f, key=lambda x: not x.startswith(">"))
d = {}
for k,v in groups:
if not k:
key, val = list(v)[0].rstrip(), "".join(map(str.rstrip,next(groups)[1],""))
d[key] = val
k = d.keys()
res = [el[5:] for s in k for el in s.split("|")]
但它返回行中的所有元素开头 ">".
你知道如何解决它吗?
这里预计输出:["RNF14", "GAPDH", "ACTB", "HELLE"]