The fasta.py Python module contains code for reading FASTA records from a file.
I know that there is also a FASTA parser in BioPython, but I find my own implementation cleaner and easier to use, especially since it uses Python's iteration mechanism rather than a homegrown iterator, as BioPython does. This means that I can scan through a FASTA file with a simple for loop, and do not have to use the silly "while 1: ..." loop that tests each result for None.
To iterate through all records in a file, use:
    from fasta import fasta_itr

    for rec in fasta_itr(file):
        # do stuff with record
        h = rec.header
        s = rec.sequence
        ...
In this example, file can be either a file object or the name of a file.
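To illustrate why Python's iteration mechanism makes this convenient, here is a minimal sketch of a generator-based reader. It is not the code in fasta.py: the names iter_fasta_sketch and Record are hypothetical, and treating any str argument as a file name is an assumption about how "file object or name" might be handled.

```python
from collections import namedtuple
import io

# Hypothetical record type; the real fasta.py may define its own class.
Record = namedtuple("Record", ["header", "sequence"])


def iter_fasta_sketch(source):
    """Yield Record(header, sequence) for each entry in a FASTA source.

    `source` may be an open file-like object or a file name (assumption:
    a str argument is always interpreted as a file name).
    """
    opened_here = isinstance(source, str)
    handle = open(source) if opened_here else source
    try:
        header, chunks = None, []
        for line in handle:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield Record(header, "".join(chunks))
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence data may span several lines; collect the pieces.
                chunks.append(line.strip())
        # Emit the last record once the input is exhausted.
        if header is not None:
            yield Record(header, "".join(chunks))
    finally:
        # Only close handles that this function opened itself.
        if opened_here:
            handle.close()


# Usage with an in-memory file object instead of a file on disk.
demo = io.StringIO(">foobar first\nACGT\nTTGA\n>baz\nGGCC\n")
for rec in iter_fasta_sketch(demo):
    print(rec.header, rec.sequence)
```

Because the function is a generator, records are produced one at a time, so a plain for loop works without loading the whole file into memory.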
To iterate through a slice of the records in a file, use:
    from fasta import fasta_slice

    for rec in fasta_slice(file, start, stop):
        # do stuff with record
        h = rec.header
        s = rec.sequence
        ...
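One way such a slice could be built on top of any record iterator is with itertools.islice, sketched below. This is an illustration only: slice_records_sketch and Record are hypothetical names, and the zero-based, half-open meaning of start and stop is an assumption, since the text above does not say which convention fasta_slice follows.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical record type used only for this illustration.
Record = namedtuple("Record", ["header", "sequence"])


def slice_records_sketch(records, start, stop):
    """Lazily yield the records at positions start..stop-1 of an iterable.

    Assumption: positions are zero-based and `stop` is exclusive, as in
    ordinary Python slicing.
    """
    return islice(records, start, stop)


# Any iterator of records will do; a real reader would supply them lazily.
all_records = iter([
    Record("r0", "AAAA"),
    Record("r1", "CCCC"),
    Record("r2", "GGGG"),
    Record("r3", "TTTT"),
])

for rec in slice_records_sketch(all_records, 1, 3):
    print(rec.header, rec.sequence)
```

Records before start still have to be read and skipped, so the cost grows with stop rather than with the length of the slice.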
As a final feature, it is possible to index the sequences in an iterator through names:
    import fasta

    itr = fasta.fasta_itr(file)
    rec1 = itr["foobar"]

    records = fasta.fasta_slice(file, 2, 4)   # renamed to avoid shadowing the built-in slice
    rec2 = records["baz"]

    rec3 = fasta.get_sequence(file, "qux")
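A lookup by name can be thought of as a linear scan over the records, as in the sketch below. This is not necessarily how fasta.py implements indexing: find_by_name_sketch and Record are hypothetical, and matching the name against the first whitespace-delimited word of the header is an assumption about what counts as a sequence's name.

```python
from collections import namedtuple

# Hypothetical record type used only for this illustration.
Record = namedtuple("Record", ["header", "sequence"])


def find_by_name_sketch(records, name):
    """Return the first record whose header starts with the word `name`.

    Assumption: the name of a sequence is the first whitespace-delimited
    token of its header line. Raises KeyError if no record matches.
    """
    for rec in records:
        fields = rec.header.split()
        if fields and fields[0] == name:
            return rec
    raise KeyError(name)


records = [
    Record("foobar some description", "ACGT"),
    Record("baz", "GGCC"),
    Record("qux another description", "TTAA"),
]

rec = find_by_name_sketch(records, "baz")
print(rec.header, rec.sequence)
```

A scan like this reads records until it finds a match, so looking up many names in a large file would be better served by building a dictionary once.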
Download the file and put it somewhere on your PYTHONPATH.
Thomas Mailund, <mailund@birc.au.dk>, Bioinformatics Research Center, University of Aarhus.
Time-stamp: "2006-01-26 21:57:53 mailund"