The FASTA header extractor and splitter performs two simple tasks:
- Extract all the headers from a FASTA file and output them in table format. The table can be copied into Excel for further editing. (A rough Linux equivalent is: grep '^>' sequencefile.fasta > outputfile.tsv, although grep keeps the leading '>' on each header.)
- Split each header on a specified character. This is typically useful for sequences downloaded from GenBank, whose headers look like this:
>gi|93102408|ref|NM_007988.3| Mus musculus fatty acid synthase (Fasn), mRNA
>gi|75677473|ref|NM_001033326.1| Mus musculus dehydrogenase/reductase (SDR family) X chromosome (Dhrsx), mRNA
>gi|41872630|ref|NM_004104.4| Homo sapiens fatty acid synthase (FASN), mRNA
>gi|32455238|ref|NM_181755.1| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 2, mRNA
>gi|32455237|ref|NM_005525.2| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 1, mRNA
You may then use '|' as the splitting character, which will give you the following output:
Original header | Split header fields |
gi|93102408|ref|NM_007988.3| Mus musculus fatty acid synthase (Fasn), mRNA | gi | 93102408 | ref | NM_007988.3 | Mus musculus fatty acid synthase (Fasn), mRNA |
gi|75677473|ref|NM_001033326.1| Mus musculus dehydrogenase/reductase (SDR family) X chromosome (Dhrsx), mRNA | gi | 75677473 | ref | NM_001033326.1 | Mus musculus dehydrogenase/reductase (SDR family) X chromosome (Dhrsx), mRNA |
gi|41872630|ref|NM_004104.4| Homo sapiens fatty acid synthase (FASN), mRNA | gi | 41872630 | ref | NM_004104.4 | Homo sapiens fatty acid synthase (FASN), mRNA |
gi|32455238|ref|NM_181755.1| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 2, mRNA | gi | 32455238 | ref | NM_181755.1 | Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 2, mRNA |
gi|32455237|ref|NM_005525.2| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 1, mRNA | gi | 32455237 | ref | NM_005525.2 | Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 1, mRNA |
This output can be opened in Excel and later reinserted into your sequences using the header replacer tool.
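The two tasks above can be sketched in Python as follows. This is a minimal sketch, not the tool's own implementation: the function names (`extract_headers`, `split_header`, `write_header_table`) are hypothetical, and it assumes the table is written as tab-separated text (which Excel opens directly) and that whitespace around each split field is trimmed, as in the example output.

```python
import csv
import io
from typing import List, TextIO


def extract_headers(fasta_text: str) -> List[str]:
    """Return every header line without its leading '>' (roughly grep '^>')."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


def split_header(header: str, sep: str = "|") -> List[str]:
    """Split one header on sep, trimming whitespace around each field (assumed behaviour)."""
    return [field.strip() for field in header.split(sep)]


def write_header_table(headers: List[str], out: TextIO, sep: str = "|") -> None:
    """Write a tab-separated table (assumed format): the original header, then its split fields."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Original header", "Split header fields"])
    for header in headers:
        writer.writerow([header] + split_header(header, sep))


# Two of the GenBank-style headers from above; the sequence lines are placeholders.
sample = """>gi|93102408|ref|NM_007988.3| Mus musculus fatty acid synthase (Fasn), mRNA
ACGTACGTACGT
>gi|41872630|ref|NM_004104.4| Homo sapiens fatty acid synthase (FASN), mRNA
ACGTACGTACGT
"""

buffer = io.StringIO()
write_header_table(extract_headers(sample), buffer, sep="|")
print(buffer.getvalue())
```

To write a file instead of printing, pass a handle opened with `open("headers.tsv", "w", newline="")` as `out`; the file name is only an example.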