FASTA header extractor and splitter - help page
 

The FASTA header extractor and splitter performs two simple tasks:

  1. Extract all the headers from a FASTA file and output them in table format. The table can be copied into Excel for further editing. (The equivalent Linux one-liner is: grep '^>' sequencefile.fasta > outputfile.tsv)
  2. Split each header at a specified character. This is typically useful for sequences downloaded from GenBank, which have headers like this:
    >gi|93102408|ref|NM_007988.3| Mus musculus fatty acid synthase (Fasn), mRNA
    >gi|75677473|ref|NM_001033326.1| Mus musculus dehydrogenase/reductase (SDR family) X chromosome (Dhrsx), mRNA
    >gi|41872630|ref|NM_004104.4| Homo sapiens fatty acid synthase (FASN), mRNA
    >gi|32455238|ref|NM_181755.1| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 2, mRNA
    >gi|32455237|ref|NM_005525.2| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 1, mRNA
    
    You may then use '|' as the splitting character, which will give you the following output:

    Original header
        Split header fields (one column per field)

    gi|93102408|ref|NM_007988.3| Mus musculus fatty acid synthase (Fasn), mRNA
        gi    93102408    ref    NM_007988.3    Mus musculus fatty acid synthase (Fasn), mRNA

    gi|75677473|ref|NM_001033326.1| Mus musculus dehydrogenase/reductase (SDR family) X chromosome (Dhrsx), mRNA
        gi    75677473    ref    NM_001033326.1    Mus musculus dehydrogenase/reductase (SDR family) X chromosome (Dhrsx), mRNA

    gi|41872630|ref|NM_004104.4| Homo sapiens fatty acid synthase (FASN), mRNA
        gi    41872630    ref    NM_004104.4    Homo sapiens fatty acid synthase (FASN), mRNA

    gi|32455238|ref|NM_181755.1| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 2, mRNA
        gi    32455238    ref    NM_181755.1    Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 2, mRNA

    gi|32455237|ref|NM_005525.2| Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 1, mRNA
        gi    32455237    ref    NM_005525.2    Homo sapiens hydroxysteroid (11-beta) dehydrogenase 1 (HSD11B1), transcript variant 1, mRNA

    This output can be opened in Excel, and the edited headers can later be reinserted into your sequences using the header replacer tool.
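
For readers who prefer to script the two tasks, the following is a minimal Python sketch of the same idea. It is not the tool's actual implementation. The function names (extract_headers, split_header, headers_to_tsv) are hypothetical, and the sequence lines in the example are placeholders rather than real data. The sketch assumes that every header line starts with '>' and that the output should be tab-separated, with the original header in the first column and the split fields in the following columns.

```python
import csv
import io


def extract_headers(fasta_text):
    """Return every FASTA header line, without the leading '>'."""
    return [line[1:] for line in fasta_text.splitlines() if line.startswith(">")]


def split_header(header, sep="|"):
    """Split one header into fields at the chosen character."""
    return header.split(sep)


def headers_to_tsv(fasta_text, sep="|"):
    """Build a tab-separated table: original header, then one column per field."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for header in extract_headers(fasta_text):
        writer.writerow([header] + split_header(header, sep))
    return out.getvalue()


# Two headers from the examples above; the sequence lines are placeholders.
example = """>gi|93102408|ref|NM_007988.3| Mus musculus fatty acid synthase (Fasn), mRNA
ACGTACGTACGT
>gi|41872630|ref|NM_004104.4| Homo sapiens fatty acid synthase (FASN), mRNA
ACGTACGTACGT
"""

print(headers_to_tsv(example), end="")
```

In this sketch the description field keeps its leading space, because the header is split at '|' only and not trimmed. Call .strip() on each field if that space is unwanted.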