2014-12-28

FASTA header parsing problem from FASTA36.3.5d to FASTA36.3.7

I just upgraded the FASTA program on my computer from FASTA36.3.5d to FASTA36.3.7 via http://faculty.virginia.edu/wrpearson/fasta/fasta36/

After the upgrade, I noticed a strange problem with how the FASTA headers from my library file are reported.

This is one entry in my library file:

>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA sequence
GCTTAGCGTGGTCGCGGCCGAGGTACTTTTTTTTTTTTTTTTTTTTTTTGGGAAACTTTCACAGTCTTGC
CATTTCCATAGTATTTAAATGATGACAAATTGGAGCAGGAATAACATTACAGTGCATGATACAAACAATT
AAGCTATAGGACTCTATTAAGTTATTCATTCTATGAAGATGATGCTAGTTTCCAATAGCAAATAAAGGCT

I simply use the Smith-Waterman algorithm.

If I use FASTA36.3.7, the header of this entry looks like this in the results:

The best scores are:                                                                              s-w bits E(1)
gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA sequen ( 569) [r]  315 30.8 6.4e-06

>>gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se              (569 nt)
rev-comp s-w opt: 315  Z-score: 138.7  bits: 30.8 E(1): 6.4e-06
Smith-Waterman score: 315; 100.0% identity (100.0% similar) in 21 nt overlap (21-1:534-554)

Note that the "gi|54118649|" part of the header is missing.

But if I revert to FASTA36.3.5d, the header looks fine:

The best scores are:                                                                              s-w bits E(1)
gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, ( 569) [r]  315 32.8 1.6e-06

>>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se  (569 nt)
rev-comp s-w opt: 315  Z-score: 149.3  bits: 32.8 E(1): 1.6e-06
Smith-Waterman score: 315; 100.0% identity (100.0% similar) in 21 nt overlap (21-1:534-554)

This causes my downstream parsing program to misbehave, because it expects the header to start with the "gi" field. I am not sure whether anyone has already reported this issue.
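
Until the cause is clear, one possible workaround is to make the downstream parser accept both header forms. The following is only a minimal sketch in Python, assuming the hit lines always look like the two examples above (an optional "gi|number|" prefix followed by "db|accession|locus"). The names HEADER_RE and parse_hit_id are my own, hypothetical and not part of FASTA, and other header layouts would need a different pattern.

```python
import re

# Assumption: hit IDs look like "[gi|<digits>|]<db>|<accession>|<locus>",
# optionally preceded by the ">>" marker used in the alignment section.
HEADER_RE = re.compile(
    r"^(?:>>)?"                  # optional ">>" marker
    r"(?:gi\|(?P<gi>\d+)\|)?"    # optional "gi|<number>|" prefix
    r"(?P<db>[A-Za-z]+)\|"       # database tag, e.g. "gb"
    r"(?P<acc>[^|\s]+)\|"        # accession with version, e.g. "CO494158.1"
    r"(?P<locus>\S*)"            # locus name, may be empty
)


def parse_hit_id(line):
    """Return a dict with gi, db, acc and locus, or None if the line does not match.

    The "gi" value is None when the prefix is absent, as in the 36.3.7 output.
    """
    match = HEADER_RE.match(line)
    if match is None:
        return None
    return match.groupdict()
```

With this, the parser can key on the accession ("acc"), which is present in the output of both versions, instead of relying on the "gi" number.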
