I just upgraded the FASTA program on my computer from FASTA36.3.5d to FASTA36.3.7, downloaded from
http://faculty.virginia.edu/wrpearson/fasta/fasta36/
Since then I see a very strange problem with the headers of my library sequences in the FASTA output.
This is one entry in my library file:
>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA sequence
GCTTAGCGTGGTCGCGGCCGAGGTACTTTTTTTTTTTTTTTTTTTTTTTGGGAAACTTTCACAGTCTTGC
CATTTCCATAGTATTTAAATGATGACAAATTGGAGCAGGAATAACATTACAGTGCATGATACAAACAATT
AAGCTATAGGACTCTATTAAGTTATTCATTCTATGAAGATGATGCTAGTTTCCAATAGCAAATAAAGGCT
I simply use the Smith-Waterman algorithm.
With FASTA36.3.7, the header of this entry in the results looks like this:
The best scores are: s-w bits E(1)
gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA sequen ( 569) [r] 315 30.8 6.4e-06
>>gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se (569 nt)
rev-comp s-w opt: 315 Z-score: 138.7 bits: 30.8 E(1): 6.4e-06
Smith-Waterman score: 315; 100.0% identity (100.0% similar) in 21 nt overlap (21-1:534-554)
Note that the "gi|54118649|" part of the header is missing.
But if I revert to FASTA36.3.5d, the header looks fine:
The best scores are: s-w bits E(1)
gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, ( 569) [r] 315 32.8 1.6e-06
>>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se (569 nt)
rev-comp s-w opt: 315 Z-score: 149.3 bits: 32.8 E(1): 1.6e-06
Smith-Waterman score: 315; 100.0% identity (100.0% similar) in 21 nt overlap (21-1:534-554)
This causes my downstream parsing program to misbehave. I am not sure whether anyone has already reported this issue.
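
As a stopgap on the parsing side, the parser could accept both header forms. Below is a minimal sketch in Python, assuming the hit lines always start with ">>" and use the "db|accession|locus" layout shown above, optionally preceded by "gi|number|". The names HIT_RE and parse_hit_header are my own and hypothetical, not part of FASTA.

```python
import re

# Hypothetical pattern for the ">>" hit lines shown above.
# The "gi|<digits>|" prefix is optional, so both the 36.3.5d and
# the 36.3.7 header forms match.
HIT_RE = re.compile(
    r"^>>(?:gi\|(?P<gi>\d+)\|)?"   # optional gi number
    r"(?P<db>[a-z]+)\|"            # database tag, e.g. "gb"
    r"(?P<acc>[^|\s]+)\|"          # accession.version, e.g. "CO494158.1"
    r"(?P<locus>\S*)"              # locus name, may be empty
)

def parse_hit_header(line):
    """Return the identifier fields of a '>>' hit line, or None if it does not match."""
    m = HIT_RE.match(line)
    if m is None:
        return None
    return {
        "gi": m.group("gi"),        # None when the gi prefix is absent
        "db": m.group("db"),
        "accession": m.group("acc"),
        "locus": m.group("locus"),
    }

old_line = ">>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se (569 nt)"
new_line = ">>gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se (569 nt)"

old_hit = parse_hit_header(old_line)
new_hit = parse_hit_header(new_line)
```

Keying the downstream logic on the accession rather than the gi number would make it independent of whether the prefix is printed, since the accession appears in both outputs.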