I have run into a very strange problem with the FASTA header of my library file.
This is one entry in my library file:
>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA sequence
GCTTAGCGTGGTCGCGGCCGAGGTACTTTTTTTTTTTTTTTTTTTTTTTGGGAAACTTTCACAGTCTTGC
CATTTCCATAGTATTTAAATGATGACAAATTGGAGCAGGAATAACATTACAGTGCATGATACAAACAATT
AAGCTATAGGACTCTATTAAGTTATTCATTCTATGAAGATGATGCTAGTTTCCAATAGCAAATAAAGGCT
I simply use the Smith-Waterman algorithm.
If I use FASTA36.3.7, the header in the results looks like this:
The best scores are: s-w bits E(1)
gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA sequen ( 569) [r] 315 30.8 6.4e-06

>>gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se (569 nt)
rev-comp s-w opt: 315 Z-score: 138.7 bits: 30.8 E(1): 6.4e-06
Smith-Waterman score: 315; 100.0% identity (100.0% similar) in 21 nt overlap (21-1:534-554)
Note that the "gi|54118649|" part of the header is missing.
But if I revert to FASTA36.3.5d, the header looks fine:
The best scores are: s-w bits E(1)
gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, ( 569) [r] 315 32.8 1.6e-06

>>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548 G.h.fbr-sw Gossypium hirsutum cDNA, mRNA se (569 nt)
rev-comp s-w opt: 315 Z-score: 149.3 bits: 32.8 E(1): 1.6e-06
Smith-Waterman score: 315; 100.0% identity (100.0% similar) in 21 nt overlap (21-1:534-554)
This breaks my downstream parsing program, which expects the header to start with the "gi" field. I am not sure whether anyone has already reported this issue.
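
Until the cause is known, one possible workaround on the parsing side is to accept both header forms and key on the accession instead of the "gi" number. The snippet below is only a minimal sketch of that idea: the function name `parse_hit_id` and the regular expression are my own assumptions, written against the two header shapes shown above, and are not part of the FASTA package.

```python
import re

# Assumed header shape: an optional "gi|<digits>|" prefix followed by
# "<db>|<accession>|<locus>". Leading ">" characters are skipped so the
# same pattern works on ">>" alignment headers and on best-score lines.
_HIT_ID_RE = re.compile(
    r"^>*(?:gi\|(?P<gi>\d+)\|)?"
    r"(?P<db>[A-Za-z]+)\|(?P<acc>[^|\s]+)\|(?P<locus>[^|\s]*)"
)

def parse_hit_id(line):
    """Return a dict with gi, db, acc and locus, or None if no match.

    The "gi" value is None when the prefix is absent, as in the
    FASTA36.3.7 output above.
    """
    match = _HIT_ID_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()

# Hypothetical usage with the two header forms from this post.
new_style = parse_hit_id(">>gb|CO494158.1|CO494158 G.h.fbr-sw03548")
old_style = parse_hit_id(">>gi|54118649|gb|CO494158.1|CO494158 G.h.fbr-sw03548")
```

With this approach both versions yield the same accession ("CO494158.1"), so downstream code that keys on `acc` would not depend on whether the "gi" prefix is printed.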