Comments on: BLAST – C.gigas Larvae OA Illumina Data Against GenBank nt DB
http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/
University of Washington - Fishery Sciences - Roberts Lab
Tue, 09 Apr 2019 18:37:58 +0000

By: kubu4 http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/#comment-626 Thu, 07 May 2015 16:08:23 +0000

Sorry, just saw this. I know you already spoke to Steven about this, but I have no idea what you’re talking about with “CTOT” and “CTOB”! Steven’s been the one dealing with the bisulfite mapping. Thanks for the heads up, though! Does all this interaction with you on this project mean you’re going to come back to our lab? :)

By: Mac http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/#comment-615 Wed, 06 May 2015 18:57:50 +0000

Thanks for the reply and the link to the notebook entry. Yeah, the e-values are not too good, as I would expect for bisulfite data. Just to make sure: when the bisulfite alignment was done in BSMAP, did it include mapping to the CTOT and CTOB strands? The reads won’t align to the OT and OB strands, which I think is the default for both BSMAP and Bismark. I was getting similar mapping results before I figured this out :)
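A minimal Python sketch of the four strands mentioned above, assuming complete C-to-T conversion (no methylated cytosines) and using hypothetical helper names. OT and OB are the converted original top and bottom strands; CTOT and CTOB are their reverse complements. Directional libraries yield reads from OT and OB only, so an aligner restricted to those two strands will miss reads that actually derive from CTOT or CTOB (as in PBAT or non-directional libraries). In Bismark this is controlled with `--non_directional` (all four strands) or `--pbat` (CTOT/CTOB only), and BSMAP's `-n 1` maps to all four strands; these are worth confirming against each tool's documentation for the versions in use.

```python
# Simplified model (assumption): every C is unmethylated and fully converted.
# Function names are hypothetical, for illustration only.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def bisulfite_strands(top: str) -> dict[str, str]:
    """Return the four bisulfite-derived strands for a reference top strand.

    OT   = original top strand, C->T converted
    OB   = original bottom strand (revcomp of top), C->T converted
    CTOT = complementary to OT (revcomp of OT)
    CTOB = complementary to OB (revcomp of OB)
    """
    top = top.upper()
    ot = top.replace("C", "T")
    ob = revcomp(top).replace("C", "T")
    return {"OT": ot, "OB": ob, "CTOT": revcomp(ot), "CTOB": revcomp(ob)}


if __name__ == "__main__":
    for name, seq in bisulfite_strands("ACGTTCGA").items():
        print(f"{name:>4}: {seq}")
```

In this fully converted model OT contains no C and CTOT contains no G, which is why reads from the complementary strands look "wrong" to an aligner that only searches OT and OB.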

By: kubu4 http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/#comment-613 Wed, 06 May 2015 18:12:05 +0000

Yeah, Steven did a de novo assembly with the BS-seq data because reads weren’t mapping to the C.gigas genome when using BSMAP. I took the de novo-assembled BS-seq data (see this notebook entry) and BLASTed it against the C.gigas genome, and the results were not good. So, this particular BLAST against the GenBank nt DB was meant to check whether our sequencing data was actually “our” data, by identifying which species the data matched most often. This wasn’t a “real” attempt at data analysis; it was mostly a long shot to see if our data was mostly made up of another species, which would suggest the sequencing facility sent us the wrong data set.
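The species check described here boils down to taking the best hit for each contig and tallying the species. A minimal Python sketch, assuming BLAST+ tabular output written with a custom column list such as `-outfmt "6 qseqid sseqid evalue bitscore sscinames"` (the `sscinames` column requires NCBI's taxonomy database, taxdb, to be available locally). The function name and example rows are hypothetical, not output from this run.

```python
from collections import Counter

# Assumed column order (custom BLAST+ tabular output):
#   qseqid  sseqid  evalue  bitscore  sscinames
# The function name is hypothetical.


def top_hit_species(lines):
    """Count species among the single best hit (highest bitscore) per query."""
    best = {}  # qseqid -> (bitscore, species)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, _sseqid, _evalue, bitscore, sciname = line.split("\t")[:5]
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, sciname)
    return Counter(species for _score, species in best.values())


if __name__ == "__main__":
    # Hypothetical toy rows, not real BLAST results.
    example = [
        "contig1\tgi|1\t1e-30\t150.0\tCrassostrea gigas",
        "contig1\tgi|2\t1e-20\t110.0\tHomo sapiens",
        "contig2\tgi|3\t1e-05\t48.0\tEscherichia coli",
    ]
    for species, n in top_hit_species(example).most_common():
        print(species, n)
```

If most contigs' best hits fall on an unexpected species, that would support the wrong-data-set hypothesis; a mix dominated by C. gigas would argue against it.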

By: Mac http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/#comment-612 Wed, 06 May 2015 17:12:45 +0000

You’ve made contigs of the bisulfite data? I feel like between first assembling bisulfite data and then aligning it, where you expect ~25% mismatches anyway (no C’s), there could be enough ‘error’ built in that the alignments won’t necessarily hit C. gigas first?
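The ~25% figure follows from assuming equal base composition: if every C in a read is unmethylated and converted to T, the read mismatches the unconverted reference at every C position, so the mismatch rate equals the read's C content. A minimal Python sketch of that simplified model (hypothetical function names; it ignores sequencing error and only models top-strand reads):

```python
# Simplified model (assumption): unmethylated C's are read as T after
# bisulfite conversion; positions listed in `methylated` (0-based) stay C.


def bisulfite_convert(seq: str, methylated: frozenset[int] = frozenset()) -> str:
    """Convert every unmethylated C in seq to T."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def mismatch_fraction(read: str, ref: str) -> float:
    """Fraction of positions where two equal-length sequences differ."""
    if len(read) != len(ref):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(read, ref)) / len(ref)


if __name__ == "__main__":
    ref = "ACGT" * 25  # toy 100-bp reference with 25% C
    read = bisulfite_convert(ref)
    print(mismatch_fraction(read, ref))  # 0.25
```

The real figure depends on the genome's actual C content and on how many C's are methylated, so per-read mismatch rates will scatter around whatever the local C content is rather than sit exactly at 25%.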

By: kubu4 http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/#comment-610 Wed, 06 May 2015 16:22:25 +0000

I haven’t done a BLAST with just raw sequencing reads (i.e. reads not already de novo assembled into contigs), so I’m not sure how it would play out. However, BLAST has an option (I can’t remember if it’s “on” by default or if you have to enable it with a flag in the BLAST command) that detects short query sequences and takes their length into account when evaluating database matches. In theory, this should help ensure that matches to short query sequences are accurate. Still, it would probably be prudent to run the BLAST and evaluate the top 10 matches (alignments, e-values, bit scores, etc.) to see if the results agree with your expectations.
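For command-line BLAST+, the short-query behaviour described above most likely corresponds to `-task blastn-short` (intended for queries shorter than 50 bases), while web BLAST has an "automatically adjust parameters for short input sequences" setting; both are worth confirming in the NCBI documentation. To review the top 10 matches per query, here is a minimal Python sketch that assumes the default 12-column tabular output (`-outfmt 6`); the function name and toy rows are hypothetical.

```python
from collections import defaultdict

# Default BLAST+ tabular (-outfmt 6) column order.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines, n=10):
    """Group hits by query and keep the n highest-bitscore rows for each."""
    by_query = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        by_query[row["qseqid"]].append(row)
    # Sort by bitscore (descending), breaking ties by e-value (ascending).
    return {
        q: sorted(rows, key=lambda r: (-r["bitscore"], r["evalue"]))[:n]
        for q, rows in by_query.items()
    }


if __name__ == "__main__":
    # Hypothetical toy rows, not real BLAST results.
    rows = [
        "read1\ts1\t99.0\t50\t0\t0\t1\t50\t1\t50\t1e-20\t90.0",
        "read1\ts2\t98.0\t50\t1\t0\t1\t50\t1\t50\t1e-18\t85.0",
    ]
    for query, hits in top_hits(rows).items():
        for h in hits:
            print(query, h["sseqid"], h["pident"], h["evalue"], h["bitscore"])
```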

By: Mac http://onsnetwork.org/kubu4/2015/05/04/blast-c-gigas-larvae-oa-illumina-data-against-genbank-nt-db/#comment-598 Tue, 05 May 2015 19:11:31 +0000

This may be a silly question, but have you ever done this type of BLAST with an RNA-Seq dataset with reads of similar length? I would expect C. gigas to be the best hit, but is there something inherent in the database that would make ‘shortish’ reads map to other species as a top hit?
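One factor behind this concern is how BLAST scores short alignments. Under the simplified Karlin-Altschul relationship E ≈ m * n * 2^(-S') (query length m, database length n, bit score S'), and ignoring BLAST's effective-length corrections, an alignment's bit score is capped by its length, so a short read cannot reach the extremely small E-values a long contig can. A conserved stretch shared by several species can also score identically against each of them, making the "top" species a near tie. A minimal Python sketch of the relationship, with hypothetical function names:

```python
import math

# Simplified relationship (assumption): E = m * n * 2**(-S'),
# without BLAST's effective-length (edge-effect) corrections.


def evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value for a given bit score and search-space size."""
    return query_len * db_len * 2.0 ** (-bit_score)


def bits_needed(target_e: float, query_len: int, db_len: int) -> float:
    """Bit score required to reach target_e for the given sizes."""
    return math.log2(query_len * db_len / target_e)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the trend.
    db = 10**9
    for bits in (30, 40, 50):
        print(bits, evalue(bits, 100, db))
```

Each extra bit halves the E-value, so anything that limits the attainable bit score (short reads, bisulfite mismatches) pushes E-values up and makes differences between candidate species harder to resolve.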
