Last week I popped out a quick assembly and annotation on our geoduck gonadal transcriptome. A second assembly was also done using Trinity.
Updates
August 3 – Confirmed // in file location had no impact on assembly.
July 14 – TransDecoder protein annotations
10:40am – added TransDecoder results
10:29am – added Stats via Trinity
Trinity.pl --seqType fq -JM 24G --left /Volumes/web/cnidarian/Geo_Pool_F_GGCTAC_L006_R1_001_val_1.fq /Volumes/web/cnidarian/Geo_Pool_M_CTTGTA_L006_R1_001_val_1.fq --right /Volumes/web/cnidarian//Geo_Pool_F_GGCTAC_L006_R2_001_val_2.fq /Volumes/web/cnidarian//Geo_Pool_M_CTTGTA_L006_R2_001_val_2.fq --CPU 16
Output
0:999 127840
1000:1999 18164
2000:2999 5321
3000:3999 1817
4000:4999 762
5000:5999 291
6000:6999 135
7000:7999 73
8000:8999 22
9000:9999 29
10000:10999 4
11000:11999 5
12000:12999 3
13000:13999 4
14000:14999 4
15000:15999 3
16000:16999 0
17000:17999 2
18000:18999 1
Total length of sequence: 101862868 bp
Total number of sequences: 154480
N25 stats: 25% of total sequence length is contained in the 8095 sequences >= 2045 bp
N50 stats: 50% of total sequence length is contained in the 26158 sequences >= 1014 bp
N75 stats: 75% of total sequence length is contained in the 64574 sequences >= 446 bp
Total GC count: 37657770 bp
GC %: 36.97 %
hummingbird:Geo-trinity steven$ /Users/gilesg/compile/trinityrnaseq_r20131110/util/TrinityStats.pl /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta
################################
## Counts of transcripts, etc.
################################
Total trinity transcripts: 154480
Total trinity components: 100155
Percent GC: 36.97
########################################
Stats based on ALL transcript contigs:
########################################
Contig N10: 3444
Contig N20: 2385
Contig N30: 1766
Contig N40: 1343
Contig N50: 1014
Median contig length: 371
Average contig: 659.39
Total assembled bases: 101862868
#####################################################
## Stats based on ONLY LONGEST ISOFORM per COMPONENT:
#####################################################
Contig N10: 2999
Contig N20: 2026
Contig N30: 1462
Contig N40: 1067
Contig N50: 768
Median contig length: 321
Average contig: 553.88
Total assembled bases: 55473621
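For readers unfamiliar with the N25/N50/N75 lines in the stats above, here is a minimal Python sketch of how such a value can be computed from a list of contig lengths. The function name nx_length and the toy lengths are made up for illustration; this is not the code Trinity or the stats script uses, only one common way to state the definition.

```python
def nx_length(lengths, fraction=0.5):
    """Return (count, length) for an Nx statistic.

    count  = how many of the longest sequences are needed to reach
             `fraction` of the total assembled length
    length = length of the shortest sequence in that set
    (fraction=0.5 gives N50, 0.25 gives N25, 0.75 gives N75)
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    target = sum(lengths) * fraction
    running = 0
    # Walk from the longest contig down, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return count, length


# Toy, made-up contig lengths (not from the geoduck assembly).
toy_lengths = [100, 200, 300, 400, 500]
print(nx_length(toy_lengths, 0.5))  # (2, 400)
```

Read this way, "N50: 1014" in the stats above means that half of the assembled bases sit in contigs of 1014 bp or longer.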
Rerunning to see whether the double slash in the file paths was a problem; did not see any errors. Also running TransDecoder.
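As a quick sanity check on why the double slash is harmless, POSIX path handling treats repeated slashes inside a path as a single separator. A small sketch using Python's standard posixpath module shows the two spellings normalizing to the same location. The helper name same_location is hypothetical, and the check only compares path strings; it does not touch the file system.

```python
import posixpath


def same_location(path_a, path_b):
    """Compare two POSIX-style paths after collapsing redundant separators."""
    return posixpath.normpath(path_a) == posixpath.normpath(path_b)


# Path as typed in the Trinity command (double slash) vs. the single-slash form.
typed = "/Volumes/web/cnidarian//Geo_Pool_F_GGCTAC_L006_R2_001_val_2.fq"
clean = "/Volumes/web/cnidarian/Geo_Pool_F_GGCTAC_L006_R2_001_val_2.fq"
print(same_location(typed, clean))  # True
```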
TransDecoder Results
Ran the following:
/Users/gilesg/compile/trinityrnaseq_r20131110/trinity-plugins/TransDecoder_r20131110/TransDecoder -t /Volumes/web/cnidarian/Geo-trinity/trinity_out_dir/Trinity.fasta
This provided a peptide file with 36003 sequences.
!head /Volumes/web-1/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep
>cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 type:internal len:142 (-) comp100047_c0_seq2:3-425(-)
NAECRDLYKIFTQILSVRSQEGKIVIPDEFATKIRNWLGNKEELFKEAHNQKIITFYNEY
TREENTFNPIRGKRPMSVPDMPERKYIDQLSRKTQSQCDFCKYKTFTAEDTFGRIDSNFS
CSASNAFKLDHWHALFLLKTH
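The header line above packs several fields into one string (ORF type, length, strand, and coordinates on the transcript). A minimal sketch for pulling those out in Python follows. The regular expression is written against the single header shown here, so it assumes other records in the file follow the same layout; the function name parse_transdecoder_header is made up for illustration.

```python
import re

# Pattern fitted to the one header shown above; other layouts may need changes.
HEADER_RE = re.compile(
    r"^>(?P<orf_id>\S+).*\btype:(?P<orf_type>\S+)\s+len:(?P<length>\d+)"
    r"\s+\((?P<strand>[+-])\)\s+(?P<transcript>\S+):(?P<start>\d+)-(?P<end>\d+)\([+-]\)"
)


def parse_transdecoder_header(line):
    """Return a dict of fields from a TransDecoder-style header, or None if it does not match."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("length", "start", "end"):
        fields[key] = int(fields[key])
    return fields


header = (
    ">cds.comp100047_c0_seq2|m.5982 comp100047_c0_seq2|g.5982 ORF "
    "comp100047_c0_seq2|g.5982 comp100047_c0_seq2|m.5982 type:internal "
    "len:142 (-) comp100047_c0_seq2:3-425(-)"
)
print(parse_transdecoder_header(header))
```

Tallying the orf_type field across all headers would give a quick sense of how many ORFs are complete versus partial.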
Running blastp on Trinity.fasta.transdecoder.pep
!blastp -query /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep -db /usr/local/bioinformatics/dbs/uniprot_sprot.fasta -evalue 1e-5 -max_target_seqs 1 -max_hsps 1 -outfmt 6 -num_threads 4 -out /Volumes/web/cnidarian/Geo-trinity/Trinity.fasta.transdecoder.pep-blastp-uniprot-2.out
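Because the command uses -outfmt 6 with no custom column list, the output file will be tab-delimited with BLAST+'s twelve default columns. Below is a minimal sketch for reading that format in Python. The function name read_outfmt6 is hypothetical, and the example row is made up to show the layout; it is not a result from this run.

```python
import csv
import io

# Default column order for BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_outfmt6(handle):
    """Parse tab-delimited BLAST output into a list of dicts with typed values."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        for name in ("pident", "evalue", "bitscore"):
            hit[name] = float(hit[name])
        for name in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            hit[name] = int(hit[name])
        hits.append(hit)
    return hits


# Made-up example row, for illustration only.
example = io.StringIO("query_1\tsubject_1\t45.0\t140\t70\t2\t1\t140\t10\t150\t1e-30\t120\n")
for hit in read_outfmt6(example):
    print(hit["qseqid"], hit["sseqid"], hit["evalue"])
```

To read the real file, pass an open file handle for the .out file instead of the StringIO example.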