{
"metadata": {
"name": "OsHV_host"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": "#Larval Transcriptome of OsHV exposed oysters\nA re-examination of Cgigas larval transcriptome using genome.\n\n\n_Updated: July 23, 2013 14:15PDT_ - removed iframe \n_Updated: July 22, 2013 07:15PDT_\n\n---\nSolid Files -- http://eagle.fish.washington.edu/whale/index.php?dir=GE%2Freads%2F \n\nImporting into CLCv6\n\n###Quality trim \n`\n Ambiguous trim = Yes\n Ambiguous limit = 2\n Quality trim = Yes\n Quality limit = 0.05\n Create report = Yes\n Save discarded sequences = No\n Remove 5' terminal nucleotides = No\n Minimum number of nucleotides in reads = 20\n Discard short reads = Yes\n Remove 3' terminal nucleotides = No\n Discard long reads = No\n Save broken pairs = No\n`\n\n\n\n###QC Report\n\n\n\n###Mapping to Transcriptome\n\nRNA-seq to genes\n\nParameters \n\n`\n Use strand specific assembly = No\n Create report = Yes\n References = oyster_v9_gene\n Count paired reads as two = No\n Use colorspace encoding = Yes\n Minimum number of reads = 10\n Additional upstream bases = 0\n Minimum read count fusion gene table = 5\n Minimum length of putative exons = 25\n Minimum exon coverage fraction = 0.2\n Minimum length fraction (long reads) = 0.9\n Use annotations for gene and transcript identification = No\n Create fusion gene table = No\n Expression value = TOTAL_GENE_COUNT\n Minimum similarity fraction = 0.8\n Expression level = Genes\n Create list of unmapped reads = No\n Unspecific match limit = 5\n Exon discovery = No\n Organism type = PROKARYOTE\n Additional downstream bases = 0\n Maximum paired distance = 250\n Minimum paired distance = 180\n Strand = Forward\n Maximum number of mismatches allowed (applies to short reads) = 2\n Expression value = Number of reads mapped to the gene\n` \n\n\n\n\n\n"
},
{
"cell_type": "code",
"collapsed": false,
"input": "#tab delim RNA-Seq file\n!head /Volumes/web/cnidarian/solid0078_20091105_RobertsLab_GE_F3\\ trimmed\\ RNA-Seq.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\"Feature ID\"\t\"Expression values\"\t\"Gene length\"\t\"Unique gene reads\"\t\"Total gene reads\"\t\"RPKM\"\r\nCGI_10000780\t2\t1350\t2\t2\t0.272\r\nCGI_10000456\t28\t438\t18\t28\t11.735\r\nCGI_10000457\t5\t603\t4\t5\t1.522\r\nCGI_10000774\t1\t375\t1\t1\t0.49\r\nCGI_10000917\t2\t426\t2\t2\t0.862\r\nCGI_10000861\t7\t2004\t7\t7\t0.641\r\nCGI_10000994\t96\t1635\t74\t96\t10.779\r\nCGI_10000643\t0\t552\t0\t0\t0\r\nCGI_10000763\t4\t249\t4\t4\t2.949\r\n"
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "#SAM output\n!head /Volumes/web/cnidarian/solid0078_20091105_RobertsLab_GE_F3_tr_RNA-Seq.sam",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "@HD\tVN:1.0\tSO:unsorted\r\n@SQ\tSN:CGI_10000780\tLN:1350\r\n@SQ\tSN:CGI_10000456\tLN:438\r\n@SQ\tSN:CGI_10000457\tLN:603\r\n@SQ\tSN:CGI_10000774\tLN:375\r\n@SQ\tSN:CGI_10000917\tLN:426\r\n@SQ\tSN:CGI_10000861\tLN:2004\r\n@SQ\tSN:CGI_10000994\tLN:1635\r\n@SQ\tSN:CGI_10000643\tLN:552\r\n@SQ\tSN:CGI_10000763\tLN:249\r\n"
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "#SAM output\n!tail /Volumes/web/cnidarian/solid0078_20091105_RobertsLab_GE_F3_tr_RNA-Seq.sam",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "read_5447380\t0\tCGI_10003117\t263\t60\t50M\t*\t0\t0\tCCTCGCATTGGAAAACCCCAGTGTTTGGGTGGGTGTCACAAGAGGAAGAA\t@A@?B=AA;@BB@:@:;<=2;>6:?98>:77/7\tNH:i:1\tCS:Z:T20223313010200010001211100100110011121110222020220\r\nread_5447381\t0\tCGI_10003117\t357\t60\t48M\t*\t0\t0\tACCATTGAATCGATTTCCAAATGTGATGTTCCATCTGCAGTCCCTCTT\tB@B@BA?ABBAAABAABBA@5A@?@ABA>AAB@=?AB@?@7@@8@?@5\tNH:i:1\tCS:Z:T310130120323230020100311123110201322131212002220\r\nread_5447382\t0\tCGI_10003117\t375\t60\t44M\t*\t0\t0\tAAATGTGATGTTCCATCTGCAGTCCCTCTTTAAATGGTCACAGT\t@?BB@@BABB>B@A??BBB9=B:B@@AAB<59>@><@>A?>?<:\tNH:i:1\tCS:Z:T30031112311020132213121200222003003101211121\r\nread_5447383\t0\tCGI_10003117\t375\t60\t44M\t*\t0\t0\tAAATGTGATGTTCCATCTGCAGTCCCTCTTTAAATGGTCACAGT\t>@BB@?BBBA>@B>@<78?@=9@?A>>?;;\tNH:i:1\tCS:Z:T30031112311020132213121200222003003101211121\r\nread_5447384\t0\tCGI_10003117\t375\t60\t44M\t*\t0\t0\tAAATGTGATGTTCCATCTGCAGTCCCTCTTTAAATGGTCACAGT\t@BB@@>@BAA=@@B?=B@B::@>5==?@@=/*?B81>;AA?@:7\tNH:i:1\tCS:Z:T30031112311020132213121200222003003101211121\r\nread_5447385\t0\tCGI_10003117\t507\t60\t29M\t*\t0\t0\tGATGAATTGGTATAACATTGTCAACTCTT\tBBB?<@BBA=@BA?>=AA@;6@?<:8AA/\tNH:i:1\tCS:Z:T12312030101333011301121012220\r\nread_5447386\t0\tCGI_10003117\t609\t60\t3S35M\t*\t0\t0\tAGCACAGATCCAAACTGGAATTTACCAGAACCGCCAGA\tBBBABBBABBBB>BA>@??BB@?AB@B<3@A>=9AA>=\tNH:i:1\tCS:Z:T32311122320100121020300310122010330122\r\nread_5447387\t0\tCGI_10003117\t609\t60\t3S47M\t*\t0\t0\tAGCACAAATCCAAACTGGAATTTACCAGAACCGCCAGAAGAATACATCCC\tBBBABAA=BBBA@?B@=@?AB??BA?@<;A@:>=@>8:7@=><=@=<5;?\tNH:i:1\tCS:Z:T32311100320100121020300310122010330122022033113200\r\nread_5447388\t0\tCGI_10003117\t609\t60\t3S35M\t*\t0\t0\tAGCACAGATCCAAACTGGAATTTACCAGAACCGCCAGA\tBBBBBBB@BBBBBB@B?BBBBB@BB@@==BA;A>AA@=\tNH:i:1\tCS:Z:T32311122320100121020300310122010330122\r\nread_5447389\t0\tCGI_10003117\t609\t60\t48M2S\t*\t0\t0\tACAGATCCAAACTGGAATTTACCAGAACCGCCAGAAGAATACATCCCCCC\tBA@BBBBAB@BBAAB>ABB?BAA??>;=??>A>=97>@@AA537<;@=\tNH:i:1\tCS:Z:T31122
320100121020300310122010330122022033113200000\r\n"
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": "#Lets figure out a way to visualize what genes are expressed\n#Take RNA-seq file import into SQLShare\n#imported /Volumes/web/cnidarian/solid0078_20091105_RobertsLab_GE_F3\\ trimmed\\ RNA-Seq.txt\n#cleaned up\n",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": " \n\n"
},
{
"cell_type": "code",
"collapsed": false,
"input": "#joining with annotation data.\n#create generic SQLShare Wiki workflow.\n",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 42
},
{
"cell_type": "markdown",
"metadata": {},
"source": "\n \n--- "
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head /Volumes/web/cnidarian/Cgigas_larvae_RNAseq_OsHV_GO.csv",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "ID,SPID1,GOID,term,aspect\r\r\nCGI_10023548,A0AVK6,GO:0003677,DNA binding,F\r\r\nCGI_10023548,A0AVK6,GO:0003700,transcription factor activity,F\r\r\nCGI_10023548,A0AVK6,GO:0005634,nucleus,C\r\r\nCGI_10023548,A0AVK6,GO:0005667,transcription factor complex,C\r\r\nCGI_10023548,A0AVK6,GO:0006351,\"\"\"transcription, DNA-dependent\"\"\",P\r\r\nCGI_10023548,A0AVK6,GO:0006355,\"\"\"regulation of transcription, DNA-dependent\"\"\",P\r\r\nCGI_10023548,A0AVK6,GO:0007049,cell cycle,P\r\r\nCGI_10003125,A0AVT1,GO:0000166,nucleotide binding,F\r\r\nCGI_10012444,A0AVT1,GO:0000166,nucleotide binding,F\r\r\n"
}
],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": "!wc /Volumes/web/cnidarian/Cgigas_larvae_RNAseq_OsHV_GO.csv",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " 121979 315335 6935565 /Volumes/web/cnidarian/Cgigas_larvae_RNAseq_OsHV_GO.csv\r\n"
}
],
"prompt_number": 44
},
{
"cell_type": "code",
"collapsed": false,
"input": "#into GoCategorizer then ManyEyes",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 45
},
{
"cell_type": "markdown",
"metadata": {},
"source": ""
},
{
"cell_type": "markdown",
"metadata": {},
"source": "###Enrichment Analysis\n\nBackground - all SPIDs associated with oyster transcriptome\nGene list - SPID of gene with at least 10 unique reads\n\nKegg \n\n\n\n\n\n---\nBP-Fat \n\n\n \n \n"
},
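{
"cell_type": "markdown",
"metadata": {},
"source": "A minimal sketch (not part of the original CLC/SQLShare workflow) of how the gene list could be selected: keep the features with at least 10 unique reads. The `rows` values are copied from the first lines of the RNA-Seq table shown earlier; `expressed_genes` and `MIN_UNIQUE_READS` are hypothetical names, and mapping the selected IDs to SPIDs would still need the annotation table."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Sketch only: (feature ID, unique gene reads) pairs copied from the RNA-Seq table head.\nrows = [\n    ('CGI_10000780', 2),\n    ('CGI_10000456', 18),\n    ('CGI_10000457', 4),\n    ('CGI_10000774', 1),\n    ('CGI_10000917', 2),\n    ('CGI_10000861', 7),\n    ('CGI_10000994', 74),\n    ('CGI_10000643', 0),\n    ('CGI_10000763', 4),\n]\n\nMIN_UNIQUE_READS = 10  # threshold stated in the gene list definition\n\ndef expressed_genes(rows, min_unique=MIN_UNIQUE_READS):\n    # Keep the feature IDs whose unique read count meets the threshold.\n    return [feature_id for feature_id, unique in rows if unique >= min_unique]\n\nprint(expressed_genes(rows))",
"language": "python",
"metadata": {},
"outputs": []
},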
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": "--- \n###Mapping to OsHV\n\n \nimported genbank format to hav CDS, gene information.. \nstarted in CLC using same parameters..\n \n Minimum number of reads = 10\n Minimum exon coverage fraction = 0.2\n Additional downstream bases = 0\n Use colorspace encoding = Yes\n Create report = Yes\n Use strand specific assembly = No\n Count paired reads as two = No\n Minimum length fraction (long reads) = 0.9\n Additional upstream bases = 0\n Unspecific match limit = 5\n Expression value = RPKM\n Minimum read count fusion gene table = 5\n Create fusion gene table = No\n Minimum paired distance = 180\n Use annotations for gene and transcript identification = Yes\n Strand = Forward\n Organism type = PROKARYOTE\n Maximum number of mismatches allowed (applies to short reads) = 2\n Minimum similarity fraction = 0.8\n References = NC_005881\n Expression level = Genes\n Minimum length of putative exons = 25\n Maximum paired distance = 250\n Create list of unmapped reads = No\n Exon discovery = No\n Expression value = Read Per Kilobase of exon Model value\n \n---\n`\nFound: 127 genes.\nTotal number of reads : 21344598 ( single reads: 21344598, paired reads: 0 )\nTotal number of mapped reads : 21135 ( single reads: 21135, paired reads: 0 )\nTotal number of unmapped reads : 21323463 ( single reads: 21323463, paired reads: 0 )\n` \n \n\n \n \n---\n \n \n \n \n"
},
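{
"cell_type": "markdown",
"metadata": {},
"source": "A quick arithmetic check on the OsHV mapping summary (a sketch; the variable names are only illustrative): the mapped and unmapped counts should add up to the total, and the mapped fraction gives the share of reads assigned to the OsHV reference."
},
{
"cell_type": "code",
"collapsed": false,
"input": "# Counts copied from the CLC mapping summary for NC_005881.\ntotal_reads = 21344598\nmapped_reads = 21135\nunmapped_reads = 21323463\n\n# Mapped and unmapped reads should add up to the total.\nassert mapped_reads + unmapped_reads == total_reads\n\n# Percentage of all reads that mapped to the OsHV reference.\nprint('%.3f%% of reads mapped to OsHV' % (100.0 * mapped_reads / total_reads))",
"language": "python",
"metadata": {},
"outputs": []
},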
{
"cell_type": "code",
"collapsed": false,
"input": "!head /Volumes/web/cnidarian/solid0078_20091105_RobertsLab_GE_F3trim_RNAseqOSHV.csv",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "\"Feature ID\",\"Expression values\",\"Gene length\",\"Unique gene reads\",\"Total gene reads\",\"RPKM\",\"Chromosome region start\",\"Chromosome region end\"\r\n\"OsHV1_gp001\",\"423.399\",\"447\",\"0\",\"4\",\"423.399\",\"115\",\"561\"\r\n\"OsHV1_gp002\",\"751.03\",\"504\",\"0\",\"8\",\"751.03\",\"680\",\"1183\"\r\n\"OsHV1_gp003\",\"61.85\",\"765\",\"0\",\"1\",\"61.85\",\"1890\",\"2654\"\r\n\"OsHV1_gp004\",\"2748.769\",\"1050\",\"0\",\"61\",\"2748.769\",\"3384\",\"4433\"\r\n\"OsHV1_gp005\",\"3168.303\",\"2031\",\"70\",\"136\",\"3168.303\",\"6421\",\"8451\"\r\n\"OsHV1_gp006\",\"1534.465\",\"3546\",\"115\",\"115\",\"1534.465\",\"8628\",\"12173\"\r\n\"OsHV1_gp007\",\"1429.304\",\"960\",\"29\",\"29\",\"1429.304\",\"12211\",\"13170\"\r\n\"OsHV1_gp008\",\"4016.047\",\"1779\",\"151\",\"151\",\"4016.047\",\"13258\",\"15036\"\r\n\"OsHV1_gp009\",\"6345.436\",\"1029\",\"138\",\"138\",\"6345.436\",\"15297\",\"16325\"\r\n"
}
],
"prompt_number": 2
},
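{
"cell_type": "markdown",
"metadata": {},
"source": "A sketch of how the RPKM column can be reproduced, assuming the standard definition (reads per kilobase of gene per million mapped reads) and assuming that CLC uses the total gene reads together with the 21135 mapped reads reported in the OsHV mapping summary. `rpkm` is a hypothetical helper, not a CLC function. Under these assumptions the result should agree with the table value for OsHV1_gp006 to rounding."
},
{
"cell_type": "code",
"collapsed": false,
"input": "def rpkm(total_gene_reads, gene_length, total_mapped_reads):\n    # Reads per kilobase of gene per million mapped reads.\n    return total_gene_reads * 1e9 / (gene_length * total_mapped_reads)\n\n# OsHV1_gp006: 115 total gene reads, gene length 3546 bp; 21135 reads mapped in total.\nprint(round(rpkm(115, 3546, 21135), 3))",
"language": "python",
"metadata": {},
"outputs": []
},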
{
"cell_type": "markdown",
"metadata": {},
"source": "---\n### Reference Mapping to OsHV\n \n \n\n\n\n\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "##IGV Browser for OsHV"
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}