{
"metadata": {
"name": "Larv_BS_Workflow_Example"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Larvae Example Workflow- M1 Data (Spermatozoa from Male 1)"
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": "This is an example of my workflow for my bisulfite sequencing larvae dataset. I am working with a total of 28 files and will narrow it down to a workflow of 7 datasets (1 female/eggs, 2 males/sperm, and 4 larvae samples)"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "1. Upload raw data to iPlant and unzip/uncompress this file this file using the gunzip analysis application"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "2. Dowload uncompressed files and relocate to Eagle"
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "File locations for M1 uncompressed (Nov): http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FUncompressed_Nov%2F"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "3. Concatenate November and September uncompressed files (R1 and R2 separate). These are technical replicates from the Bisulfite sequencing run."
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "M1_R1"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!cat /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R1.fastq /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R1.fastq > /Volumes/web/Mollusk/bs_larvae_exp/Concatenated_Files_R1/M1_R1.fastq",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "M1_R2"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!cat /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R2.fastq /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R2.fastq > /Volumes/web/Mollusk/bs_larvae_exp/Concatenated_Files_R2/M1_R2.fastq ",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Location of concatenated files:"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "R1: http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FConcatenated_Files_R1%2F"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "R2: http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FConcatenated_Files_R2%2F"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "4. Upload these files to iPlant using the \"Import from URL\" function. This allows me to import multiple files from the URL location on Eagle. "
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "5. Run BSMAP in iPlant"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "6. Download BSMAP files from iPlant and upload to Eagle "
},
{
"cell_type": "code",
"collapsed": false,
"input": "ir=\"/usr/local/bin/irods3.2.icmds.mac.intel/\"",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "!{ir}/icd Larvae",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "!icd /iplant/home/che625/Larvae/BSMAP",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": "!iget /iplant/home/che625/Larvae/BSMAP/BSMAP_analysis_M1-2014-03-04-17-56-24.707/logs/condor-stdout-M1 /Volumes/web-1/Mollusk/bs_larvae_exp/BSMAP_output",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "7. Run methratio in iPython and filter for context and coverage"
},
{
"cell_type": "code",
"collapsed": false,
"input": "#working directory (parent)\nwd=\"/Volumes/web/Mollusk/bs_larvae_exp\"\n\n#where is bsmap\nbsmap=\"/Users/Shared/Apps/bsmap-2.73/\"\n\n#Where is bsmap file\nbsmapfile=\"/Volumes/web/Mollusk/bs_larvae_exp/BSMAP_output/\"\n\n#Output for methratio file\nmethratio=\"/Volumes/web/Mollusk/bs_larvae_exp/Methratio_out/\"\n\n#Location of filtered files\nfiltered=\"/Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/\"\n\n#genome file \ngenome=\"/Volumes/web/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa\"",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": "!python {bsmap}methratio.py -d {genome} -u -z -g -o {methratio}methratio_out_M1.txt -s {bsmap}samtools {bsmapfile}condor-stdout-M1.sam",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "@ Wed Mar 5 15:42:06 2014: reading reference /Volumes/web-1/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa ...\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "@ Wed Mar 5 15:43:41 2014: reading /Volumes/web-1/Mollusk/bs_larvae_exp/BSMAP_output/condor-stdout-M1.sam ...\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "[samopen] SAM header is present: 7658 sequences.\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\t@ Wed Mar 5 16:58:58 2014: read 10000000 lines\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "@ Wed Mar 5 17:00:08 2014: combining CpG methylation from both strands ...\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "@ Wed Mar 5 17:05:26 2014: writing /Volumes/web-1/Mollusk/bs_larvae_exp/Methratio_out/methratio_out_M1.txt ..."
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "@ Wed Mar 5 17:28:49 2014: done.\r\n"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "total 8562480 valid mappings, 48969439 covered cytosines, average coverage: 1.80 fold.\r\n"
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": "#command for only obtaining the context '__CG_'\n!grep \"[A-Z][A-Z]CG[A-Z]\" <{methratio}methratio_out_M1.txt> {filtered}methratio_out_CG_M1.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": "#obtaining a filtered file with at least 5x coverage\n!awk '{if ($8 >= 5) print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": "!tr ' ' \"\\t\" /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1_tab.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 44
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "8. Create a file formatted for visualization in IGV"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!awk '{print $1,$2,$2+1,\"CpG\",$5}' /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1_tab.txt >/Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.txt ",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": "#Second line of code for creating a formatted file for IGV\n!tr ' ' \"\\t\" /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.igv",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.igv",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "C12768\t103\t104\tCpG\t0.167\r\nC12806\t142\t143\tCpG\t0.333\r\nC12924\t30\t31\tCpG\t0.000\r\nC12924\t38\t39\tCpG\t0.000\r\nC12924\t52\t53\tCpG\t0.000\r\nC12924\t60\t61\tCpG\t0.000\r\nC12924\t127\t128\tCpG\t0.000\r\nC12924\t136\t137\tCpG\t0.000\r\nC13128\t87\t88\tCpG\t0.000\r\nC13208\t83\t84\tCpG\t0.400\r\n"
}
],
"prompt_number": 47
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "9. Create a file formatted for visualization in the R package methylKit"
},
{
"cell_type": "code",
"collapsed": false,
"input": "!awk '{print $1,$2,$2+1,$3,$8,($7/$8),(1-($7/$8))}' /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methylkit/methylkit_out_M1.txt ",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 53
},
{
"cell_type": "code",
"collapsed": false,
"input": "#Second line of code for creating a formatted file for methylKit\n!tr ' ' \"\\t\" /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methylkit/methylkit_out_M1b.txt",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 54
},
{
"cell_type": "code",
"collapsed": false,
"input": "!head /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methylkit/methylkit_out_M1b.txt",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": "C12768\t103\t104\t+\t6\t0.166667\t0.833333\r\nC12806\t142\t143\t+\t6\t0.333333\t0.666667\r\nC12924\t30\t31\t+\t5\t0\t1\r\nC12924\t38\t39\t+\t5\t0\t1\r\nC12924\t52\t53\t+\t6\t0\t1\r\nC12924\t60\t61\t+\t6\t0\t1\r\nC12924\t127\t128\t+\t6\t0\t1\r\nC12924\t136\t137\t+\t6\t0\t1\r\nC13128\t87\t88\t+\t6\t0\t1\r\nC13208\t83\t84\t+\t5\t0.4\t0.6\r\n"
}
],
"prompt_number": 55
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Visualization of genomic tracks using IGV"
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": "Comparison between samples using the R package methylKit"
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Total number of loci for this comparison= 1126"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "The female/eggs are definitely an outlier. I'm interested in comparing these samples excluding the Female from this analysis."
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Excluding eggs from the analysis the total number of loci for this comparison= 24193"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": "Male sperm samples are clustering with their respective larvae offspring"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "
"
},
{
"cell_type": "code",
"collapsed": false,
"input": "",
"language": "python",
"metadata": {},
"outputs": []
}
],
"metadata": {}
}
]
}