{ "metadata": { "name": "Larv_BS_Workflow_Example-IGV" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": "Larvae Example Workflow- M1 Data (Spermatozoa from Male 1)" }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": "This is an example of my workflow for my bisulfite sequencing larvae dataset. I am working with a total of 28 files and will narrow it down to a workflow of 7 datasets (1 female/eggs, 2 males/sperm, and 4 larvae samples)" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "1. Upload raw data to iPlant and unzip/uncompress this file this file using the gunzip analysis application" }, { "cell_type": "markdown", "metadata": {}, "source": "\"_87__Discovery_Environment_18C6A781.png\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"_87__Discovery_Environment_18C6A7B0.png\"/" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "2. Dowload uncompressed files and relocate to Eagle" }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": "File locations for M1 uncompressed (Nov): http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FUncompressed_Nov%2F" }, { "cell_type": "markdown", "metadata": {}, "source": "\"Directory_Listing_of__Mollusk_bs_larvae_exp_Uncompressed_Nov_18C6A8BC.png\"/" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "3. Concatenate November and September uncompressed files (R1 and R2 separate). These are technical replicates from the Bisulfite sequencing run." }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "M1_R1" }, { "cell_type": "code", "collapsed": false, "input": "!cat /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R1.fastq /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R1.fastq > /Volumes/web/Mollusk/bs_larvae_exp/Concatenated_Files_R1/M1_R1.fastq", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "M1_R2" }, { "cell_type": "code", "collapsed": false, "input": "!cat /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R2.fastq /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R2.fastq > /Volumes/web/Mollusk/bs_larvae_exp/Concatenated_Files_R2/M1_R2.fastq ", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": "Location of concatenated files:" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "R1: http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FConcatenated_Files_R1%2F" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "R2: http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FConcatenated_Files_R2%2F" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "4. Upload these files to iPlant using the \"Import from URL\" function. This allows me to import multiple files from the URL location on Eagle. " }, { "cell_type": "markdown", "metadata": {}, "source": "\"_87__Discovery_Environment_18C6AB07.png\"/" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "5. Run BSMAP in iPlant" }, { "cell_type": "markdown", "metadata": {}, "source": "\"_87__Discovery_Environment_18C6AD33.png\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"_87__Discovery_Environment_18C6AD66.png\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"_87__Discovery_Environment_18C6AD66.png\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"_93__Discovery_Environment_18C7E400.png\"/" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "6. Download BSMAP files from iPlant and upload to Eagle " }, { "cell_type": "code", "collapsed": false, "input": "ir=\"/usr/local/bin/irods3.2.icmds.mac.intel/\"", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "!{ir}/icd Larvae", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "!icd /iplant/home/che625/Larvae/BSMAP", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": "!iget /iplant/home/che625/Larvae/BSMAP/BSMAP_analysis_M1-2014-03-04-17-56-24.707/logs/condor-stdout-M1 /Volumes/web-1/Mollusk/bs_larvae_exp/BSMAP_output", "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "7. Run methratio in iPython and filter for context and coverage" }, { "cell_type": "code", "collapsed": false, "input": "#working directory (parent)\nwd=\"/Volumes/web/Mollusk/bs_larvae_exp\"\n\n#where is bsmap\nbsmap=\"/Users/Shared/Apps/bsmap-2.73/\"\n\n#Where is bsmap file\nbsmapfile=\"/Volumes/web/Mollusk/bs_larvae_exp/BSMAP_output/\"\n\n#Output for methratio file\nmethratio=\"/Volumes/web/Mollusk/bs_larvae_exp/Methratio_out/\"\n\n#Location of filtered files\nfiltered=\"/Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/\"\n\n#genome file \ngenome=\"/Volumes/web/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa\"", "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": "!python {bsmap}methratio.py -d {genome} -u -z -g -o {methratio}methratio_out_M1.txt -s {bsmap}samtools {bsmapfile}condor-stdout-M1.sam", "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "@ Wed Mar 5 15:42:06 2014: reading reference /Volumes/web-1/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa ...\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "@ Wed Mar 5 15:43:41 2014: reading /Volumes/web-1/Mollusk/bs_larvae_exp/BSMAP_output/condor-stdout-M1.sam ...\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "[samopen] SAM header is present: 7658 sequences.\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "\t@ Wed Mar 5 16:58:58 2014: read 10000000 lines\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "@ Wed Mar 5 17:00:08 2014: combining CpG methylation from both strands ...\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "@ Wed Mar 5 17:05:26 2014: writing /Volumes/web-1/Mollusk/bs_larvae_exp/Methratio_out/methratio_out_M1.txt ..." }, { "output_type": "stream", "stream": "stdout", "text": "\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "@ Wed Mar 5 17:28:49 2014: done.\r\n" }, { "output_type": "stream", "stream": "stdout", "text": "total 8562480 valid mappings, 48969439 covered cytosines, average coverage: 1.80 fold.\r\n" } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": "#command for only obtaining the context '__CG_'\n!grep \"[A-Z][A-Z]CG[A-Z]\" <{methratio}methratio_out_M1.txt> {filtered}methratio_out_CG_M1.txt", "language": "python", "metadata": {}, "outputs": [], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": "#obtaining a filtered file with at least 5x coverage\n!awk '{if ($8 >= 5) print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1.txt", "language": "python", "metadata": {}, "outputs": [], "prompt_number": 43 }, { "cell_type": "code", "collapsed": false, "input": "!tr ' ' \"\\t\" /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1_tab.txt", "language": "python", "metadata": {}, "outputs": [], "prompt_number": 44 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "8. Create a file formatted for visualization in IGV" }, { "cell_type": "code", "collapsed": false, "input": "!awk '{print $1,$2,$2+1,\"CpG\",$5}' /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1_tab.txt >/Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.txt ", "language": "python", "metadata": {}, "outputs": [], "prompt_number": 45 }, { "cell_type": "code", "collapsed": false, "input": "#Second line of code for creating a formatted file for IGV\n!tr ' ' \"\\t\" /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.igv", "language": "python", "metadata": {}, "outputs": [], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": "!head /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.igv", "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": "C12768\t103\t104\tCpG\t0.167\r\nC12806\t142\t143\tCpG\t0.333\r\nC12924\t30\t31\tCpG\t0.000\r\nC12924\t38\t39\tCpG\t0.000\r\nC12924\t52\t53\tCpG\t0.000\r\nC12924\t60\t61\tCpG\t0.000\r\nC12924\t127\t128\tCpG\t0.000\r\nC12924\t136\t137\tCpG\t0.000\r\nC13128\t87\t88\tCpG\t0.000\r\nC13208\t83\t84\tCpG\t0.400\r\n" } ], "prompt_number": 47 }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "Visualization of genomic tracks using IGV" }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": "It's evident from some tracks that methylation patterns from offspring are similar to those methyl marks from their respective male parent" }, { "cell_type": "markdown", "metadata": {}, "source": "\"scaffold31.jpg\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"RStudio_18CF9C36.png\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"scaffold160.jpg\"/" }, { "cell_type": "markdown", "metadata": {}, "source": "\"scaffold340.jpg\"/" }, { "cell_type": "heading", "level": 4, "metadata": {}, "source": "It also appears that a greater amount of DNA methylation is present during later developmental stages (Day 5 vs Day 3 larvae)" }, { "cell_type": "markdown", "metadata": {}, "source": "\"RStudio_18CF9C36.png\"/" }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": "Load this IGV session on your own computer:\nhttp://eagle.fish.washington.edu/Mollusk/bs_larvae_exp/IGV_Larvae/igv_session_larvae_updated.xml" } ], "metadata": {} } ] }