{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Larvae Example Workflow: M1 Data (Spermatozoa from Male 1)"
]
},
{
"cell_type": "heading",
"level": 2,
"metadata": {},
"source": [
"This is an example of my workflow for my bisulfite sequencing larvae dataset. I am working with 28 files in total and will narrow these down to 7 datasets (1 female/eggs, 2 males/sperm, and 4 larvae samples)."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"1. Upload raw data to iPlant and uncompress the file using the gunzip analysis application"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"2. Download uncompressed files and relocate to Eagle"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"File locations for M1 uncompressed (Nov): http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FUncompressed_Nov%2F"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"3. Concatenate November and September uncompressed files (R1 and R2 separately). These are technical replicates from the bisulfite sequencing run."
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"M1_R1"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R1.fastq /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R1.fastq > /Volumes/web/Mollusk/bs_larvae_exp/Concatenated_Files_R1/M1_R1.fastq"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"M1_R2"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!cat /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R2.fastq /Volumes/web/Mollusk/bs_larvae_exp/Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R2.fastq > /Volumes/web/Mollusk/bs_larvae_exp/Concatenated_Files_R2/M1_R2.fastq "
],
"language": "python",
"metadata": {},
"outputs": []
},
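{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check on the concatenation above is to confirm that the read count of each combined file equals the sum of its two inputs. The cell below is a minimal sketch rather than part of the original workflow: `count_fastq_reads` is a hypothetical helper, and it assumes standard four-line FASTQ records."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hypothetical helper (not part of the original workflow).\n",
"def count_fastq_reads(path):\n",
"    # Assumes standard four-line FASTQ records (unwrapped sequence lines).\n",
"    with open(path) as handle:\n",
"        n_lines = sum(1 for _ in handle)\n",
"    return n_lines // 4\n",
"\n",
"# Example usage with the paths from the cat commands above:\n",
"# base = '/Volumes/web/Mollusk/bs_larvae_exp/'\n",
"# nov = count_fastq_reads(base + 'Uncompressed_Nov/filtered_BS_CgM1_ACTTGA_L004_R1.fastq')\n",
"# sept = count_fastq_reads(base + 'Uncompressed_Sept/filtered_BS_CgM1_ACTTGA_L007_R1.fastq')\n",
"# merged = count_fastq_reads(base + 'Concatenated_Files_R1/M1_R1.fastq')\n",
"# print(nov + sept == merged)"
],
"language": "python",
"metadata": {},
"outputs": []
},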
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"Location of concatenated files:"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"R1: http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FConcatenated_Files_R1%2F"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"R2: http://eagle.fish.washington.edu/Mollusk/index.php?dir=bs_larvae_exp%2FConcatenated_Files_R2%2F"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"4. Upload these files to iPlant using the \"Import from URL\" function. This allows me to import multiple files from the URL location on Eagle. "
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"5. Run BSMAP in iPlant"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"6. Download BSMAP files from iPlant and upload to Eagle "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ir=\"/usr/local/bin/irods3.2.icmds.mac.intel/\""
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!{ir}icd Larvae"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!icd /iplant/home/che625/Larvae/BSMAP"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!iget /iplant/home/che625/Larvae/BSMAP/BSMAP_analysis_M1-2014-03-04-17-56-24.707/logs/condor-stdout-M1 /Volumes/web-1/Mollusk/bs_larvae_exp/BSMAP_output"
],
"language": "python",
"metadata": {},
"outputs": []
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"7. Run methratio in IPython and filter for context and coverage"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Working directory (parent)\n",
"wd=\"/Volumes/web/Mollusk/bs_larvae_exp\"\n",
"\n",
"#Location of the BSMAP installation (contains methratio.py and samtools)\n",
"bsmap=\"/Users/Shared/Apps/bsmap-2.73/\"\n",
"\n",
"#Location of the BSMAP output (SAM) files\n",
"bsmapfile=\"/Volumes/web/Mollusk/bs_larvae_exp/BSMAP_output/\"\n",
"\n",
"#Output directory for methratio files\n",
"methratio=\"/Volumes/web/Mollusk/bs_larvae_exp/Methratio_out/\"\n",
"\n",
"#Location of filtered files\n",
"filtered=\"/Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/\"\n",
"\n",
"#Genome file\n",
"genome=\"/Volumes/web/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa\""
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 33
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!python {bsmap}methratio.py -d {genome} -u -z -g -o {methratio}methratio_out_M1.txt -s {bsmap}samtools {bsmapfile}condor-stdout-M1.sam"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"@ Wed Mar 5 15:42:06 2014: reading reference /Volumes/web-1/whale/ensembl/ftp.ensemblgenomes.org/pub/release-21/metazoa/fasta/crassostrea_gigas/dna/Crassostrea_gigas.GCA_000297895.1.21.dna_sm.genome.fa ...\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"@ Wed Mar 5 15:43:41 2014: reading /Volumes/web-1/Mollusk/bs_larvae_exp/BSMAP_output/condor-stdout-M1.sam ...\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[samopen] SAM header is present: 7658 sequences.\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\t@ Wed Mar 5 16:58:58 2014: read 10000000 lines\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"@ Wed Mar 5 17:00:08 2014: combining CpG methylation from both strands ...\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"@ Wed Mar 5 17:05:26 2014: writing /Volumes/web-1/Mollusk/bs_larvae_exp/Methratio_out/methratio_out_M1.txt ..."
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"@ Wed Mar 5 17:28:49 2014: done.\r\n"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"total 8562480 valid mappings, 48969439 covered cytosines, average coverage: 1.80 fold.\r\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"#Keep only rows with the sequence context '__CG_'\n",
"!grep \"[A-Z][A-Z]CG[A-Z]\" < {methratio}methratio_out_M1.txt > {filtered}methratio_out_CG_M1.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 29
},
{
"cell_type": "code",
"collapsed": false,
"input": [
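"#Keep only loci with at least 5x coverage (column 8); input is the CG-filtered file from the previous step\n",
"!awk '{if ($8 >= 5) print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' < /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG_M1.txt > /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1.txt"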
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 43
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!tr ' ' \"\\t\" < /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1.txt > /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1_tab.txt"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 44
},
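{
"cell_type": "markdown",
"metadata": {},
"source": [
"The three shell steps above (CG context, 5x coverage, tab delimiting) could also be done in a single pass in Python. This is a minimal sketch, not the method used for these data: `filter_methratio` is a hypothetical function, and it assumes the column layout used above, with the sequence context in column 4 and the coverage count in column 8."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import re\n",
"\n",
"# Five-letter context with CG in the middle, e.g. AACGT (the '__CG_' pattern).\n",
"CG_CONTEXT = re.compile('^[A-Z]{2}CG[A-Z]$')\n",
"\n",
"# Hypothetical helper (not part of the original workflow).\n",
"def filter_methratio(in_path, out_path, min_cov=5):\n",
"    # Keep rows whose context (column 4) is __CG_ and whose coverage\n",
"    # (column 8) is at least min_cov; write them tab-delimited.\n",
"    kept = 0\n",
"    with open(in_path) as src, open(out_path, 'w') as dst:\n",
"        for line in src:\n",
"            fields = line.split()\n",
"            # The header row and short rows fail these checks and are skipped.\n",
"            if len(fields) < 8 or not CG_CONTEXT.match(fields[3]):\n",
"                continue\n",
"            if float(fields[7]) >= min_cov:\n",
"                dst.write('\\t'.join(fields) + '\\n')\n",
"                kept += 1\n",
"    return kept\n",
"\n",
"# Example usage with the variables defined earlier in this notebook:\n",
"# filter_methratio(methratio + 'methratio_out_M1.txt',\n",
"#                  filtered + 'methratio_out_CG5x_M1_tab.txt')"
],
"language": "python",
"metadata": {},
"outputs": []
},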
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"8. Create a file formatted for visualization in IGV"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!awk '{print $1,$2,$2+1,\"CpG\",$5}' /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_M1_tab.txt >/Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.txt "
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 45
},
{
"cell_type": "code",
"collapsed": false,
"input": [
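"#Second line of code for creating a formatted file for IGV: convert spaces to tabs and save with the .igv extension\n",
"!tr ' ' \"\\t\" < /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.txt > /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.igv"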
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 46
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"!head /Volumes/web/Mollusk/bs_larvae_exp/Filtered_Larvae_Files/methratio_out_CG5x_IGV_M1.igv"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"C12768\t103\t104\tCpG\t0.167\r\n",
"C12806\t142\t143\tCpG\t0.333\r\n",
"C12924\t30\t31\tCpG\t0.000\r\n",
"C12924\t38\t39\tCpG\t0.000\r\n",
"C12924\t52\t53\tCpG\t0.000\r\n",
"C12924\t60\t61\tCpG\t0.000\r\n",
"C12924\t127\t128\tCpG\t0.000\r\n",
"C12924\t136\t137\tCpG\t0.000\r\n",
"C13128\t87\t88\tCpG\t0.000\r\n",
"C13208\t83\t84\tCpG\t0.400\r\n"
]
}
],
"prompt_number": 47
},
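{
"cell_type": "markdown",
"metadata": {},
"source": [
"The awk and tr commands in this step can be combined into one Python function that writes the same five tab-delimited columns shown in the output above. This is only a sketch under stated assumptions: `methratio_to_igv` is a hypothetical name, the position is read from column 2 and the methylation ratio from column 5, and the feature label defaults to CpG as in the awk command."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Hypothetical helper (not part of the original workflow).\n",
"def methratio_to_igv(in_path, out_path, feature='CpG'):\n",
"    # Write: sequence name, position, position + 1, feature label, ratio.\n",
"    with open(in_path) as src, open(out_path, 'w') as dst:\n",
"        for line in src:\n",
"            fields = line.split()\n",
"            if len(fields) < 5:\n",
"                continue  # skip blank or truncated rows\n",
"            pos = int(fields[1])\n",
"            row = [fields[0], str(pos), str(pos + 1), feature, fields[4]]\n",
"            dst.write('\\t'.join(row) + '\\n')\n",
"\n",
"# Example usage with the variable defined earlier in this notebook:\n",
"# methratio_to_igv(filtered + 'methratio_out_CG5x_M1_tab.txt',\n",
"#                  filtered + 'methratio_out_CG5x_IGV_M1.igv')"
],
"language": "python",
"metadata": {},
"outputs": []
},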
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Visualization of genomic tracks using IGV"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"In some tracks, methylation patterns in the offspring resemble those of their respective male parent"
]
},
{
"cell_type": "heading",
"level": 4,
"metadata": {},
"source": [
"It also appears that more DNA methylation is present at later developmental stages (Day 5 vs. Day 3 larvae)"
]
},
{
"cell_type": "heading",
"level": 3,
"metadata": {},
"source": [
"Load this IGV session on your own computer:\n",
"http://eagle.fish.washington.edu/Mollusk/bs_larvae_exp/IGV_Larvae/igv_session_larvae_updated.xml"
]
}
],
"metadata": {}
}
]
}