IGV | Sam's Notebook

0000-0002-2747-368X

I took the sorted BAM file from yesterday’s corrected RNAseq genome alignment and converted it to a bedgraph using BEDTools genomeCoverageBed tool.

Analysis took place on our HPC Mox node.

SBATCH script file:

20180926_oly_RNAseq_bedgraphs.sh


#!/bin/bash
## Job Name
#SBATCH --job-name=20180926_oly_bedgraphs
## Allocation Definition 
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180926_oly_RNAseq_bedgraphs

# Load Python Mox module for Python module availability

module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)

date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log

# Set sorted transcriptome assembly bam file
oly_transcriptome_bam=/gscratch/scrubbed/samwhite/20180925_oly_RNAseq_genome_hisat2/20180925_Olurida_v081.sorted.bam


# Set program paths
bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin
samtools=/gscratch/srlab/programs/samtools-1.9/samtools


# Create bedgraph
## Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).
## Screens for portions of reads coming from exons (-split).
## Add genome browser track line to header of bedgraph file.
${bedtools}/genomeCoverageBed \
-ibam ${oly_transcriptome_bam} \
-bga \
-split \
-trackline \
> 20180926_oly_RNAseq.bedgraph

RESULTS

Output folder:

20180926_oly_RNAseq_bedgraphs/

Bedgraph file (1.2GB):

20180926_oly_RNAseq_bedgraphs/20180926_oly_RNAseq.bedgraph

Loaded in to IGV to verify things looked OK:

0000-0002-2747-368X

Attempting to get some sort of analysis of the Ostrea lurida GBS data from BGI, particularly since the last run at it using Stacks crashed (literally) and burned (not literally).

Katherine Silliman at UIC recommended using PyRAD.

I’ve taken the example Jupyter notebook from the PyRAD website and passed a subset of the 96 individuals through it.

In this instance, the subset of individuals were all analyzed as a single population. I have another Jupyter notebook running on a different computer that will separate the three populations that are present in this subset.

Overall, I don’t fully understand the results. However, this seems to be the quickest assessment of the data (from the *.snps file generated):

28 individuals, 36424 loci, 72251 snps

Additionally, I did run into an issue when I tried to visualize the data (using the *.vcf file generated) in IGV (see screen cap below). I’ve posted the issue to the pyrad GitHub repo in hopes of getting it resolved.

One last thing. This might be obvious to most, but I discovered that trying to do all this computation over the network (via a mounted server share) is significantly slower than performing these operations on th efiles when they’re stored locally. Somewhere in the notebook you’ll notice that I copy all of the working directory from the server (Owl) to the local machine (Hummingbird). Things proceeded very quickly after doing that. Didn’t realize this would have so much impact on speed!!

Jupyter Notebook: 20160418_pyrad_oly_PE-GBS.ipynb

NBviewer: 20160418_pyrad_oly_PE-GBS

Sam's Notebook

University of Washington – Fishery Sciences – Roberts Lab

Tag Archives: IGV

Bedgraph – Olympia oyster transcriptome with Olurida_v081 genome assembly

RESULTS

Data Analysis – Subset Olympia Oyster GBS Data from BGI as Single Population Using PyRAD