Bedgraph - Olympia oyster transcriptome (FAIL)

Progress on generating bedgraphs from our Olympia oyster transcriptome continues.

Transcriptome assembly with Trinity completed 20180919.

Then, aligned the assembled transcriptome to our genome using Bowtie2.

Finally, I used BEDTools to convert the BAM to BED to bedgraph.

This required an initial indexing of our Olympia oyster genome FastA using samtools faidx tool.

SBATCH script file:

20180924_oly_RNAseq_bedgraphs.sh

#!/bin/bash

Job Name

#SBATCH –job-name=20180924_oly_bedgraphs

Allocation Definition

#SBATCH –account=srlab #SBATCH –partition=srlab

Resources

Nodes

#SBATCH –nodes=1

Walltime (days-hours:minutes:seconds format)

#SBATCH –time=5-00:00:00

Memory per node

#SBATCH –mem=500G ##turn on e-mail notification #SBATCH –mail-type=ALL #SBATCH –mail-user=samwhite@uw.edu

Specify the working directory for this job

#SBATCH –workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs

Load Python Mox module for Python module availability

module load intel-python3_2017

Document programs in PATH (primarily for program version ID)

date » system_path.log echo “” » system_path.log echo “System PATH for $SLURM_JOB_ID” » system_path.log echo “” » system_path.log printf “%0.s-“ {1..10} » system_path.log echo ${PATH} | tr : \n » system_path.log

Set genome assembly FastA

oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa

Set indexed genome assembly file

oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai

Set sorted transcriptome assembly bam file

oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam

Set program paths

bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin samtools=/gscratch/srlab/programs/samtools-1.9/samtools

Index genome FastA

${samtools} faidx ${oly_genome_fasta}

Format indexed genome for bedtools

Requires only two columns: namelength

awk -v OFS=’\t’ {‘print $1,$2’} ${oly_genome_indexed} > Olurida_v081.fa.fai.genome

Create bed file

${bedtools}/bamToBed
-i ${oly_transcriptome} \

20180924_oly_RNAseq.bam.bed

Create bedgraph

Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).

Screens for portions of reads coming from exons (-split).

Add genome browser track line to header of bedgraph file.

${bedtools}/genomeCoverageBed
-i ${PWD}/20180924_oly_RNAseq.bed
-g Olurida_v081.fa.fai.genome
-bga
-split
-trackline \

20180924_oly_RNAseq.bed </code>

Alignment was done using the following version of the Olympia oyster genome assembly:

Olurida_v081.fa

RESULTS:

Output folder:

20180924_oly_RNAseq_bedgraphs/

Indexed and formatted genome file:

Olurida_v081.fa.fai.genome

Bedgraph file (for IGV):

20180924_oly_RNAseq.bed

This doesn’t appear to have worked properly. Here’s a view of the bedgraph file:

<code>
track type=bedGraph
Contig0 0   116746  0
Contig1 0   87411   0
Contig2 0   139250  0
Contig3 0   141657  0
Contig4 0   95692   0
Contig5 0   130522  0
Contig6 0   94893   0
Contig7 0   109667  0
Contig8 0   95943   0
</code>

I’d expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n).

I would expect to see something li

I’ll look into this further and see where this pipeline goes wrong.

Job Name

Allocation Definition

Resources

Nodes

Walltime (days-hours:minutes:seconds format)

Memory per node

Specify the working directory for this job

Load Python Mox module for Python module availability

Document programs in PATH (primarily for program version ID)

Set genome assembly FastA

Set indexed genome assembly file

Set sorted transcriptome assembly bam file

Set program paths

Index genome FastA

Format indexed genome for bedtools

Requires only two columns: namelength

Create bed file

Create bedgraph

Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).

Screens for portions of reads coming from exons (-split).

Add genome browser track line to header of bedgraph file.

RESULTS: