Progress on generating bedgraphs from our Olympia oyster transcriptome continues.
Transcriptome assembly with Trinity completed 20180919.
Then, aligned the assembled transcriptome to our genome using Bowtie2.
Finally, I used BEDTools to convert the BAM to BED to bedgraph.
This required an initial indexing of our Olympia oyster genome FastA using samtools faidx tool.
SBATCH script file:
#!/bin/bash
## Job Name
#SBATCH --job-name=20180924_oly_bedgraphs
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs
# Load Python Mox module for Python module availability
module load intel-python3_2017
# Document programs in PATH (primarily for program version ID)
date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log
# Set genome assembly FastA
oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa
# Set indexed genome assembly file
oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai
# Set sorted transcriptome assembly bam file
oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam
# Set program paths
bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin
samtools=/gscratch/srlab/programs/samtools-1.9/samtools
# Index genome FastA
${samtools} faidx ${oly_genome_fasta}
# Format indexed genome for bedtools
## Requires only two columns: namelength
awk -v OFS='\t' {'print $1,$2'} ${oly_genome_indexed} > Olurida_v081.fa.fai.genome
# Create bed file
${bedtools}/bamToBed \
-i ${oly_transcriptome} \
> 20180924_oly_RNAseq.bam.bed
# Create bedgraph
## Reports depth at each position (-bg in bedgraph format) and report regions with zero coverage (-a).
## Screens for portions of reads coming from exons (-split).
## Add genome browser track line to header of bedgraph file.
${bedtools}/genomeCoverageBed \
-i ${PWD}/20180924_oly_RNAseq.bed \
-g Olurida_v081.fa.fai.genome \
-bga \
-split \
-trackline \
> 20180924_oly_RNAseq.bed
Alignment was done using the following version of the Olympia oyster genome assembly:
RESULTS:
Output folder:
Indexed and formatted genome file:
Bedgraph file (for IGV):
This doesn’t appear to have worked properly. Here’s a view of the bedgraph file:
track type=bedGraph
Contig0 0 116746 0
Contig1 0 87411 0
Contig2 0 139250 0
Contig3 0 141657 0
Contig4 0 95692 0
Contig5 0 130522 0
Contig6 0 94893 0
Contig7 0 109667 0
Contig8 0 95943 0
I’d expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n).
I would expect to see something li
I’ll look into this further and see where this pipeline goes wrong.