Progress on generating bedgraphs from our Olympia oyster transcriptome continues.
Transcriptome assembly with Trinity completed 20180919.
I then aligned the assembled transcriptome to our genome using Bowtie2.
Finally, I used BEDTools to convert the BAM to BED, and then the BED to bedgraph.
This required first indexing our Olympia oyster genome FastA with the samtools faidx tool.
SBATCH script file:
- 20180924_oly_RNAseq_bedgraphs.sh
<code>
#!/bin/bash
## Job Name
#SBATCH --job-name=20180924_oly_bedgraphs
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=5-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180924_oly_RNAseq_bedgraphs

# Load Python Mox module for Python module availability
module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)
date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log

# Set genome assembly FastA
oly_genome_fasta=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa

# Set indexed genome assembly file
oly_genome_indexed=/gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/Olurida_v081.fa.fai

# Set sorted transcriptome assembly bam file
oly_transcriptome=/gscratch/scrubbed/samwhite/20180919_oly_transcriptome_bowtie2/20180919_Olurida_v081.sorted.bam

# Set program paths
bedtools=/gscratch/srlab/programs/bedtools-2.27.1/bin
samtools=/gscratch/srlab/programs/samtools-1.9/samtools

# Index genome FastA
${samtools} faidx ${oly_genome_fasta}

# Format indexed genome for bedtools
# Requires only two tab-separated columns: name, length
awk -v OFS='\t' '{print $1,$2}' ${oly_genome_indexed} > Olurida_v081.fa.fai.genome

# Create bed file
${bedtools}/bamToBed \
-i ${oly_transcriptome} \
> 20180924_oly_RNAseq.bam.bed

# Create bedgraph
# Reports depth at each position (-bg in bedgraph format) and reports regions with zero coverage (-a).
# Screens for portions of reads coming from exons (-split).
# Add genome browser track line to header of bedgraph file.
${bedtools}/genomeCoverageBed \
-i ${PWD}/20180924_oly_RNAseq.bed \
-g Olurida_v081.fa.fai.genome \
-bga \
-split \
-trackline \
> 20180924_oly_RNAseq.bed
</code>
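To make the genome-file formatting step concrete, here is a minimal sketch on a made-up two-contig index. The file names `toy.fa.fai` and `toy.genome`, along with the offset and line-width values, are illustrative assumptions and not files from this run. A `samtools faidx` index has five tab-separated columns (name, length, offset, bases per line, bytes per line); bedtools only needs the first two for its `-g` genome file.

```shell
#!/bin/bash
# Toy faidx-style index: name, length, offset, linebases, linewidth.
# Offsets and line widths are invented for illustration only.
printf 'Contig0\t116746\t9\t60\t61\n'      >  toy.fa.fai
printf 'Contig1\t87411\t118710\t60\t61\n' >> toy.fa.fai

# Keep only name and length, tab-separated, as bedtools expects for -g.
awk -v OFS='\t' '{print $1, $2}' toy.fa.fai > toy.genome

cat toy.genome
```

The resulting `toy.genome` should contain one line per contig with just the name and the length.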
Alignment was done using the following version of the Olympia oyster genome assembly:
- Olurida_v081.fa
RESULTS:
Output folder:
Indexed and formatted genome file:
Bedgraph file (for IGV):
This doesn’t appear to have worked properly. Here’s a view of the bedgraph file:
<code>
track type=bedGraph
Contig0 0 116746 0
Contig1 0 87411 0
Contig2 0 139250 0
Contig3 0 141657 0
Contig4 0 95692 0
Contig5 0 130522 0
Contig6 0 94893 0
Contig7 0 109667 0
Contig8 0 95943 0
</code>
I’d expect multiple entries for each contig (ideally), indicating start/stop positions for where transcripts align within a given contig. However, this appears to simply be a list of all the genome contigs and their lengths (Start=0, Stop=n), each reported with a coverage of 0.
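The difference between those two cases can be checked quickly by counting intervals and the maximum coverage per contig. The sketch below runs on a small made-up file (`toy.bedgraph` and `toy.summary` are hypothetical names): `Contig0` and `Contig1` mimic the all-zero lines shown above, while the `ContigX` lines are invented to show roughly what a contig with aligned reads might look like.

```shell
#!/bin/bash
# Build a toy bedgraph. The ContigX intervals are invented for illustration.
printf 'track type=bedGraph\n'     >  toy.bedgraph
printf 'Contig0\t0\t116746\t0\n'   >> toy.bedgraph
printf 'Contig1\t0\t87411\t0\n'    >> toy.bedgraph
printf 'ContigX\t0\t1200\t0\n'     >> toy.bedgraph
printf 'ContigX\t1200\t1450\t3\n'  >> toy.bedgraph
printf 'ContigX\t1450\t5000\t0\n'  >> toy.bedgraph

# Per contig: number of intervals and maximum coverage (skip the track line).
awk -v OFS='\t' '
  !/^track/ { n[$1]++; if ($4 > max[$1]) max[$1] = $4 }
  END { for (c in n) print c, n[c], max[c] + 0 }
' toy.bedgraph | sort > toy.summary

cat toy.summary
```

A contig with a single interval and a maximum coverage of 0 has no reads counted against it; a contig with real coverage should show several intervals and a non-zero maximum.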
I would expect to see something li
I’ll look into this further and see where this pipeline goes wrong.
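One possibility worth ruling out, judging only from the script as written above: bamToBed writes to `20180924_oly_RNAseq.bam.bed`, but genomeCoverageBed reads `20180924_oly_RNAseq.bed` and appears to redirect its output to that same name. If so, the shell would create (or truncate) that file before bedtools ever reads it, leaving no alignments to count. This is an untested guess about this run, not a confirmed diagnosis. The sketch below reproduces only the shell behavior, with `awk` standing in for bedtools and `demo.bed` as a hypothetical file name.

```shell
#!/bin/bash
# Two fake BED records in a hypothetical file.
printf 'Contig0\t100\t200\nContig0\t300\t400\n' > demo.bed
before=$(wc -l < demo.bed | tr -d ' ')

# The shell truncates demo.bed while setting up the redirect,
# before awk (standing in for bedtools here) opens it for reading.
awk '{ print }' demo.bed > demo.bed

after=$(wc -l < demo.bed | tr -d ' ')
echo "lines before: ${before}, lines after: ${after}"
```

If that turns out to be the cause, pointing `-i` at the bamToBed output and writing the bedgraph to a different file name would be the first thing to try.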