Transcriptome Annotation - Geoduck Heart with BLASTx on Mox

I’ll be annotating the transcriptome assembly (from 20190215) using Trinotate and part of that process is having BLASTx output for the Trinity assembly, so have run BLASTx on Mox.

SBATCH script:


#!/bin/bash
## Job Name
#SBATCH --job-name=blastx_heart
## Allocation Definition
#SBATCH --account=coenv
#SBATCH --partition=coenv
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=25-00:00:00
## Memory per node
#SBATCH --mem=120G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/outputs/20190318_blastx_geoduck_heart_RNAseq

# Load Python Mox module for Python module availability

module load intel-python3_2017

# Document programs in PATH (primarily for program version ID)

date >> system_path.log
echo "" >> system_path.log
echo "System PATH for $SLURM_JOB_ID" >> system_path.log
echo "" >> system_path.log
printf "%0.s-" {1..10} >> system_path.log
echo ${PATH} | tr : \\n >> system_path.log


wd="$(pwd)"


# Paths to input/output files
blastx_out="${wd}/blastx.outfmt6"
sp_db="/gscratch/srlab/programs/Trinotate-v3.1.1/admin/uniprot_sprot.pep"

trinity_fasta="/gscratch/scrubbed/samwhite/outputs/20190215_trinity_geoduck_heart_RNAseq/trinity_out_dir/Trinity.fasta"

# Paths to programs
blast_dir="/gscratch/srlab/programs/ncbi-blast-2.8.1+/bin"
blastx="${blast_dir}/blastx"


# Run blastx on Trinity fasta
${blastx} \
-query ${trinity_fasta} \
-db ${sp_db} \
-max_target_seqs 1 \
-outfmt 6 \
-evalue 1e-3 \
-num_threads 28 \
> ${blastx_out}


RESULTS

Output folder:

BLASTx output (output format 6):