Transcriptome Comparisons - C.bairdi Transcriptomes Evaluations with DETONATE rsem-eval on Mox

UPDATE: I’ll lead in with the fact that this failed with an error message that I can’t figure out. This will save the reader some time. I’ve posted the problem as an Issue on the DETONATE GitHub repo, however it’s clear that this software is no longer maintained, as the repo hasn’t been updated in >3yrs; even lacking responses to Issues that are that old.

Here’s the error message and some other details that could be useful for troubleshooting (which are beyond my knowledge - although I suspect that the XM tag is the culprit and the first entry in the BAM file has XM:i:2 and the error message might suggest that 2 is not an acceptable value e.g. Assertion val == 0 || val == 1 || val == 5' failed.):

rsem-synthesis-reference-transcripts cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta 0 0 0 /gscratch/srlab/sam/data/C_bairdi/transcriptomes/cbai_transcriptome_v3.1.fasta
Transcript Information File is generated!
Group File is generated!
Extracted Sequences File is generated!

rsem-preref cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta.transcripts.fa 1 cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta
Refs.makeRefs finished!
Refs.saveRefs finished!
cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta.idx.fa is generated!
cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta.n2g.idx.fa is generated!

rsem-parse-alignments cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.stat/cbai_transcriptome_v3.1.fasta b /gscratch/scrubbed/samwhite/outputs/20201224_cbai_bowtie2_transcriptomes_alignments/cbai_transcriptome_v3.1.fasta.sorted.bam -t 3 -tag XM
rsem-parse-alignments: parseIt.cpp:92: void parseIt(SamParser*) [with ReadType = PairedEndReadQ; HitType = PairedEndHit]: Assertion `val == 0 || val == 1 || val == 5' failed.
"rsem-parse-alignments cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.stat/cbai_transcriptome_v3.1.fasta b /gscratch/scrubbed/samwhite/outputs/20201224_cbai_bowtie2_transcriptomes_alignments/cbai_transcriptome_v3.1.fasta.sorted.bam -t 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!

Here’s what the head of the BAM file looks like:

[samwhite@n2233 20201224_cbai_bowtie2_transcriptomes_alignments]$ /gscratch/srlab/programs/samtools-1.10/samtools view cbai_transcriptome_v3.1.fasta.sorted.bam | head
A00147:108:HLLJFDMXX:1:1369:3893:6637	163	TRINITY_DN5604_c0_g2_i1	1	42	101M	=	81	181	GAAAGAAAAACCGACAGGAGGAATTTCTTTGTTACCAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAA	:FFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF:FFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFF	AS:i:-2	XN:i:0	XM:i:2	XO:i:0	XG:i:0	NM:i:2	MD:Z:2G31A66	YS:i:0	YT:Z:CP
A00147:121:HLLVMDMXX:1:2159:5674:25786	163	TRINITY_DN5604_c0_g2_i1	4	42	101M	=	72	169	AGAAAAACCGACAGGAGGAATTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTA	FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:101	YS:i:0	YT:Z:CP
A00147:108:HLLJFDMXX:2:1420:13702:14920	163	TRINITY_DN5604_c0_g2_i1	5	42	101M	=	197	293	GAAAAACCGACAGGAGGAATTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAA	FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFF	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:101	YS:i:0	YT:Z:CP
A00147:108:HLLJFDMXX:2:1420:13431:15796	163	TRINITY_DN5604_c0_g2_i1	5	42	101M	=	197	293	GAAAAACCGACAGGAGGAATTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAA	FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:101	YS:i:0	YT:Z:CP
A00147:121:HLLVMDMXX:1:1258:5141:32377	163	TRINITY_DN5604_c0_g2_i1	5	42	101M	=	107	203	GAAAAACCGACAGGAGGAATTTCTTTGTTACCAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAA	FFFFF,FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:F:FFFFFFFF	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:30A70	YS:i:0	YT:Z:CP
A00147:121:HLLVMDMXX:1:1259:7645:5838	163	TRINITY_DN5604_c0_g2_i1	5	42	101M	=	107	203	GAAAAACCGACAGGAGGAATTTCTTTGTTACCAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAA	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:30A70	YS:i:0	YT:Z:CP
A00147:108:HLLJFDMXX:1:1351:27082:1705	163	TRINITY_DN5604_c0_g2_i1	6	42	101M	=	158	253	AAAAACCGACAGGAGGAATTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAAA	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:101	YS:i:0	YT:Z:CP
A00147:108:HLLJFDMXX:1:1475:28926:32706	163	TRINITY_DN5604_c0_g2_i1	6	42	101M	=	158	253	AAAAACCGACAGGAGGAATTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAAA	FFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:101	YS:i:0	YT:Z:CP
A00147:121:HLLVMDMXX:1:2162:31385:26381	163	TRINITY_DN5604_c0_g2_i1	6	42	101M	=	158	253	AAAAACCGACAGGAGGAANTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAAA	FFFFFFFFFFFFFFFFFF#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFF:	AS:i:-1	XN:i:0	XM:i:1	XO:i:0	XG:i:0	NM:i:1	MD:Z:18T82	YS:i:0	YT:Z:CP
A00147:121:HLLVMDMXX:2:1425:28745:30639	163	TRINITY_DN5604_c0_g2_i1	7	42	101M	=	169	263	AAAACCGACAGGAGGAATTTCTTTGTTAACAACAAAAACTAATATATTTCGCATACCTGACAGACATGGTGACAGCGCCTCTGATGTTCGCCGAATTAAAG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:	AS:i:0	XN:i:0	XM:i:0	XO:i:0	XG:i:0	NM:i:0	MD:Z:101	YS:i:0	YT:Z:CP

bowtie2 commands to generate the BAM file:

  # Use bowtie2 and paired-end options
  # Uses settings specified for use with DETONATE
  # and for paired end reads when using DETONATE.
  ${programs_array[bowtie2]} \
  -x ${transcriptome_name} \
  -S ${transcriptome_name}.sam \
  --threads ${threads} \
  -1 ${R1_list} \
  -2 ${R2_list} \
  --sensitive \
  --dpad 0 \
  --gbar 99999999 \
  --mp 1,1 \
  --np 1 \
  --score-min L,0,-0.1 \
  --no-mixed \
  --no-discordant

  # Convert SAM to sorted BAM
  #
  ${programs_array[samtools_view]} \
  -b \
  ${transcriptome_name}.sam \
  | ${programs_array[samtools_sort]} \
  -m ${mem_per_thread} \
  --threads ${threads} \
  -o ${transcriptome_name}.sorted.bam \
  -

With all of that out of the way, you can find the original post below.


Using bowtie2, I generated transcriptome alignments on 20201224 to provide as input to the program DETONATE (rsem-eval) which should be able to generate a score to allow for an assessment of which transcriptome assembly is most accurate. My previous attempt to compare all of our C.bairdi transcriptome assemblies using DETONATE on 20200601 consistently failed due to hitting time limits on Mox and that is why I created the bowtie2 alignments in a separate job; it was significantly faster than doing so within the DETONATE (rsem-eval) software.

The job was run on Mox.

SBATCH script (GitHub):

#!/bin/bash
## Job Name
#SBATCH --job-name=20201229_cbai_detonate_transcriptome_evaluations
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=10-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --chdir=/gscratch/scrubbed/samwhite/outputs/20201229_cbai_detonate_transcriptome_evaluations

# Script runs DETONATE on each of the C.bairdi transcriptomes
# using Bowtie2 BAM alignments generated on 20201224.
# DETONATE will generate a corresponding "score" for each transcriptome,
# providing another metric by which to compare each assembly.

# Requires Bash >=4.0, as script uses associative arrays.


###################################################################################
# These variables need to be set by user

# Assign Variables
## frag_size is guesstimate of library fragment sizes
frag_size=500
bams_dir=/gscratch/scrubbed/samwhite/outputs/20201224_cbai_bowtie2_transcriptomes_alignments
transcriptomes_dir=/gscratch/srlab/sam/data/C_bairdi/transcriptomes
threads=28

# Associative array of the transcriptomes and corresponding BAM file
declare -A transcriptomes_array
transcriptomes_array=(
["${transcriptomes_dir}/cbai_transcriptome_v1.5.fasta"]="${bams_dir}/cbai_transcriptome_v1.5.fasta.sorted.bam" \
["${transcriptomes_dir}/cbai_transcriptome_v1.6.fasta"]="${bams_dir}/cbai_transcriptome_v1.6.fasta.sorted.bam" \
["${transcriptomes_dir}/cbai_transcriptome_v1.7.fasta"]="${bams_dir}/cbai_transcriptome_v1.7.fasta.sorted.bam" \
["${transcriptomes_dir}/cbai_transcriptome_v2.0.fasta"]="${bams_dir}/cbai_transcriptome_v2.0.fasta.sorted.bam" \
["${transcriptomes_dir}/cbai_transcriptome_v2.1.fasta"]="${bams_dir}/cbai_transcriptome_v2.1.fasta.sorted.bam" \
["${transcriptomes_dir}/cbai_transcriptome_v3.0.fasta"]="${bams_dir}/cbai_transcriptome_v3.0.fasta.sorted.bam" \
["${transcriptomes_dir}/cbai_transcriptome_v3.1.fasta"]="${bams_dir}/cbai_transcriptome_v3.1.fasta.sorted.bam"
)


###################################################################################

# Exit script if any command fails
set -e

# Load Python Mox module for Python module availability

module load intel-python3_2017


# Programs array

declare -A programs_array
programs_array=(
[detonate_trans_length]="/gscratch/srlab/programs/detonate-1.11/rsem-eval/rsem-eval-estimate-transcript-length-distribution" \
[detonate]="/gscratch/srlab/programs/detonate-1.11/rsem-eval/rsem-eval-calculate-score"
)




# Loop through each comparison
for transcriptome in "${!transcriptomes_array[@]}"
do

  # Remove path from transcriptome
  transcriptome_name="${transcriptome##*/}"

  # Set RSEM distance output filename
  rsem_eval_dist_mean_sd="${transcriptome_name}_true_length_dis_mean_sd.txt"


  # Determine transcript length
  # Needed for subsequent rsem-eval command.
  ${programs_array[detonate_trans_length]} \
  "${transcriptome}" \
  "${rsem_eval_dist_mean_sd}"


  # Run rsem-eval
  # Use paired-end options
  ${programs_array[detonate]} \
  --transcript-length-parameters "${rsem_eval_dist_mean_sd}" \
  --bam \
  --paired-end \
  "${transcriptomes_array[$transcriptome]}" \
  "${transcriptome}" \
  "${transcriptome_name}" \
  ${frag_size}

  # Capture FastA checksums for verification
  echo "Generating checksum for ${transcriptome_name}"
  md5sum "${transcriptome}" >> fasta.checksums.md5
  echo "Finished generating checksum for ${transcriptome_name}"
  echo ""

  # Capture BAM checksums for verification
  echo "Generating checksum for ${transcriptomes_array[$transcriptome]}"
  md5sum "${transcriptomes_array[$transcriptome]}" >> bam.checksums.md5
  echo "Finished generating checksum for ${transcriptomes_array[$transcriptome]}"
  echo ""

done

# Capture program options
echo "Logging program options..."
for program in "${!programs_array[@]}"
do
	{
  echo "Program options for ${program}: "
	echo ""
  # Handle samtools help menus
  if [[ "${program}" == "samtools_index" ]] \
  || [[ "${program}" == "samtools_sort" ]] \
  || [[ "${program}" == "samtools_view" ]]
  then
    ${programs_array[$program]}
  fi
	${programs_array[$program]} -h
	echo ""
	echo ""
	echo "----------------------------------------------"
	echo ""
	echo ""
} &>> program_options.log || true

  # If MultiQC is in programs_array, copy the config file to this directory.
  if [[ "${program}" == "multiqc" ]]; then
  	cp --preserve ~/.multiqc_config.yaml multiqc_config.yaml
  fi
done

echo ""
echo "Finished logging program options."
echo ""

echo ""
echo "Logging system PATH."
# Document programs in PATH (primarily for program version ID)
{
date
echo ""
echo "System PATH for $SLURM_JOB_ID"
echo ""
printf "%0.s-" {1..10}
echo "${PATH}" | tr : \\n
} >> system_path.log

echo "Finished logging system PATH"

RESULTS

The job failed (virtually instantly) with this message:

rsem-synthesis-reference-transcripts cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta 0 0 0 /gscratch/srlab/sam/data/C_bairdi/transcriptomes/cbai_transcriptome_v3.1.fasta
Transcript Information File is generated!
Group File is generated!
Extracted Sequences File is generated!

rsem-preref cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta.transcripts.fa 1 cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta
Refs.makeRefs finished!
Refs.saveRefs finished!
cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta.idx.fa is generated!
cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta.n2g.idx.fa is generated!

rsem-parse-alignments cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.stat/cbai_transcriptome_v3.1.fasta b /gscratch/scrubbed/samwhite/outputs/20201224_cbai_bowtie2_transcriptomes_alignments/cbai_transcriptome_v3.1.fasta.sorted.bam -t 3 -tag XM
rsem-parse-alignments: parseIt.cpp:92: void parseIt(SamParser*) [with ReadType = PairedEndReadQ; HitType = PairedEndHit]: Assertion `val == 0 || val == 1 || val == 5' failed.
"rsem-parse-alignments cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.temp/cbai_transcriptome_v3.1.fasta cbai_transcriptome_v3.1.fasta.stat/cbai_transcriptome_v3.1.fasta b /gscratch/scrubbed/samwhite/outputs/20201224_cbai_bowtie2_transcriptomes_alignments/cbai_transcriptome_v3.1.fasta.sorted.bam -t 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!