Troubleshooting – PB Jelly Install on Emu

I previously installed and ran PB Jelly. Despite no error messages being output, I noticed something odd during my quick post-assembly stats check: The PB Jelly numbers were identical to the input reference file. This seemed very strange and made me decide to look a bit deeper in the PB Jelly output files.

As it turns out, PB Jelly did not complete successfully! Here’s a look at one of the output files (notice the error messages!):

Searching online suggested that the issue could be that the blasr program wasn’t in my system PATH (blasr is located in /home/shared/bin). Since /home/shared/bin indeed wasn’t in the system PATH, I added it:

After doing this, I noticed that the PATH assignment in the /etc/environment file is incorrect: it has the $PATH variable prepended to the list. As a result, every time the file is sourced, the system PATH gets appended to itself again, producing a ridiculously long list (like in the screen cap directly above this text). So, I removed that portion and re-sourced the /etc/environment file to tidy things up.
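
For future reference, here is a rough sketch of the two checks involved: testing whether a directory is already in a PATH-style list, and collapsing repeated entries. It assumes a POSIX shell; the function names path_contains and dedupe_path are hypothetical helpers written for this note, not part of PB Jelly, blasr, or the system.

```shell
# Hypothetical helpers (not from PB Jelly or blasr); assumes a POSIX shell.

# Return success if directory $1 is an entry in the colon-separated list $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the colon-separated list $1 with repeated entries removed,
# keeping the first occurrence of each entry in its original order.
dedupe_path() {
    printf '%s' "$1" | awk -v RS=: -v ORS=: '!seen[$0]++' | sed 's/:$//'
}

# Example usage (not run here):
#   path_contains /home/shared/bin "$PATH" || PATH="$PATH:/home/shared/bin"
#   PATH=$(dedupe_path "$PATH")
```

Checking before appending avoids the runaway growth described above, no matter how many times the file is sourced.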

Fingers crossed this will resolve the issue…

DNA Isolation & Quantification – C. virginica Gonad gDNA

I isolated DNA from the Crassostrea virginica gonad samples sent by Katie Lotterhos using the E.Z.N.A. Mollusc Kit with the following modifications:

  • Samples were homogenized with a disposable plastic pestle in 350μL of ML1 Buffer
  • No optional steps were used
  • Eluted each in 100μL of Elution Buffer and pooled into a single sample

NOTE: Sample 034 did not process properly (no phase separation after 24:1 chloroform:IAA addition, even with the suggested additions of ML1 Buffer) and was discarded.

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 2μL of DNA sample.

Samples were stored in the same box the tissue was delivered in and returned to the same location in our -80C: rack 8, row 5, column 4.

Results:

Qubit (Google Sheet): 20171114_qubit_Cvirginica_gDNA

Ample DNA in all samples for MBDseq. (Refer to “Original Sample Conc.” column in spreadsheet.)

Will let Steven & Katie know.

Assembly Comparison – Oly Assemblies Using Quast

I ran Quast to compare all of our current Olympia oyster genome assemblies.

See Jupyter Notebook in Results section for Quast execution.

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_11_14_12_30_25/

Heatmapped table of results: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_11_14_12_30_25/report.html

Very enlightening!

BEST OF:

Largest Contig: redundans_sjw_02 (322,397bp)
Total Length: soap_bgi_01 & pbjelly_sjw_01 (697,528,655bp)
Total Length (>=50,000bp): redundans_sjw_03 (17,006,058bp)
N50: redundans_sjw_03 (17,679bp)
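
As a reminder of what the N50 number means: it is the length L such that sequences of length L or longer contain at least half of all assembled bases. Quast reports this directly, so the following is only a minimal sketch of the calculation, assuming a POSIX shell with awk and sort; fasta_n50 is a hypothetical helper name.

```shell
# Hypothetical helper; Quast computes this statistic itself.
# Print the N50 of a FASTA file: the length L such that sequences of
# length >= L together contain at least half of all bases.
fasta_n50() {
    awk '
        /^>/ { if (len) print len; len = 0; next }   # header: emit previous sequence length
             { len += length($0) }                    # sequence line: accumulate length
        END  { if (len) print len }                   # emit the final sequence length
    ' "$1" |
    sort -rn |
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}

# Example usage (file name is a placeholder):
#   fasta_n50 assembly.fasta
```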

Interesting tidbit: The pbjelly_sjw_01 assembly is EXACTLY the same as the soap_bgi_01. Looking at the output messages from that PB Jelly assembly, one can see why. The messages indicate that no gaps were filled on the BGI scaffold reference! That means the PB Jelly output is just the BGI scaffold reference assembly!
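
Identical Quast stats are strong circumstantial evidence, but the sequences can also be compared directly. A minimal sketch, assuming a POSIX shell: fasta_seq_cksum is a hypothetical helper that checksums only the sequence content, ignoring header names, line wrapping, and letter case. Matching checksums suggest (but, with a CRC, do not strictly prove) that two files hold the same sequence in the same order.

```shell
# Hypothetical helper: print a CRC checksum of the sequence content of a
# FASTA file. Headers, line breaks, and letter case are ignored, so two
# files that differ only in those respects give the same value.
fasta_seq_cksum() {
    grep -v '^>' "$1" | tr -d '\n\r' | tr 'acgtn' 'ACGTN' | cksum | awk '{ print $1 }'
}

# Example usage (file names are placeholders):
#   [ "$(fasta_seq_cksum reference.fasta)" = "$(fasta_seq_cksum jelly.out.fasta)" ] \
#       && echo "same sequence content"
```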

Jupyter Notebook (GitHub): 20171114_swoose_oly_assembly_comparisons_quast.ipynb

Genome Assembly – Olympia Oyster Illumina & PacBio Using PB Jelly w/BGI Scaffold Assembly

Yesterday, I ran PB Jelly using Sean’s Platanus assembly, but that didn’t produce an assembly because PB Jelly was expecting gaps in the Illumina reference assembly (i.e. scaffolds, not contigs).

Re-ran this using the BGI Illumina scaffolds FASTA.

Here’s a brief rundown of how this was run:

See the Jupyter Notebook for full details of run (see Results section below).

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/20171114_oly_pbjelly/

Output FASTA file: http://owl.fish.washington.edu/Athaliana/20171114_oly_pbjelly/jelly.out.fasta

OK! This seems to have worked (and it was quick, like less than an hour!), as it actually produced a FASTA file! Will run QUAST with this and some of our other assemblies to compare assembly stats. Have added this assembly to our Olympia oyster genome assemblies table.

Jupyter Notebook (GitHub): 20171114_emu_pbjelly_BGI_scaffold.ipynb

Genome Assembly – Olympia Oyster Illumina & PacBio Using PB Jelly w/Platanus Assembly

Sean had previously attempted to run PB Jelly, but ran into some issues running on Hyak, so I decided to try this on Emu.

Here’s a brief rundown of how this was run:

See the Jupyter Notebook for full details of run (see Results section below).

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/20171113_oly_pbjelly/

This completed very quickly (like, just a couple of hours). I also didn’t experience the woes of multimillion temp file production that killed Sean’s attempt at running this on Mox (Hyak).

However, it doesn’t seem to have produced an assembly!

Looking through the output, it seems as though it didn’t produce an assembly because there weren’t any gaps to fill in the reference. The lack of gaps in the reference Illumina assembly makes sense, because I used the Platanus contig FASTA file (i.e. not a scaffolds file). I didn’t realize PB Jelly was just designed for gap filling. Guess I’ll give this another go using the BGI scaffold FASTA file and see what we get.
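
A quick pre-flight check would have caught this: gaps in a scaffold FASTA show up as runs of N, so a reference with zero N runs gives a gap filler nothing to do. Here is a minimal sketch, assuming a POSIX shell with awk; count_gap_runs is a hypothetical helper, not a PB Jelly command.

```shell
# Hypothetical helper: count runs of N/n (gaps) in the sequence lines of a
# FASTA file. Runs are counted per line, so a gap that spans a line break is
# counted twice; treat the result as "zero vs. non-zero", not an exact count.
count_gap_runs() {
    awk '
        /^>/ { next }                          # skip header lines
        {
            line = $0
            count += gsub(/[Nn]+/, "", line)   # gsub returns the number of runs replaced
        }
        END { print count + 0 }
    ' "$1"
}

# Example usage (file name is a placeholder):
#   count_gap_runs reference.fasta
```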

Jupyter Notebook (GitHub): 20171113_emu_pbjelly_22mer_plat.ipynb

RNA Isolation & Quantification – Tanner crab hemolymph

We received three Tanner crab (Chionoecetes bairdi) hemolymph samples from Pam Jensen (NOAA) yesterday. From her email to Steven:

Hi Steven,
I am sending:
tube #1 crab 3859/3656: 300 ul blood + 1300 ul RNAlater
tube #2 crab 3665/3873: 300 ul blood + 1300 ul RNAlater
tube #3 crab 3665/3873: 200 ul blood + 1400 ul RNAlater

The tubes hold max of 1600 ul. Will know on Sun or Mon if either crab is infected w Hematodinium.

Tracking info to follow.
Pam

Samples were stored at 4C O/N.

Here’s what the samples looked like before processing:

The samples are extremely cloudy. I’m not sure if this is expected.

Processed samples using RNAzol RT (MRC) according to the manufacturer’s protocol for Total RNA Isolation.

Pelleted samples at 5000g for 5 mins and the samples looked like this:

Decided to pellet samples for an additional 10mins. The pellet was more compact. Transferred supernatant to clean tube, since it seemed to contain “debris” (maybe cells?). Processed pellet with RNAzol RT. Brief rundown of procedure (all steps at room temp):

  1. Transferred supe to clean tube.
  2. Added 1mL RNAzol RT to pellet and mixed by repeated pipetting (solution was cloudy and slightly viscous).
  3. Added 400uL of 0.1% DEPC-treated H2O and mixed vigorously by hand.
  4. Incubated for 10mins.
  5. Centrifuged 12,000g for 15mins.

    Samples looked like this:

    This is not normal. Usually the supernatant is the clear portion, while the blue layer is below that.
  6. Transferred 750uL of the clear portion to clean 1.7mL tube.

  7. Added equal volume of isopropanol, mixed by inversion. Appeared to be a very high amount of genomic DNA precipitation visible in the tube.
  8. Incubated for 10mins.
  9. Centrifuged 12,000g, 15mins.

    Samples looked like this:

    It appears that the nucleic acids (the white interphase) are suspended on a “cushion” of higher density solution, instead of being pelleted at the bottom of the tube.
  10. Removed/discarded higher density solution, leaving the white layer on the bottom of the tube.

  11. Centrifuged 12,000g, 15mins.
  12. Discarded supe.
  13. Washed pellet with 75% ethanol.
  14. Centrifuged 8,000g, 3mins.
  15. Repeated Steps 12, 13, & 14, 1x.
  16. Discarded ethanol.
  17. Resuspended RNA in 50uL 0.1% DEPC-treated H2O. Pellets did not solubilize on their own. I dispersed the pellets by repeated pipetting (P200). Remaining insoluble material was pelleted (12,000g, 30s) and supernatant was transferred to a new 1.6mL tube.

RNA was quantified using the Qubit 3.0 and the Qubit HS RNA Assay. Used 5uL of each sample.

Results:

20171107_qubit_tanner_crab_hemo (Google Sheet)

Sample ID    Conc. (ng/uL)    Total Yield (ng)
3859/3656    0.44             22
3665/3873    1.66             83
3665/3873    2.04             102

Interestingly, both samples from the same crab had similar/decent yields.
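
The Total Yield values are consistent with concentration multiplied by the 50uL resuspension volume from Step 17 (e.g. 0.44 ng/uL x 50 uL = 22 ng). A minimal sketch of that arithmetic, assuming a POSIX shell with awk; total_yield_ng is a hypothetical helper name.

```shell
# Hypothetical helper: total yield (ng) = concentration (ng/uL) x volume (uL).
total_yield_ng() {
    awk -v conc="$1" -v vol="$2" 'BEGIN { printf "%g\n", conc * vol }'
}

# Example usage:
#   total_yield_ng 0.44 50
```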

Samples were labeled and stored at -80C in Shellfish RNA Box #6.

Software Installation – ALPACA on Roadrunner

List of software that needed installing to run ALPACA:

Installed all software in:

/home/shared/

Had to change group ownership on /home/shared/. Used the following to set the group recursively (-R) to sudo, so that all admin (i.e. sudo group) users can read/write in this directory, provided the group permission bits allow it:

$ sudo chown -R :sudo /home/shared
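
Note that chown only changes who owns the files; whether group members can actually write depends on the group permission bits. If those turn out to be missing, something like the following sketch could open them up. It assumes a POSIX chmod; grant_group_rw is a hypothetical helper name, and whether this step was needed on Roadrunner is not recorded here.

```shell
# Hypothetical helper: give the owning group read/write access to a tree.
# The capital X adds execute (search) permission only to directories and to
# files that are already executable by someone.
grant_group_rw() {
    chmod -R g+rwX "$1"
}

# Example usage (not run here):
#   sudo chown -R :sudo /home/shared   # set the group, as in the command above
#   sudo chmod -R g+rwX /home/shared   # then open the group permission bits
```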

Compiled Celera Assembler from source (per the ALPACA requirements). This is the source file that I used: https://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.3/wgs-8.3rc2.tar.bz2/download

Added all software to my system PATH by adding the following to my ~/.bashrc file (each line ends with a backslash so the shell joins them into a single PATH assignment; continuation lines must not be indented):

## Add bioinformatics software to PATH

export PATH=${PATH}:\
/home/shared/alpaca:\
/home/shared/Bismark:\
/home/shared/bowtie2-2.3.3.1-linux-x86_64:\
/home/shared/ectools-0.1:\
/home/shared/PBSuite_15.8.24/bin:\
/home/shared/pecan/bin:\
/home/shared/samtools-1.6/bin:\
/home/shared/wgs-assembler/Linux-amd64/bin

After adding that info to the bottom of my ~/.bashrc file, I re-loaded it into the current shell session by sourcing the file:

$ source ~/.bashrc
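
A quick way to confirm that the new PATH entries took effect is to ask the shell whether it can find the programs. This is a minimal sketch, assuming a POSIX shell; check_on_path is a hypothetical helper, and the tool names in the usage example are just two of the programs the directories above should provide.

```shell
# Hypothetical helper: report which of the named commands can be found on
# PATH. Returns non-zero if any of them are missing.
check_on_path() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage:
#   check_on_path bowtie2 samtools
```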

Followed the ALPACA test instructions to confirm proper installation. More specific test instructions are actually located at the top of this file: /home/shared/alpaca/scripts/run_example.sh

Changed Celera Assembler directory name:

$ mv /home/shared/wgs-8.3rc2 /home/shared/wgs-assembler

Step 1.
$ mkdir /home/shared/test
Step 2.
$ cd /home/shared/test/
Step 3.
$ ../alpaca/scripts/run_example.sh

Step 3 (which executes the run_example.sh script) failed due to permission problems.

Realized the script file didn’t have execute permissions, so I added them with the following command:

$ sudo chmod +x /home/shared/alpaca/scripts/run_example.sh

Step 4. Continued with ALPACA Tests 2 & 3.

Everything tested successfully. Will try to get an assembly running with our PacBio and Illumina data.
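
To catch other scripts with the same missing execute bit before they fail mid-run, something like this sketch could be used. It assumes a POSIX find; list_nonexecutable_scripts is a hypothetical helper, and it only checks the owner execute bit, which may not reflect what a particular user is actually allowed to run.

```shell
# Hypothetical helper: list *.sh files under a directory that lack the
# owner execute bit (octal 100).
list_nonexecutable_scripts() {
    find "$1" -type f -name '*.sh' ! -perm -100
}

# Example usage (not run here):
#   list_nonexecutable_scripts /home/shared/alpaca/scripts
```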

Software Crash – Olympia oyster genome assembly with Masurca on Mox

Ah, the joys of bioinformatics. I just received an email from Mox indicating that the Masurca assembly I started 11 DAYS AGO (!!) crashed.

I’m probably not going to put much effort into trying to figure out what went wrong, but here are some log file snippets for reference. I’ll probably drop a line to the developers and see if they have any easy ways to address whatever caused the problems, but that’s about as much effort as I’m willing to put into troubleshooting this assembly.

Additionally, since this crashed, I’m not going to bother moving any of the files off of Mox. That means they will be deleted automatically by the system around Nov. 9th, 2017.
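
For digging through large log files like the ones quoted below, a simple keyword scan is often enough to find where things went wrong. This is a minimal sketch, assuming a POSIX grep; scan_log_for_failures is a hypothetical helper, and the keyword list is just a guess based on the messages seen in these logs, not anything defined by MaSuRCA.

```shell
# Hypothetical helper: print lines (with line numbers) that look like failures.
# The pattern list is an assumption based on the messages quoted in this post.
scan_log_for_failures() {
    grep -n -i -E 'error|abort|failed|bad_alloc|core dumped' "$1"
}

# Example usage (not run here):
#   scan_log_for_failures slurm-94620.out
```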


slurm-94620.out (tail)

compute_psa 6601202 2632582819
Refining alignments
Joining
Generating assembly input files
Coverage of the mega-reads less than 5 -- using the super reads as well
Coverage threshold for splitting unitigs is 138 minimum ovl 63
Running assembly
/gscratch/srlab/programs/MaSuRCA-3.2.3/bin/deduplicate_unitigs.sh: line 85: 24330 Aborted                 (core dumped) overlapStoreBuild -o $ASM_DIR/$ASM_PREFIX.ovlStore -M 65536 -g $ASM_DIR/$ASM_PREFIX.gkpStore $ASM_DIR/overlaps_dedup.ovb.gz > $ASM_DIR/overlapStore.rebuild.err 2>&1
Assembly stopped or failed, see CA.mr.41.15.17.0.029.log
[Mon Oct 30 23:19:37 PDT 2017] Assembly stopped or failed, see CA.mr.41.15.17.0.029.log

CA.mr.41.15.17.0.029.log (tail)

number of threads     = 28 (OpenMP default)

ERROR:  overlapStore '/gscratch/scrubbed/samwhite/20171019_masurca_oly_assembly/CA.mr.41.15.17.0.029/genome.ovlStore' is incomplete; previous overlapStoreBuild probably crashed.

----------------------------------------
Failure message:

failed to unitig

overlapStore.rebuild.err

Scanning overlap files to count the number of overlaps.
Found 277.972 million overlaps.
Memory limit 65536MB supplied.  Ill put 3246167525 IIDs (3435.97 million overlaps) into each of 1 buckets.
bucketizing CA.mr.41.15.17.0.029/overlaps_dedup.ovb.gz
bucketizing DONE!
overlaps skipped:
               0 OBT - low quality
               0 DUP - non-duplicate overlap
               0 DUP - different library
               0 DUP - dedup not requested
terminate called after throwing an instance of std::bad_alloc
  what():  std::bad_alloc

Failed with Aborted

Backtrace (mangled):

overlapStoreBuild[0x40523a]
/usr/lib64/libpthread.so.0(+0xf100)[0x2af83b3c0100]
/usr/lib64/libc.so.6(gsignal+0x37)[0x2af83c0395f7]
/usr/lib64/libc.so.6(abort+0x148)[0x2af83c03ace8]
/usr/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x165)[0x2af83b62d9d5]
/usr/lib64/libstdc++.so.6(+0x5e946)[0x2af83b62b946]
/usr/lib64/libstdc++.so.6(+0x5e973)[0x2af83b62b973]
/usr/lib64/libstdc++.so.6(+0x5eb93)[0x2af83b62bb93]
/usr/lib64/libstdc++.so.6(_Znwm+0x7d)[0x2af83b62c12d]
/usr/lib64/libstdc++.so.6(_Znam+0x9)[0x2af83b62c1c9]
overlapStoreBuild[0x402e10]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2af83c025b15]
overlapStoreBuild[0x403089]

Backtrace (demangled):

[0] overlapStoreBuild() [0x40523a]
[1] /usr/lib64/libpthread.so.0::(null) + 0xf100  [0x2af83b3c0100]
[2] /usr/lib64/libc.so.6::(null) + 0x37  [0x2af83c0395f7]
[3] /usr/lib64/libc.so.6::(null) + 0x148  [0x2af83c03ace8]
[4] /usr/lib64/libstdc++.so.6::__gnu_cxx::__verbose_terminate_handler() + 0x165  [0x2af83b62d9d5]
[5] /usr/lib64/libstdc++.so.6::(null) + 0x5e946  [0x2af83b62b946]
[6] /usr/lib64/libstdc++.so.6::(null) + 0x5e973  [0x2af83b62b973]
[7] /usr/lib64/libstdc++.so.6::(null) + 0x5eb93  [0x2af83b62bb93]
[8] /usr/lib64/libstdc++.so.6::operator new(unsigned long) + 0x7d  [0x2af83b62c12d]
[9] /usr/lib64/libstdc++.so.6::operator new[](unsigned long) + 0x9  [0x2af83b62c1c9]
[10] overlapStoreBuild() [0x402e10]
[11] /usr/lib64/libc.so.6::(null) + 0xf5  [0x2af83c025b15]
[12] overlapStoreBuild() [0x403089]

GDB: