NovaSeq Assembly – Trimmed Geoduck NovaSeq with Meraculous

Attempted to use Meraculous to assemble the trimmed geoduck NovaSeq data.

Here’s the Meraculous manual (PDF).

After a number of issues (running out of hard drive space multiple times, config file issues, typos), I’ve finally given up on running Meraculous. It failed again, saying it couldn’t find a file in a directory that Meraculous itself created! I’ve emailed the authors and, if they have an easy fix, I’ll implement it and see what happens.

Anyway, it’s all documented in the Jupyter Notebook below.

One good thing that came out of all of this is that I had to run kmergenie to identify an appropriate k-mer size to use for assembly, as well as an estimated genome size (this info is needed for both Meraculous and SOAPdenovo, which I’ll be trying next); a sketch of the kmergenie command follows the results below:

kmergenie output folder: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180206_kmergenie/
kmergenie HTML report (doesn’t display histograms for some reason): 20180206_kmergenie/histograms_report.html
kmer size: 117
Est. genome size: 2.17Gbp
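
For reference, the kmergenie run boils down to something like this (a sketch; the file list and thread count are hypothetical, and the exact commands are in the Jupyter notebook below):

# kmergenie takes a text file listing one FASTQ file per line (hypothetical paths).
ls /path/to/trimmed/*.fq.gz > fastq_list.txt

# Evaluate a range of k values; the report gives the best-fitting k (117 here)
# and a predicted genome size (~2.17 Gbp). -t sets threads, -o the output prefix.
kmergenie fastq_list.txt -t 24 -o 20180206_kmergenie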

Jupyter Notebook (GitHub): 20180205_roadrunner_meraculous_geoduck_novaseq.ipynb

Titrator Setup – Functional Methods & Data Exports

I’ve been working on getting our T5 Excellence titrator (Mettler Toledo) with Rondolino sample changer (Mettler Toledo) set up and operational.

A significant part of the setup process is utilizing the LabX Software (Mettler Toledo, v.8.0.0). The software is vastly overpowered (i.e. overly complicated) for the nature of our work. As such, it’s been quite the struggle to get the titrator to do what we need it to do.

We’ve received a great deal of help and insight from Dr. Hollie Putnam (Univ. of Rhode Island). She provided us with a LabX Method that was previously used by some of her colleagues, along with an SOP from those colleagues describing how to physically operate the titrator and a corresponding workflow for data collection. Of course, she’s also helped immensely with understanding the entire process, from the chemistry, to how to process samples, to implementing various quality control checks to ensure we get the most accurate data we can.

After some significant struggles getting the method to work properly, I contacted Mettler Toledo, and a technician helped me modify the method Hollie had passed along so that it runs correctly.


Basic Functionality Fixes

The updated method resolves the following issues we were having (see the annotated screenshot below the lists for a more detailed overview of the method changes):

  • Method only ends titration at specified maximum volume instead of specified potential
  • Method exits with Sample State = “Not OK”
  • Method fails to calculate acid Consumption at end of titration

I also modified the method to bring it up to spec with the Dickson Guide to Best Practices for Ocean CO2 Measurements, SOP3b:

  • Changed method template to use Endpoint (EP) titration instead of Equivalence Point (EQP) titration
  • Implemented initial titration step to pH=3.5 (200mV)
  • Added degassing step to match Dickson specs (increase stir speed and duration)
  • Improved titration precision by changing titration rate to automatically adjust relative to set endpoint values
  • Set correct voltages for pH=3.5 (200mV) & pH=3.0 (228.57mV); no need to adjust prior to running the method (quick arithmetic check below this list)
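
As a quick sanity check on those two setpoints (my own arithmetic; the voltage values themselves come from the method): the potentials differ by 28.57mV across a 0.5 pH step,

(228.57mV - 200mV) / (3.5 - 3.0) ≈ 57.1mV per pH unit

which is close to the theoretical Nernst slope for a pH electrode (~59.2mV/pH at 25°C, a bit lower at cooler temperatures).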

I recommend opening the image below in a new tab in order to be able to read all of the annotations.

So, all of those changes make the method actually run according to the Dickson specs. They also fix the Consumption calculation. Although that part of the method is not strictly necessary (the consumption calculation could easily be handled in downstream analysis), it feels good to have fixed the issue and to have learned how that aspect of the LabX software functions.


Data Export Fixes

We recently acquired the necessary license to unlock the ability to export data.

Digression:

This is a terrible practice by Mettler Toledo, btw. I’m particularly annoyed because I specifically asked about data exports when I spoke with the sales rep while initiating the purchase. He neglected to mention that I’d need to purchase an additional license, and I wasted a lot of time (and pulled out a lot of hair) before learning that this feature was locked by design and required the additional license.

Anyway, the struggles continued even after activating the license. I was not able to export any data – every attempt at specifying a directory in which to save data failed.

Mettler Toledo informed me that it has to do with Windows Services Permissions.

To change permissions to allow data export:

  1. Search Windows for “Services”
  2. Open “Services”
  3. Right-click on “LabXHostService”
  4. Select “Properties”
  5. Click “Log On” tab.
  6. Click “Local System Account” radio button.
  7. Check “Allow service to interact with desktop” box.
  8. Restart computer.

Now, the data gets automatically exported to my desired directory as soon as a LabX task is finished!!


Data Management & Informational Resources

As we are very close to beginning to actually collect data with the titrator, I realized that all this data needs to go somewhere. Additionally, people need someplace to find out how to use all of this stuff (equipment, software, etc.).

I created a new GitHub repo: RobertsLab/titrator

Please feel free to look through it and post any ideas to the Issues section of the repo.

In my mind, this will be a “master” data repository for all measurements conducted on the titrator. All daily pH calibration data should get pushed to this repo, and any sample titration data should also end up here. Basically, all the raw data coming off the machine each time it is used should end up in this repo. I think this will reduce data fragmentation (e.g., I perform measurements on a subset of samples one day and put the data in my folder on Owl; then Grace performs measurements on the remaining samples and uploads those data to her folder on Owl; now the complete data set for an experiment is split between two locations, making it difficult to find).

Although this single data repository approach won’t eliminate fragmentation (it can’t be avoided since the Rondolino sample changer can only hold nine samples), I think it will be beneficial to know that all the data is in a single location.

This repo will also be a resource for SOPs and troubleshooting. A detailed SOP is currently in development (adapted from the SOP Hollie originally sent us); it will cover daily startup/shutdown procedures, running scripts to process data, and how to evaluate quality control checks at various points throughout the titration process.

Finally, it will also contain the necessary scripts that we develop for data grooming and analysis.
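
For illustration only, here’s one possible way to organize it (a hypothetical layout; the actual structure is up for discussion in the repo’s Issues):

# Hypothetical directory layout for the titrator repo (names are placeholders).
mkdir -p titrator/data/pH_calibrations   # daily pH calibration exports
mkdir -p titrator/data/titrations        # raw LabX exports from each sample run
mkdir -p titrator/protocols              # SOPs: daily startup/shutdown, QC checks
mkdir -p titrator/scripts                # data grooming and TA analysis scripts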


What’s Left?

Equipment

We only need one final physical component before we can actually begin collecting data. After reviewing the Dickson SOP3b and speaking with Hollie, I learned that we need an aquarium pump and rotameter (acquired this week) to sparge our samples; this aids in degassing prior to the titration from pH=3.5 to pH=3.0. We’re just waiting on tubing and tubing adapters for the rotameter. Once those arrive, I should be able to start powering through samples.

Initial testing of the titrator (even without the full physical setup in place) shows highly consistent, reproducible measurements – both with pH calibration values and with total alkalinity (TA) determinations on Instant Ocean seawater. As such, I’m confident that I won’t have very much testing left to do once I can start bubbling air into the samples.

Software

Although these items are not necessary in order to start acquiring data, they will be necessary for performing TA calculations and, eventually, for analyzing and reporting TA in near real-time (the end goal is to have this titrator set up in a wet lab where water TA can be determined on the spot, rather than stored and analyzed later). A sketch of the basic TA calculation is below.
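
For a sense of what those scripts will need to do, here’s the simple Gran-function estimate that’s commonly used as a first pass at TA (a sketch only; the full Dickson SOP3b calculation refines this with a non-linear fit that accounts for sulfate and fluoride):

F1 = (V0 + V) * exp(E * F / (R * T))
TA ≈ (C_acid * V_eq) / V0

Here, V0 is the sample volume (or mass), V the volume of acid added, E the measured emf, F/R/T the Faraday constant, gas constant, and temperature (K), C_acid the acid concentration, and V_eq the x-intercept of a straight-line fit of F1 versus V over the pH 3.0–3.5 portion of the titration (the constant electrode offset only rescales F1 and doesn’t shift that intercept).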

A to-do list is outlined here: RobertsLab/titrator/issues/1

Adapter Trimming and FASTQC – Illumina Geoduck NovaSeq Data

We would like to get an assembly of the geoduck NovaSeq data that Illumina provided us with.

Steven previously ran the raw data through FASTQC and there was a significant amount of adapter contamination (up to 44% in some libraries) present (see his FASTQC report here).

So, I trimmed them using Trim Galore and re-ran FASTQC on them.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

  • Round 1: remove NovaSeq adapters
  • Round 2: remove standard Illumina adapters

See Jupyter notebook below for the gritty details.
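
For orientation, the two rounds boil down to something like this (file and directory names are hypothetical; the notebook has the actual commands and options):

# Round 1: Trim Galore auto-detects and removes the NovaSeq adapter (paired-end mode).
trim_galore --paired --output_dir round1/ sample_R1.fastq.gz sample_R2.fastq.gz

# Round 2: re-run auto-detection on the Round 1 output; this time the standard
# Illumina adapter is detected and removed.
trim_galore --paired --output_dir round2/ round1/sample_R1_val_1.fq.gz round1/sample_R2_val_2.fq.gz

# FASTQC on the fully trimmed reads, summarized with MultiQC.
fastqc --threads 24 --outdir round2_fastqc/ round2/*_val_*.fq.gz
multiqc -o round2_fastqc/ round2_fastqc/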

Results:

All data for this NovaSeq assembly project can be found here: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports: [20180125_trim_galore_reports/](http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180125_trim_galore_reports/)
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html

Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

Astute observers might notice that “Per Base Sequence Content” generates a “Fail” warning for all samples. Per the FASTQC help, this is likely expected (NovaSeq libraries are prepared using transposases) and doesn’t have any downstream impacts on analyses.

Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb

Software Install – 10x Genomics Supernova on Mox (Hyak)

Steven asked me to install Supernova (by 10x Genomics) on our Mox node.

First, need to install a dependency: bcl2fastq2
Followed Illumina bcl2fastq2 manual (PDF)

Logged into Mox and initiated a Build node:

srun -p build --time=1:00:00 --pty /bin/bash

Install bcl2fastq2 dependency

cd /gscratch/srlab/tmp
# Download the bcl2fastq2 v2.20 source bundle from Illumina
wget ftp://webdata2:webdata2@ussd-ftp.illumina.com/downloads/software/bcl2fastq/bcl2fastq2-v2-20-0-tar.zip
# Working, source, build, and install locations (per the Illumina manual)
export TMP=/gscratch/srlab/tmp/
export SOURCE=${TMP}/bcl2fastq
export BUILD=${TMP}/bcl2fastq2.20-build
export INSTALL_DIR=/gscratch/srlab/programs/bcl2fastq-v2.20
cd ${TMP}
# The zip wraps a source tarball; extracting the tarball creates ${SOURCE}
unzip bcl2fastq2-v2-20-0-tar.zip
tar -xvzf bcl2fastq2-v2.20.0.422-Source.tar.gz
# Create the out-of-source build directory (per the Illumina manual)
mkdir ${BUILD}
cd ${BUILD}
chmod ugo+x ${SOURCE}/src/configure
chmod ugo+x ${SOURCE}/src/cmake/bootstrap/installCmake.sh
# Configure, compile, and install
${SOURCE}/src/configure --prefix=${INSTALL_DIR}
cd ${BUILD}
make
make install

Install Supernova 2.0.0

Supernova install directions

cd /gscratch/srlab/programs
wget -O supernova-2.0.0.tar.gz "http://cf.10xgenomics.com/releases/assembly/supernova-2.0.0.tar.gz?Expires=1516707075&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cDovL2NmLjEweGdlbm9taWNzLmNvbS9yZWxlYXNlcy9hc3NlbWJseS9zdXBlcm5vdmEtMi4wLjAudGFyLmd6IiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNTE2NzA3MDc1fX19XX0_&Signature=XJR7c9UlSkueydP304nKJrqomLXBH9~DWsenwlvBrplFMojbO-DPMghO09Sk6Wi5ApZSPwKB3sl1Wrnjy3qBLwr7dCoT~9oStyBpqlF~Xl2nBY6odnTzUaq3IpLyu8icIkt7DJM0GMXQTTp6rYu1PlLG31hMM5b5HZI3Tjzrhk8URbSrsG~7mm6m5-28afYHX00kT2Xfor7xr-ZSjjLe2jr99SEIARfzZjt6kUEnDMbl~3FXCHsSxXzKrkYXobGmfQhYBrey0iRyCAc9yNF7eSuBHAsqRGsP2yURVcYf3BB5nB1ZuEUo0qLgc5GlZJDQdsqDNC69HkyLCJamkJSnVg__&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA"
tar -xzvf supernova-2.0.0.tar.gz
rm supernova-2.0.0.tar.gz
cd supernova-2.0.0
# Run 10x Genomics sitecheck diagnostics and upload the report to 10x support
supernova-cs/2.0.0/bin/supernova sitecheck > sitecheck.txt
supernova-cs/2.0.0/bin/supernova upload samwhite@uw.edu sitecheck.txt
# Verify the install from an interactive session on our node, using the built-in tiny test data set
srun -p srlab -A srlab --time=2:00:00 --pty /bin/bash
/gscratch/srlab/programs/supernova-2.0.0/supernova testrun --id=tiny

OK, looks like the test run finished successfully.

Assembly Comparisons – Oly Assemblies Using Quast

I ran Quast to compare all of our current Olympia oyster genome assemblies.

See Jupyter Notebook in Results section for Quast execution.
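
For orientation, a comparison like this boils down to a single Quast command run over all of the assembly FASTA files (file names below are hypothetical; the notebook has the real paths and the full set of assemblies):

# Compare multiple assemblies in one Quast run; --labels sets the column names
# that appear in the heatmapped report.html.
quast.py --threads 16 \
  --labels pbjelly_sjw_01,redundans_sjw_02,redundans_sjw_03 \
  pbjelly_sjw_01.fasta redundans_sjw_02.fasta redundans_sjw_03.fasta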

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/quast_results/results_2018_01_16_10_08_35/

Heatmapped table of results: http://owl.fish.washington.edu/Athaliana/quast_results/results_2018_01_16_10_08_35/report.html

Very enlightening!

After all the difficulties with PB Jelly, it has produced the greatest number of large contigs. However, it also has the highest count and percentage of N’s of all the assemblies produced to date.

BEST OF:

# contigs (>= 50000 bp): pbjelly_sjw_01 (894)
Largest Contig: redundans_sjw_02 (322,397bp)
Total Length: pbjelly_sjw_01 (1,180,563,613bp)
Total Length (>=50,000bp): pbjelly_sjw_01 (57,741,906bp)
N50: redundans_sjw_03 (17,679bp)

Jupyter Notebook (GitHub): 20180116_swoose_oly_assembly_comparisons_quast.ipynb

DNA Quantification – MspI-digested Crassostrea virginica gDNA

Quantified the two MspI-digested DNA samples for the Qiagen project from earlier today with the Qubit 3.0 (ThermoFisher).

Used the Qubit dsDNA Broad Range (BR) Kit (ThermoFisher).

Used 1μL of DNA from each sample (including the undigested gDNA from the initial isolation on 20171211).

Results:

Quantification (Google Sheet): 20180111_qubit_DNA_MspI_virginica

Yields are good and are sufficient for submission to Qiagen:

MspI_virginica_01 – 53.4ng/μL (1335ng; 89% recovery after phenol/chloroform/EtOH precip)
MspI_virginica_02 – 31.0ng/μL (775ng; ~52% recovery after phenol/chloroform/EtOH precip)
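
For reference, the recovery percentages are just the eluted mass relative to the 1.5μg digestion input, using the 25μL Buffer EB elution noted in the extraction entry below (my arithmetic):

53.4ng/μL x 25μL = 1335ng; 1335ng / 1500ng ≈ 89%
31.0ng/μL x 25μL = 775ng; 775ng / 1500ng ≈ 52%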

Phenol:Chloroform Extractions and EtOH Precipitations – MspI Digestions of C.virginica DNA from Earlier Today

The two MspI restriction digestions from earlier today for our project with Qiagen were subjected to phenol:chloroform cleanup and subsequent ethanol precipitations.

Phenol:chloroform clean up procedure:

  1. Added equal volume (50μL) of phenol:chloroform:IAA (25:24:1) to each sample.

  2. Vortexed.

  3. Centrifuged 5mins, 16,000g at room temperature.

  4. Transferred aqueous phase (top layer) to clean 0.5mL snap-cap PCR tube.

  5. Added equal volume of chloroform (50μL) to aqueous phase.

  6. Vortexed.

  7. Centrifuged 5mins, 16,000g at room temperature.

  8. Transferred aqueous phase (top layer) to clean 0.5mL snap-cap PCR tube.

Performed ethanol precipitation on both samples according to lab protocol.

Resuspended precipitated DNA in 25μL Buffer EB (Qiagen).

Will quantify with Qubit 3.0.

Restriction Digestion – MspI on Crassostrea virginica gDNA

Digested two 1.5μg aliquots of Crassostrea virginica gDNA (isolated 20171211), as part of the project we’re doing with Qiagen.

Digestion reactions:

| Component | Volume (μL) |
|---|---|
| DNA (1.5μg) | 25.7 |
| 10x CutSmart Buffer (NEB) | 5.0 |
| Water | 17.3 |
| MspI (NEB) | 2.0 |
| TOTAL | 50.0 |

MspI info:

  • NEB R0106T (100,000U/mL; rec’d 20171214)
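
For context (my arithmetic from the numbers above): 2μL of the 100,000U/mL (i.e. 100U/μL) stock works out to 200U per reaction, or roughly 133U per μg of input DNA.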

Reactions were carried out in 0.5mL snap-cap PCR tubes and incubated for 15mins @ 37°C in a PTC-200 thermal cycler (MJ Research), with no heated lid.

Samples will be subjected to a phenol:chloroform extraction for cleanup.

DNA Quantification – C.virginica MBD-enriched DNA

Quantified Crassostrea virginica MBD-enriched DNA from earlier today for Qiagen project.

Used the Qubit 3.0 (ThermoFisher) and the Qubit dsDNA Broad Range (BR) Kit (ThermoFisher).

Used 1μL of template DNA.

Results:

Quantification Spreadsheet (Google Sheet): 20180110_qubit_dsDNA_BR_MBD_virginica

Both samples had decent yields and have usable quantities for Qiagen (they wanted ~300ng from each sample):

virginica_MBD_01 – 18.3ng/μL (457.5ng = 5.7% methylated DNA capture)

virginica_MBD_02 – 19.6ng/μL (490ng = 6.1% methylated DNA capture)
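
For reference (my arithmetic, assuming the capture percentage is captured ng ÷ input ng): 457.5ng / 0.057 ≈ 8.0μg and 490ng / 0.061 ≈ 8.0μg, so both captures imply roughly 8μg of gDNA going into each MBD enrichment.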

Will store @ -20°C until next week so that we’re not shipping so close to the weekend (the shipping address is in Germany).