Category Archives: Olympia Oyster Genome Sequencing

DNA Isolation – Ostrea lurida DNA for PacBio Sequencing

In an attempt to improve upon the partial genome assembly we received from BGI, we will be sending DNA to the UW PacBio core facility for additional sequencing.

Isolated DNA from mantle tissue from the same Ostrea lurida individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150812.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from two separate 50mg pieces of mantle tissue according to the manufacturer’s protocol, with the following changes:

  • Samples were homogenized with plastic, disposable pestle in 350μL of ML1 Buffer
  • Incubated homogenate at 60°C for 1.5hrs
  • No optional steps were used
  • Performed three rounds of 24:1 chloroform:IAA treatment
  • Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 326ng/μL (Quant data is here [Google Sheet]: 20161214_gDNA_Olurida_qubit_quant)

Yield is good and we have more than enough (~5μg is required for sequencing) to proceed with sequencing.

Evaluated gDNA quality (i.e. integrity) by running ~500ng (1.5μL) of sample on 0.8% agarose, low-TAE gel stained with ethidium bromide.
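For reference, the yield and gel-loading arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the 100μL pooled volume is an assumption (two 50μL elutions with no losses); the actual recovered volume was not recorded here.

```python
def total_yield_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in micrograms, given concentration (ng/uL) and volume (uL)."""
    return conc_ng_per_ul * volume_ul / 1000.0


def volume_for_mass_ul(target_ng: float, conc_ng_per_ul: float) -> float:
    """Volume in microliters needed to load a target DNA mass in nanograms."""
    return target_ng / conc_ng_per_ul


# Assumption: two 50 uL elutions pooled, ignoring any volume lost on the column.
pooled_volume_ul = 100.0

print(round(total_yield_ug(326, pooled_volume_ul), 1))  # 32.6 (ug)
print(round(volume_for_mass_ul(500, 326), 2))           # 1.53 (uL)
```

Under that volume assumption the pooled sample holds roughly 32μg, comfortably above the ~5μg needed, and ~1.5μL corresponds to ~500ng on the gel.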

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).

Results:

 

 

Overall, the gel looks OK. A fair amount of smearing, but a strong, high molecular weight band is present. The intensity of the smearing is likely due to the fact that the gel is overloaded for this particular well size. If I had used a broader comb and/or loaded less DNA, the band would be more defined and the smearing would be less prominent.

Will submit sample to the UW PacBio facility tomorrow!

Data Management – Download Final BGI Genome & Assembly Files

We received info to download the final data and genome assembly files for geoduck and Olympia oyster from BGI.

In total, the downloads took a little over three days to complete!

The notebook detailing how the files were downloaded is below. Note that I had to strip the output cells: the output from the download command made the file too large to upload to GitHub, and the notebook would constantly crash the browser/computer it was opened in. So, the notebook below is here for posterity.

Jupyter Notebook: 20161206_docker_BGI_genome_downloads.ipynb

 

Data Management – Tracking O.lurida FASTQ File Corruption

UPDATE 20170104 – These two corrupt files have been replaced with non-corrupt files.


 

Sean identified an issue with one of the original FASTQ files provided to us by BGI. Additionally, Steven had (unknowingly) identified the same corrupt file, as well as a second corrupt file in the set of FASTQ files. The issue is discussed here: https://github.com/sr320/LabDocs/issues/334

Steven noticed the problem when he ran FastQC and two of the files generated no output (and no error message!).

The two files in question are:

  • 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz
  • 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz
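Since FastQC failed silently, a more direct check is to decompress each file end to end and see whether the gzip stream is intact. Below is a minimal sketch of that idea; it is not what was run in the notebook, and the function name is hypothetical.

```python
import gzip
import zlib


def gzip_is_readable(path: str, chunk_size: int = 1024 * 1024) -> bool:
    """Return True if the entire gzip stream at `path` decompresses cleanly.

    A truncated file raises EOFError, a bad header or failed CRC check raises
    OSError (BadGzipFile on newer Pythons), and damaged compressed data raises
    zlib.error. Note that a missing file is also an OSError, so it is reported
    as False here too.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data; we only care whether reading succeeds.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

Calling `gzip_is_readable()` on each `.fq.gz` file would flag files that cannot be fully decompressed. It says nothing about whether the FASTQ records inside are well formed.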

This post is an attempt to document where things went wrong, but having glanced through this data a bit already, it won’t provide any answers.

I originally downloaded the data on 20160127 to my home folder on Owl (this is detailed in the Jupyter notebook in that post) and generated/compared MD5 checksum values. The values matched at that time.

So, let’s investigate a bit further…

Launch Docker container

docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos/LabDocs/jupyter_nbs/sam/:/jupyter_nbs -it 0ba43904567e

The command allows access to Jupyter Notebook over port 8888 and makes my Jupyter Notebook GitHub repo and my data files accessible to the Docker container.

Once the container was started, started Jupyter Notebook with the following command inside the Docker container:

jupyter notebook

This command is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.

Jupyter notebook file: 20161117_docker_oly_genome_fastq_corruption.ipynb

I’ve embedded the notebook below, but the actual file linked above is much easier to view (many lengthy commands and filenames wrap lines in the embedded version).

Data Management – Olympia Oyster Small Insert Library Genome Assembly from BGI

Received another set of Ostrea lurida genome assembly data from BGI. In this case, it’s data assembled from the small insert libraries they created for this project.

All data is stored here: http://owl.fish.washington.edu/O_lurida_genome_assemblies_BGI/20160512/

They’ve provided a Genome Survey (PDF) that has some info about the data they’ve assembled. It includes the estimated genome size:

Olympia oyster genome size: 1898.92 Mb

Additionally, there’s a table breaking down the N50 distributions of scaffold and contig sizes.

Data management stuff was performed in a Jupyter (iPython) notebook; see below.

Jupyter Notebook: 20160516_Oly_Small_Insert_Library_Genome_Read_Counts.ipynb


 

SRA Submission – Genome sequencing of the Olympia oyster (Ostrea lurida)

Adding our Olympia oyster genome sequencing (sequencing done by BGI) to the NCBI Sequence Read Archive (SRA). The current status can be seen in the screen cap below. Release date is set for a year from now, but will likely bump it up. Need Steven to review the details of the submission (BioProject, Experiment descriptions, etc.) before I initiate the public release. Will update this post with the SRA number once we receive it.

Here’s the list of files uploaded to the SRA:

151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_1.fq.gz
151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_2.fq.gz
151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_1.fq.gz
151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz
151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz
151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz
160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_1.fq.gz
160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz
160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_1.fq.gz
160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz
160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_1.fq.gz
160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz
160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_1.fq.gz
160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz
160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_1.fq.gz
160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_2.fq.gz
160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_1.fq.gz
160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_2.fq.gz

Paired-end sequencing files were uploaded together within a single “Run”.
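As an illustration of how the files pair up, the `_1.fq.gz` / `_2.fq.gz` suffixes can be used to group mates by their shared prefix. This is only a sketch with a hypothetical helper name; the SRA submission itself was assembled through the NCBI submission portal, not with this code.

```python
from collections import defaultdict


def group_paired_fastq(filenames):
    """Group FASTQ file names into pairs keyed by their shared prefix.

    Files ending in _1.fq.gz and _2.fq.gz are treated as mates of one
    paired-end set; any other file name is ignored.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        for mate in ("1", "2"):
            suffix = f"_{mate}.fq.gz"
            if name.endswith(suffix):
                pairs[name[: -len(suffix)]][mate] = name
    return dict(pairs)


example = [
    "151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_1.fq.gz",
    "151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_2.fq.gz",
]
for prefix, mates in group_paired_fastq(example).items():
    print(prefix, sorted(mates.values()))
```

Applied to the full list above, each of the 18 files lands in one of nine prefix groups, one per paired-end set.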

SRA Info:
SRA: SRS1365663
Study: SRP072461
BioProject: PRJNA316624
BioSample: SAMN04588827

Data Received – Initial Olympia oyster Genome Assembly from BGI

The initial assembly of the Ostrea lurida genome is available from BGI. Currently, we’ve stashed it here:

http://owl.fish.washington.edu/O_lurida_genome_assemblies_BGI/20160314/

The data provided consisted of the following three files:

  • md5.txt
  • N50.txt
  • scaffold.fa.fill

md5.txt – Checksum file to verify integrity of files after downloading.
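A minimal sketch of how such a checksum file can be used after downloading. It assumes md5.txt follows the usual `md5sum` layout (checksum, whitespace, file name); I have not confirmed the exact layout BGI uses, and both function names are hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text: str) -> dict:
    """Parse md5sum-style lines into a {filename: checksum} mapping.

    Assumed line layout: "<checksum>  <filename>" (a leading "*" on the
    file name, which md5sum uses for binary mode, is dropped).
    """
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            checksum, name = parts
            expected[name.strip().lstrip("*")] = checksum.lower()
    return expected
```

Comparing `md5_of_file(name)` against `parse_md5_listing(...)[name]` for each downloaded file confirms that the local copy matches what was sent.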

N50.txt – Contains some very limited stats on scaffolds provided.

scaffold.fa.fill – A FASTA file of scaffolds. Since these are scaffolds (and NOT contigs!), there are many regions containing NNNNNN’s that have been put in place for scaffold assembly based on paired-end spatial information. As such, the N50 information is not as useful as it would be if these were contigs.
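To make the scaffold vs. contig distinction concrete, here is a small sketch of how N50 is computed and how a scaffold breaks into contigs at runs of N. The function names are hypothetical, and this is not how BGI generated N50.txt.

```python
import re


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases. Returns 0 if empty."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


def split_scaffold_into_contigs(sequence: str):
    """Split a scaffold sequence at runs of N (gap filler) into contigs."""
    return [piece for piece in re.split(r"[Nn]+", sequence) if piece]


scaffold = "ACGT" + "N" * 6 + "ACGTACGT"
contigs = split_scaffold_into_contigs(scaffold)
print(len(scaffold), [len(c) for c in contigs])  # 18 [4, 8]
print(n50([10, 20, 30, 40]))                     # 30
```

The toy scaffold is 18 bases long but contains only 12 bases of real sequence in two contigs, which is why a scaffold N50 can look much better than the corresponding contig N50.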

Additional assemblies will be provided at some point. I’ve emailed BGI about what we should expect from this initial assembly and what subsequent assemblies should look like.

Data Received – Ostrea lurida genome sequencing files from BGI

Downloaded data from the BGI project portal to our server, Owl, using the Synology Download Station. Although the BGI portal is aesthetically nice, it’s set up poorly for bulk downloads and took a few tries to download all of the files.

Data integrity was assessed and read counts for each file were generated. The files were moved to their permanent storage location on Owl: http://owl.fish.washington.edu/nightingales/O_lurida
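For reference, counting reads in a gzipped FASTQ file amounts to counting lines and dividing by four. The sketch below assumes standard four-line records (no wrapped sequence lines); the function name is hypothetical and this is not necessarily the approach used in the notebook.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count reads in a gzipped FASTQ file, assuming four lines per record."""
    with gzip.open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        # A partial record suggests a truncated or malformed file.
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4
```

Summing `count_fastq_reads()` across every file in the project gives the overall read total.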

The readme.md file was updated to include project/file information.

The file manipulations were performed in a Jupyter notebook (see below).

 

Total reads generated for this project: 1,225,964,680

BGI provided us with the raw data files for us to play around with, but they are also currently in the process of performing the genome assembly.

 

Jupyter Notebook file: 20160126_Olurida_BGI_data_handling.ipynb

Notebook Viewer: 20160126_Olurida_BGI_data_handling.ipynb

DNA Quality Assessment – Geoduck & Olympia Oyster gDNA

Have three separate sets of geoduck & Olympia oyster gDNA that need to be run on gels before sending to BGI for genome sequencing:

GEODUCK

 

OLYMPIA OYSTER

 

Ran 100ng of each sample on a 0.8% agarose 1x modified TAE gel w/EtBr.

Results:

 

All the samples from both sets appear to be overloaded. Overloading generally shows up as streaking immediately above each band.

GEODUCK

Overall, the samples look pretty good. Sadly, the worst of the three (due to the most smearing – i.e. degradation) appears to be the DNA extracted using the E.Z.N.A. Mollusc Kit (Omega BioTek).

Also of note are the two bands present in the DNAzol sample. These bands are likely ribosomal RNA because I neglected to perform an RNase treatment during the extraction. Doh!

 

OLYMPIA OYSTER

None of them are particularly great. Just like the geoduck set, the worst of the three came from the E.Z.N.A. Mollusc Kit (Omega BioTek).

Also, just like the geoduck set, there are two bands present in the DNAzol sample. These bands are likely ribosomal RNA because I neglected to perform an RNase treatment during the extraction. Doh!

The phenol-chloroform clean up sample is either jacked up or severely overloaded, based on the crazy streaking that’s present. However, this sample looked similar after the initial extraction on 20151113.

 

I will send these samples separately (i.e. will not pool them into single samples) to BGI to run QC and, hopefully, add them to the DNA they already have to complete the genome sequencing for these two projects.

DNA Isolation – Olympia Oyster Outer Mantle gDNA

Isolated additional gDNA for the genome sequencing. To try to improve the quality (260/280 & 260/230 ratios) of the gDNA, I added a chloroform step after the initial tissue homogenization.

Used 123mg of Ostrea lurida outer mantle collected by Brent & Steven on 20150812.

  • Homogenized in 500μL of DNAzol.
  • Added additional 500μL of DNAzol.
  • Centrifuged 12,000g, 10mins, @ RT.
  • Split supernatant equally into two tubes.
  • Added 500μL of chloroform and mixed moderately fast by hand.
  • Centrifuged 12,000g, 10mins, RT.
  • Combined aqueous phases from both tubes in a clean tube.
  • Added 500μL of 100% EtOH and mixed by inversion.
  • Spooled precipitated gDNA and transferred to clean tube.
  • Performed 3 washes w/70% EtOH.
  • Dried pellet 3mins.
  • Resuspended in 200μL of Buffer EB (Qiagen).
  • Centrifuged 10,000g, 5mins, RT to pellet insoluble material.
  • Transferred supe to clean tube.

DNA was quantified using two methods: NanoDrop1000 & Qubit 3.0 (ThermoFisher).

For the Qubit, the samples were quantified using the Qubit dsDNA BR reagents (Invitrogen) according to the manufacturer’s protocol and used 1μL of sample for measurement.

Results:

Qubit Data (Google Sheet): 20151125_qubit_gDNA_geoduck_oly_quants

METHOD          CONCENTRATION (ng/μL)    TOTAL (μg)
Qubit           137                      27.4
NanoDrop1000    295                      59.0

 

Yield is solid. We should finally have sufficient quantities of gDNA to allow for BGI to proceed with the rest of the genome sequencing! Will run sample on gel to evaluate integrity and then send off to BGI.

The NanoDrop & Qubit numbers still aren’t close (as expected).

The addition of the chloroform step definitely helped improve the 260/280 OD ratio (see below). However, the addition of that step had no noticeable impact on the 260/230 OD ratios, which is a bit disappointing.

 

NanoDrop Absorbance Values & Plots