DNA Isolation – Geoduck gDNA for Illumina-initiated Sequencing Project

We were previously approached by Cindy Lawley (Illumina Market Development) about possibly participating in an Illumina product development project. They wanted to have some geoduck tissue and DNA on hand in case Illumina green-lighted the use of geoduck for testing out the new sequencing platform on non-model organisms. Well, guess what: Illumina has given the green light for sequencing our geoduck! However, they need at least 4μg of gDNA, so I’m isolating more.

Isolated DNA from ctenidia tissue from the same Panopea generosa individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150811.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from five separate ~60mg pieces of ctenidia tissue according to the manufacturer’s protocol, with the following changes:

  • Samples were homogenized with plastic, disposable pestle in 350μL of ML1 Buffer
  • Incubated homogenate at 60C for 1hr
  • No optional steps were used
  • Performed three rounds of 24:1 chloroform:IAA treatment
  • Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 162ng/μL (Quant data is here [Google Sheet]: 20170105_gDNA_geoduck_qubit_quant)

Yield is great (total = ~32μg).

Evaluated gDNA quality (i.e. integrity) by running 162ng (1μL) of sample on 0.8% agarose, low-TAE gel stained with ethidium bromide.

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).

 

Results:

 

 

DNA looks good: bright high molecular weight band, minimal smearing, and minimal RNA carryover (seen as more intense “smear” at ~500bp).

Will send off 10μg (they only requested 4μg) so that they have extra to work with in case they come across any issues.
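
The amounts in this entry follow from concentration × volume, which can be checked with a few lines of Python. This is only an arithmetic sketch. The function and variable names are made up for illustration, and the pooled elution volume was not recorded above, so the "implied volume" is back-calculated from the reported ~32μg rather than measured.

```python
def mass_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Mass of DNA (ug) contained in a volume at a given concentration."""
    return conc_ng_per_ul * volume_ul / 1000.0


def volume_ul_for(mass_ug_wanted: float, conc_ng_per_ul: float) -> float:
    """Volume (uL) that contains the requested mass of DNA."""
    return mass_ug_wanted * 1000.0 / conc_ng_per_ul


QUBIT_CONC = 162.0  # ng/uL, the Qubit reading reported above

# Upper bound on yield if all five 50 uL elutions were fully recovered.
max_yield = mass_ug(QUBIT_CONC, 5 * 50)

# Pooled volume implied by the reported ~32 ug (an inference, not a measurement).
implied_volume = volume_ul_for(32.0, QUBIT_CONC)

# Volume needed to send 10 ug.
ship_volume = volume_ul_for(10.0, QUBIT_CONC)

print(f"max yield: {max_yield:.1f} ug")
print(f"implied pooled volume: {implied_volume:.0f} uL")
print(f"volume for 10 ug: {ship_volume:.1f} uL")
```

The reported yield sits below the 250μL upper bound, which is consistent with recovering a bit less than the full 50μL from each column.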

Data Management – Replacement of Corrupt BGI Oly Genome FASTQ Files

Previously, Sean and Steven identified two potentially corrupt FASTQ files. I contacted BGI about getting replacement files, and they informed me that the FASTQ files they delivered on three separate occasions are all the same files, despite having different file names. As such, I could use one of the other versions to replace the corrupt FASTQ files. So, that’s what I did!

See the Jupyter Notebook below for the deets!

Jupyter Notebook (GitHub): 20170104_docker_oly_BGI_genome_corruption_solved.ipynb
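
One way to confirm that differently named deliveries really are the same file is to group them by checksum. The snippet below is a minimal sketch of that idea, not necessarily what the notebook does. The helper names `md5sum` and `group_by_md5` are hypothetical, and any paths passed in are up to the user.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_md5(paths):
    """Map each MD5 digest to the sorted list of file names that share it."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5sum(path)].append(Path(path).name)
    return {digest: sorted(names) for digest, names in groups.items()}
```

Files that end up under the same digest have identical content regardless of name. A suspect file that lands in its own group is the one to replace.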

Goals – January 2017

One of the long-running goals I’ve had is to get this Oly GBS data taken care of and out the door to publication. I think I will finally succeed with this, with the help of Pub-A-Thon. Don’t get too excited, it’s not what you think. It is not the drinking extravaganza that the name implies. Instead, it’s a “friendly” lab competition to get some scientific publications assembled and submitted.

Another goal for this month is to get the -80C organized. We’ve made some major progress on lab organization, with major kudos going to Grace Crandall and her work on cleaning out fridges/freezers and putting together our lab inventory spreadsheet. The -80C organization is the final frontier of getting the lab fully under control and better regulated.

Continuing on the organization front, it’d be great if we could get the Data Management Plan finished. Sean Bennett has helped get us much closer to completion. Hopefully this month we can get it finalized and have it be fully functional so that any lab member can easily figure out what to do when they receive new sequencing data.

I’d also like to put together a more automated means of handling our high-throughput sequencing data when we receive it. Ideally, it’d be a Jupyter Notebook and all the user would have to do is enter the desired location (heck, maybe I could even simplify it further by requiring just a species name…) for the files to be stored and then press “play” on the notebook. The files would go through a post-download integrity check, be moved to their final location, and have their integrity re-checked; the notebook would then update the checksum files and readme files. I have most of the bits here and there in various Jupyter Notebooks already, but haven’t taken the time to put them all together into a single, reusable notebook.
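
The steps listed above could be sketched roughly like this for a single file. Everything here is an assumption for illustration: the function name `store_sequencing_file`, the `checksums.md5` and `readme.md` file names, and the md5sum-style line format are placeholders, not the lab's actual conventions.

```python
import hashlib
import shutil
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_sequencing_file(src, dest_dir, expected_md5,
                          checksum_name="checksums.md5",
                          readme_name="readme.md"):
    """Verify, move, re-verify, and log one downloaded file.

    Returns the final path. Raises ValueError on a checksum mismatch and
    FileExistsError if a file of the same name is already stored.
    """
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest = dest_dir / src.name

    # 1. Post-download integrity check against the provider's checksum.
    if md5sum(src) != expected_md5:
        raise ValueError(f"{src.name}: download does not match expected MD5")

    # 2. Move to the final location, refusing to overwrite existing data.
    dest_dir.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.move(str(src), str(dest))

    # 3. Re-check integrity after the move.
    if md5sum(dest) != expected_md5:
        raise ValueError(f"{dest.name}: checksum changed during move")

    # 4. Append to the checksum file (digest, two spaces, file name).
    with open(dest_dir / checksum_name, "a") as handle:
        handle.write(f"{expected_md5}  {dest.name}\n")

    # 5. Append the new file name to the readme.
    with open(dest_dir / readme_name, "a") as handle:
        handle.write(f"{dest.name}\n")

    return dest
```

In a notebook, the user-facing cell would then only need the destination (or a species name mapped to one) and a loop over the downloaded files. Failing loudly at steps 1 and 3 keeps a bad file from being logged as good.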

Data Management – Geoduck RRBS Data Integrity Verification

Yesterday, I downloaded the Illumina FASTQ files provided by Genewiz for Hollie Putnam’s reduced representation bisulfite geoduck libraries. However, Genewiz had not provided a checksum file at the time.

I received the checksum file from Genewiz and have verified that the data is intact. Verification is described in the Jupyter notebook below.

Data files are located here: owl/web/nightingales/P_generosa

Jupyter notebook (GitHub): 20161230_docker_geoduck_RRBS_md5_checks.ipynb
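
For reference, a check of downloaded files against a provider's checksum file can be sketched in Python as below. This is not the notebook's code. It assumes the checksum file uses the common md5sum layout (digest, whitespace, file name per line), which is a guess about the Genewiz file, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_file(checksum_path):
    """Parse md5sum-style lines ('<digest>  <filename>') into {filename: digest}."""
    expected = {}
    for line in Path(checksum_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'.
        expected[name.lstrip("*").strip()] = digest.lower()
    return expected


def verify_directory(data_dir, checksum_path):
    """Return {filename: status}; status is 'OK', 'MISMATCH', or 'MISSING'."""
    data_dir = Path(data_dir)
    results = {}
    for name, digest in parse_md5_file(checksum_path).items():
        target = data_dir / name
        if not target.is_file():
            results[name] = "MISSING"
        elif md5sum(target) == digest:
            results[name] = "OK"
        else:
            results[name] = "MISMATCH"
    return results
```

Any entry that is not 'OK' points to a file that should be downloaded again before analysis.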

Data Received – Geoduck RRBS Sequencing Data

Hollie Putnam prepared some reduced representation bisulfite Illumina libraries and had them sequenced by Genewiz.

The data was downloaded and MD5 checksums were generated.

IMPORTANT: MD5 checksums have not yet been provided by Genewiz! We cannot verify the integrity of these data files at this time! Checksums have been requested. Will create new notebook entry (and add link to said entry) once the checksums have been received and we can compare them.

UPDATE 20161230 – Have received and verified checksums.

 

Jupyter notebook: 20161229_docker_genewiz_geoduck_RRBS_data.ipynb

DNA Isolation – Geoduck gDNA for Potential Illumina-initiated Sequencing Project

We were approached by Cindy Lawley (Illumina Market Development) yesterday to see if we’d be able to participate in some product development. We agreed and need some geoduck DNA to send them, in case she’s able to get our species greenlighted for use.

Isolated DNA from ctenidia tissue from the same Panopea generosa individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150811.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from two separate 50mg pieces of ctenidia tissue according to the manufacturer’s protocol, with the following changes:

  • Samples were homogenized with plastic, disposable pestle in 350μL of ML1 Buffer
  • Incubated homogenate at 60C for 1hr
  • No optional steps were used
  • Performed three rounds of 24:1 chloroform:IAA treatment
  • Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 19.4ng/μL (Quant data is here [Google Sheet]: 20161221_gDNA_qubit_quant)

Yield is low (~1.8μg), but have enough to satisfy the minimum of 1μg requested by Cindy Lawley.

Evaluated gDNA quality (i.e. integrity) by running ~250ng (12.5μL) of sample on 0.8% agarose, low-TAE gel stained with ethidium bromide.

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).

 

Results:

 

 

 

 

Overall, the sample looks good. Strong, high molecular weight band is present with minimal smearing. However, there is a smear in the ~500bp range. This is most likely residual RNA. This is surprising since the E.Z.N.A. Mollusc Kit includes an RNase step. Regardless, having intact, high molecular weight DNA is the important part for this project. Will prepare to send remainder (~1.5μg) of geoduck gDNA to Illumina with other requested samples.

Data Management – Integrity Check of Final BGI Olympia Oyster & Geoduck Data

After completing the downloads of these files from BGI, I needed to verify that the downloaded copies matched the originals. Below is a Jupyter Notebook detailing how I verified file integrity via MD5 checksums. It also highlights the importance of doing this check when working with large sequencing files (or just large files in general), as a few of them had mismatched MD5 checksums!

Although the notebook is embedded below, it might be easier viewing via the notebook link (hosted on GitHub).

At the end of the day, I had to re-download some files, but all the MD5 checksums match and these data are ready for analysis:

Final Ostrea lurida genome files

Final Panopea generosa genome files

Jupyter Notebook: 20161214_docker_BGI_data_integrity_check.ipynb