Tag Archives: BS-seq

Illumina Methylation Library Quantification – BS-seq Oly/C.gigas Libraries

Re-quantified the libraries completed yesterday, this time with the Qubit 3.0 dsDNA HS (high sensitivity) assay, because the library concentrations were too low for the normal broad range kit.

Results:

Qubit Quants and Library Normalization Calcs: 20151222_qubit_illumina_methylation_libraries

SAMPLE CONCENTRATION (ng/μL)
1NF11 2.42
1NF15 1.88
1NF16 2.74
1NF17 2.54
2NF5 2.72
2NF6 2.44
2NF7 2.38
2NF8 1.88
M2 2.18
M3 2.56
NF2_6 2.5
NF2_18 2.66

 

Things look pretty good. The TruSeq DNA Methylation Library Kit (Illumina) suggests that the libraries produced should end up with concentrations >3ng/μL. Ours all came in a little below that, but we have plenty of DNA here to make a pool for running on the HiSeq 2500.
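For the pooling step, here is a minimal sketch of how the Qubit values above could be turned into molarities and per-library pooling volumes. It is not the calculation from the linked spreadsheet. It assumes an average library size of ~300bp (the rough size noted in the Bioanalyzer entry), the usual 660 g/mol per base pair for dsDNA, and a made-up target of 20 fmol per library; the function names are hypothetical.

```python
# Qubit HS concentrations (ng/uL) copied from the table above.
LIBRARY_CONC_NG_UL = {
    "1NF11": 2.42, "1NF15": 1.88, "1NF16": 2.74, "1NF17": 2.54,
    "2NF5": 2.72, "2NF6": 2.44, "2NF7": 2.38, "2NF8": 1.88,
    "M2": 2.18, "M3": 2.56, "NF2_6": 2.5,
    "NF2_18": 2.66,  # NF2_18 as named in the bisulfite-treatment sample list
}

MEAN_FRAGMENT_BP = 300           # assumption: ~300bp from the Bioanalyzer trace
DALTONS_PER_BP = 660             # average mass of one dsDNA base pair (g/mol)
TARGET_FMOL_PER_LIBRARY = 20.0   # hypothetical amount of each library in the pool


def ng_per_ul_to_nm(conc_ng_ul, fragment_bp=MEAN_FRAGMENT_BP):
    """Convert a dsDNA concentration in ng/uL to nM."""
    return conc_ng_ul * 1e6 / (DALTONS_PER_BP * fragment_bp)


def pooling_volumes(concs, fmol_each=TARGET_FMOL_PER_LIBRARY):
    """Return the uL of each library that contributes fmol_each femtomoles."""
    # nM is the same as fmol/uL, so volume (uL) = fmol / nM.
    return {name: fmol_each / ng_per_ul_to_nm(c) for name, c in concs.items()}


if __name__ == "__main__":
    for name, vol in pooling_volumes(LIBRARY_CONC_NG_UL).items():
        nm = ng_per_ul_to_nm(LIBRARY_CONC_NG_UL[name])
        print(f"{name}\t{nm:.2f} nM\t{vol:.2f} uL")
```

Because every library is assumed to be the same size, equimolar pooling here is the same as pooling equal masses. If the real size distributions differ between libraries, the fragment length would need to be set per sample.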

Illumina Methylation Library Construction – Oly/C.gigas Bisulfite-treated DNA

Took the bisulfite-treated DNA from 20151218 and made Illumina libraries using the TruSeq DNA Methylation Library Kit (Illumina).

Quantified the completed libraries using the Qubit 3.0 dsDNA BR Kit (ThermoFisher).

Evaluated the DNA with the Bioanalyzer 2100 (Agilent) using the DNA 12000 assay. Illumina recommended using the High Sensitivity assay, but we don’t have access to that so I figured I’d just give the DNA 12000 assay a go.

SampleName IndexNumber BarCode
1NF11 1 ATCACG
1NF15 2 CGATGT
1NF16 3 TTAGGC
1NF17 4 TGACCA
2NF5 5 ACAGTG
2NF6 6 GCCAAT
2NF7 7 CAGATC
2NF8 8 ACTTGA
M2 9 GATCAG
M3 10 TAGCTT
NF2_6 11 GGCTAC
NF2_18 12 CTTGTA
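Since all twelve libraries will share a lane, a quick sanity check on the index assignments can be useful. The sketch below is my own illustration, not part of the kit protocol: it confirms the barcodes in the table are unique and reports the most similar pair by Hamming distance (the number of positions at which two equal-length sequences differ). The function names are hypothetical.

```python
from itertools import combinations

# Index number -> barcode, copied from the table above.
INDEXES = {
    1: "ATCACG", 2: "CGATGT", 3: "TTAGGC", 4: "TGACCA",
    5: "ACAGTG", 6: "GCCAAT", 7: "CAGATC", 8: "ACTTGA",
    9: "GATCAG", 10: "TAGCTT", 11: "GGCTAC", 12: "CTTGTA",
}


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_pair(indexes):
    """Return (distance, index_1, index_2) for the most similar pair."""
    return min(
        (hamming(s1, s2), n1, n2)
        for (n1, s1), (n2, s2) in combinations(indexes.items(), 2)
    )


if __name__ == "__main__":
    assert len(set(INDEXES.values())) == len(INDEXES), "duplicate barcode"
    dist, first, second = closest_pair(INDEXES)
    print(f"Closest pair: index {first} vs index {second}, {dist} mismatches")
```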

 

Results:

Library Quantification (Google Sheet): 20151221_quantification_illumina_methylation_libraries

Test Name Concentration (ng/μL)
1NF11 Out of range
1NF15 2.14
1NF16 2.74
1NF17 2.64
2NF5 2.92
2NF6 Out of range
2NF7 2.42
2NF8 2.56
M2 Out of range
M3 2.1
NF2_6 2.38
NF2_18 Out of range

 

I used the Qubit’s BR (broad range) kit because I wasn’t sure what concentrations to expect. I need to use the high sensitivity kit to get a better evaluation of all the samples’ concentrations.

 

 

Bioanalyzer Data File (Bioanalyzer 2100): 2100_20expert_DNA_2012000_DE72902486_2015-12-21_16-58-43.xad

 

Ha! Well, looks like you definitely need to use the DNA High Sensitivity assay for the Bioanalyzer to pick up anything. Although, I guess you can see a slight hump in most of the samples at the appropriate sizes (~300bp); you just have to squint. ;)

Bisulfite Treatment – Oly Reciprocal Transplant DNA & C.gigas Lotterhos DNA for BS-seq

After confirming that the DNA available for this project looked good, I performed bisulfite treatment on the following gDNA samples:

  • 1NF11
  • 1NF15
  • 1NF16
  • 1NF17
  • 2NF5
  • 2NF6
  • 2NF7
  • 2NF8
  • NF2_6
  • NF2_18
  • M2
  • M3

Sample names break down like this:

1NF#

1 = Fidalgo Bay outplants

NF = Fidalgo Bay broodstock origination

# = Sample number

2NF#

Same as above, but:

2 = Oyster Bay outplants

NF2_# (Oysters grown in Oyster Bay; DNA provided by Katherine Silliman)

NF2 = Fidalgo Bay broodstock origination, family #2

# = Sample number

M2/M3 = C.gigas from Katie Lotterhos

 

Followed the guidelines of the TruSeq DNA Methylation Library Prep Guide (Illumina).

Used the EZ DNA Methylation-Gold Kit (ZymoResearch) according to the manufacturer’s protocol with the following changes/notes:

  • Used 100ng DNA (per Illumina recs; Zymo recommends at least 200ng for “optimal results”).
  • Thermal cycling was performed in 0.5mL thin-wall tubes in a PTC-200 (MJ Research) using a heated lid.
  • Centrifugations were performed at 13,000g
  • Desulphonation incubation for 20mins.

DNA quantity calculations are here (Google Sheet): 20151218_oly_bisulfite_calcs
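As a rough illustration of the input calculation (not a copy of the spreadsheet), the sketch below works out how much gDNA and water would give 100ng per reaction. The concentrations are the Qubit values from the DNA isolation entry for the eight samples isolated there; the 20μL reaction input volume and the 1μL minimum pipetting volume are my assumptions and should be checked against the kit manual and your pipettes.

```python
TARGET_NG = 100.0       # DNA input per reaction, from the notes above
MAX_INPUT_UL = 20.0     # assumption: sample volume going into the conversion
MIN_PIPETTE_UL = 1.0    # hypothetical lower limit for accurate pipetting

# Qubit BR concentrations (ng/uL) from the DNA isolation entry.
GDNA_CONC_NG_UL = {
    "1NF11": 182, "1NF15": 244, "1NF16": 112, "1NF17": 25.2,
    "2NF5": 142, "2NF6": 244, "2NF7": 25, "2NF8": 456,
}


def input_plan(conc_ng_ul, target_ng=TARGET_NG, final_ul=MAX_INPUT_UL):
    """Return (uL of DNA, uL of water) that delivers target_ng in final_ul."""
    dna_ul = target_ng / conc_ng_ul
    if dna_ul > final_ul:
        raise ValueError("sample too dilute to reach the target in this volume")
    return dna_ul, final_ul - dna_ul


if __name__ == "__main__":
    for name, conc in GDNA_CONC_NG_UL.items():
        dna_ul, water_ul = input_plan(conc)
        note = "  (dilute first)" if dna_ul < MIN_PIPETTE_UL else ""
        print(f"{name}\t{dna_ul:.2f} uL DNA\t{water_ul:.2f} uL water{note}")
```

Most of these stocks are concentrated enough that 100ng is well under 1μL, so in practice a working dilution would be made first and the same calculation repeated on the diluted concentration.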

Samples were stored @ -20°C. Will check samples via Bioanalyzer before proceeding to library construction.

DNA Isolation – Oly gDNA for BS-seq

Need DNA to prep our own libraries for bisulfite-treated high-throughput sequencing (BS-seq).

Isolated gDNA from the following tissue samples stored in RNAlater (tissue was not weighed) using DNAzol:

2NF1
2NF2
2NF3
2NF4
2NF5
2NF6
2NF7
2NF8
1NF11
1NF12
1NF13
1NF14
1NF15
1NF16
1NF17
1NF18

The sample coding breaks down as follows (see the project wiki for a full explanation):

2NF#

2 = Oysters outplanted in Fidalgo Bay

NF = Broodstock originated in Fidalgo Bay

# = Sample number

1NF#

1 = Oysters outplanted in Oyster Bay

NF = Broodstock originated in Fidalgo Bay

# = Sample number

 

DNA was isolated in the following manner:

  • Homogenized tissues in 500μL of DNAzol (Molecular Research Center; MRC).
  • Added additional 500μL of DNAzol.
  • Added 10μL of RNase A (10mg/mL, ThermoFisher); incubated 10mins @ RT.
  • Added 300μL of chloroform and mixed moderately fast by hand.
  • Incubated 5mins @ RT.
  • Centrifuged 12,000g, 10mins, RT.
  • Transferred aqueous phase to clean tube.
  • Added 500μL of 100% EtOH and mixed by inversion.
  • Pelleted DNA 5,000g, 5mins @ RT.
  • Performed 3 washes w/70% EtOH.
  • Dried pellet 3mins.
  • Resuspended in 100μL of Buffer EB (Qiagen).
  • Centrifuged 12,000g, 10mins, RT to pellet insoluble material.
  • Transferred supe to clean tube.

The samples were quantified using the Qubit dsDNA BR reagents (Invitrogen) according to the manufacturer’s protocol, with 1μL of sample used for each measurement.

Results:

Qubit data (Google Sheet): 20151216_Oly_gDNA_qubit_quants

SAMPLE CONCENTRATION (ng/μL)
2NF1 76.4
2NF2 175
2NF3 690
2NF4 11.7
2NF5 142
2NF6 244
2NF7 25
2NF8 456
1NF11 182
1NF12 432
1NF13 155
1NF14 21
1NF15 244
1NF16 112
1NF17 25.2
1NF18 278
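To put those concentrations in terms of total yield, here is a small sketch that multiplies each by the 100μL resuspension volume. It ignores the 1μL used for the Qubit reading and any volume lost when transferring the supernatant, so the numbers are upper estimates; the function name is hypothetical.

```python
ELUTION_UL = 100.0  # resuspension volume in Buffer EB, from the protocol above

# Qubit BR concentrations (ng/uL) copied from the table above.
GDNA_CONC_NG_UL = {
    "2NF1": 76.4, "2NF2": 175, "2NF3": 690, "2NF4": 11.7,
    "2NF5": 142, "2NF6": 244, "2NF7": 25, "2NF8": 456,
    "1NF11": 182, "1NF12": 432, "1NF13": 155, "1NF14": 21,
    "1NF15": 244, "1NF16": 112, "1NF17": 25.2, "1NF18": 278,
}


def total_yield_ug(conc_ng_ul, volume_ul=ELUTION_UL):
    """Approximate total DNA yield in micrograms."""
    return conc_ng_ul * volume_ul / 1000.0


if __name__ == "__main__":
    # List samples from lowest to highest yield.
    for name in sorted(GDNA_CONC_NG_UL, key=GDNA_CONC_NG_UL.get):
        print(f"{name}\t{total_yield_ug(GDNA_CONC_NG_UL[name]):.2f} ug")
```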

 

Will run samples on gel tomorrow to evaluate gDNA integrity.

Bioinformatics – Trimmomatic/FASTQC on C.gigas Larvae OA NGS Data

Previously trimmed the first 39 bases of sequence from reads from the BS-Seq data in an attempt to improve our ability to map the reads back to the C.gigas genome. However, Mac (and Steven) noticed that the last ~10 bases of all the reads showed a steady increase in the %G, suggesting some sort of bias (maybe adaptor??):

Although I didn’t mention this previously, the figure above also shows an odd “waves” pattern that repeats in all bases except for G. Not sure what to think of that…
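For anyone wanting to reproduce that kind of plot outside of FASTQC, the sketch below shows the underlying idea: tally each base at each read position and convert the tallies to percentages. This is my own minimal version, not FASTQC's code, and the reads in the demo are made-up toy sequences rather than anything from this data set.

```python
from collections import Counter

BASES = "ACGTN"


def per_base_percent(reads):
    """Return one {base: percent} dict per read position."""
    counts = []  # counts[i] tallies the bases seen at position i
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return [
        {b: 100.0 * c[b] / sum(c.values()) for b in BASES}
        for c in counts
    ]


if __name__ == "__main__":
    toy_reads = ["ACGTG", "ACTTG", "TCGAG"]  # toy data for illustration only
    for pos, pct in enumerate(per_base_percent(toy_reads), start=1):
        print(pos, {b: round(v, 1) for b, v in pct.items()})
```

A steady climb in the G percentage over the final positions, like the one described above, would show up as increasing "G" values in the last few dictionaries.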

Quick summary of actions taken (specifics are available in Jupyter notebook below):

  • Trim first 39 bases from all reads in all raw sequencing files.
  • Trim last 10 bases from all reads in raw sequencing files.
  • Concatenate the two sets of reads (400ppm and 1000ppm treatments) into single FASTQ files for Steven to work with.
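The actual commands are in the notebook linked below. As a plain-Python illustration of what the three steps above amount to, the sketch here trims a fixed number of bases from each end of every read and writes several gzipped FASTQ files into one output file. It assumes simple four-line FASTQ records; the function names and the `min_len` filter are my own additions for illustration.

```python
import gzip

HEAD_TRIM = 39  # bases removed from the start of each read
TAIL_TRIM = 10  # bases removed from the end of each read


def trim_read(seq, qual, head=HEAD_TRIM, tail=TAIL_TRIM):
    """Drop `head` leading and `tail` trailing bases (and their qualities)."""
    end = len(seq) - tail
    return seq[head:end], qual[head:end]


def trim_and_concatenate(in_paths, out_path, min_len=1):
    """Trim every read in in_paths and write them all to one gzipped FASTQ.

    Returns the number of reads written. Reads shorter than min_len after
    trimming are skipped.
    """
    kept = 0
    with gzip.open(out_path, "wt") as out:
        for path in in_paths:
            with gzip.open(path, "rt") as handle:
                while True:
                    header = handle.readline().rstrip("\n")
                    if not header:
                        break  # end of file
                    seq = handle.readline().rstrip("\n")
                    plus = handle.readline().rstrip("\n")
                    qual = handle.readline().rstrip("\n")
                    seq, qual = trim_read(seq, qual)
                    if len(seq) >= min_len:
                        out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
                        kept += 1
    return kept
```

Calling `trim_and_concatenate` with a list of the per-treatment input files and a single output path would give one trimmed file per treatment, which is the shape of the output files listed below.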

Raw sequencing files:

Notebook Viewer: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC

Jupyter (IPython) notebook: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC.ipynb

 

 

Output files

Trimmed, concatenated FASTQ files
20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz

 

FASTQC files
20150521_trimmed_2212_lane2_400ppm_GCCAAT_fastqc.html
20150521_trimmed_2212_lane2_400ppm_GCCAAT_fastqc.zip

20150521_trimmed_2212_lane2_1000ppm_CTTGTA_fastqc.html
20150521_trimmed_2212_lane2_1000ppm_CTTGTA_fastqc.zip

 

Example of FASTQC analysis pre-trim:

 

 

Example FASTQC post-trim (from 400ppm data):

 

Trimming has removed the intended bad stuff (inconsistent sequence in the first 39 bases and the rise in %G in the last 10 bases). Sequences are ready for Steven to analyze further.

However, we still see the “waves” pattern with the T, A and C. Additionally, we still don’t know what caused the weird inconsistencies, nor what sequence is contained therein that might be leading to that. Will contact the sequencing facility to see if they have any insight.

Bioinformatics – Trimmomatic/FASTQC on C.gigas Larvae OA NGS Data

In another troubleshooting attempt for this problematic BS-seq Illumina data, I’m going to use Trimmomatic to remove the first 39 bases of each read. This is because, even after the previous quality trimming with Trimmomatic, the first 39 bases still showed inconsistent quality:

 

Ran Trimmomatic on just a single data set to try things out: 2212_lane2_CTTGTA_L002_R1_001.fastq.gz
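The exact command is in the notebook linked below. For reference, a single-end Trimmomatic run that only removes leading bases uses the HEADCROP step, and the sketch here just assembles such a command in Python. The jar path, the output file name and the `-phred33` setting are assumptions on my part, not values taken from the notebook.

```python
import shlex


def headcrop_command(jar, fastq_in, fastq_out, n_bases=39):
    """Build a Trimmomatic single-end command that drops the first n_bases."""
    return [
        "java", "-jar", jar,
        "SE",                    # single-end mode
        "-phred33",              # assumption: Phred+33 quality encoding
        fastq_in,
        fastq_out,
        f"HEADCROP:{n_bases}",   # cut this many bases from the read start
    ]


if __name__ == "__main__":
    cmd = headcrop_command(
        "trimmomatic.jar",                          # hypothetical jar location
        "2212_lane2_CTTGTA_L002_R1_001.fastq.gz",   # input file named above
        "trimmed_output.fastq.gz",                  # hypothetical output name
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```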

Notebook Viewer: 20150506_Cgigas_larvae_OA_trimmomatic_FASTQC

Jupyter (IPython) notebook: 20150506_Cgigas_larvae_OA_trimmomatic_FASTQC.ipynb

Results:

Trimmed FASTQ: 20150506_trimmed_2212_lane2_CTTGTA_L002_R1_001.fastq.gz

FASTQC Report: 20150506_trimmed_2212_lane2_CTTGTA_L002_R1_001_fastqc.html

You can see how flat the newly trimmed data is (which is what one would expect).

Steven will take this trimmed dataset and try additional mapping with it to see if removal of the first 39 bases will improve the mapping.

 

BLAST – C.gigas Larvae OA Illumina Data Against GenBank nt DB

In an attempt to figure out what’s going on with the Illumina data we recently received for these samples, I BLASTed the 400ppm data set that had previously been de-novo assembled by Steven: EmmaBS400.fa.

Jupyter (IPython) Notebook : 20150501_Cgigas_larvae_OA_BLASTn_nt.ipynb

Notebook Viewer : 20150501_Cgigas_larvae_OA_BLASTn_nt

Results:

BLASTn Output File: 20150501_nt_blastn.tab

BLAST e-vals <= 0.001: 20150501_Cgigas_larvae_OA_blastn_evals_0.001.txt

Unique BLAST Species: 20150501_Cgigas_larvae_OA_unique_blastn_evals.txt

 

Firstly, since this library was bisulfite converted, we know that matching won’t be as robust as we’d normally see.

However, the BLAST matches for this are terrible.

Only 0.65% of the BLAST matches (e-value <= 0.001) are to Crassostrea gigas. Yep, you read that correctly: 0.65%.

It’s nearly 40-fold less than the top species: Dictyostelium discoideum (a slime mold)

It’s 30-fold less than the next species: Danio rerio (zebra fish)

Then it’s followed up by human and mouse.

I think I will need to contact the Univ. of Oregon sequencing facility to see what their thoughts on this data are, because it’s not even remotely close to what we should be seeing, even with the bisulfite conversion…
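The filtering and counting behind those percentages is in the notebook linked above. As a rough sketch of the same idea, the code below keeps tab-delimited BLAST hits at or below an e-value cutoff and reports each species' share of the remaining hits. The column positions depend on how the BLAST output was formatted, so they are passed in as arguments; the demo rows are made-up toy lines, not results from this data set.

```python
import csv
from collections import Counter


def species_percentages(lines, evalue_col, species_col, max_evalue=0.001):
    """Return {species: percent of hits} for hits with e-value <= max_evalue."""
    counts = Counter()
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        if float(fields[evalue_col]) <= max_evalue:
            counts[fields[species_col]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sp: 100.0 * n / total for sp, n in counts.most_common()}


if __name__ == "__main__":
    # Toy rows (query, species, e-value) for illustration only.
    toy = [
        "contig1\tSpecies A\t1e-20",
        "contig2\tSpecies B\t5e-04",
        "contig3\tSpecies A\t0.5",
    ]
    print(species_percentages(toy, evalue_col=2, species_col=1))
```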

Goals – May 2015

Here are the things I plan to tackle throughout the month of May:

Geoduck Reproductive Development Transcriptomics

My primary goal for this project is to successfully isolate RNA from the remaining, troublesome paraffin blocks that have yet to yield any usable RNA. The next approach to obtain usable quantities of RNA is to directly gouge tissue from the blocks instead of sectioning the blocks (as recommended in the PAXgene Tissue RNA Kit protocol). Hopefully this approach will eliminate excess paraffin, while increasing the amount of input tissue. Once I have RNA from the entire suite of samples, I’ll check the RNA integrity via Bioanalyzer and then we’ll decide on a facility to use for high-throughput sequencing.

 

BS-Seq Illumina Data Assembly/Mapping

Currently, there are two projects that we have performed BS-Seq with (Crassostrea gigas larvae OA (2011) bisulfite sequencing and LSU C.virginica Oil Spill MBD BS Sequencing) and we’re struggling to align sequences to the C.gigas genome. Granted, the LSU samples are C.virginica, but the C.gigas larvae libraries are not aligning to the C.gigas genome via standard BLASTn or using a dedicated bisulfite mapper (e.g. BSMAP). I’m currently BLASTing a de-novo assembly of the C.gigas larvae OA 400ppm sequencing that Steven made against the NCBI nt DB in an attempt to assess the taxonomic distribution of the sequences we received back. I’ll also try using a different bisulfite mapper, Bismark, that Mackenzie Gavery has previously used and has had better results with than BSMAP.

 

C.gigas Heat Stress MeDIP/BS-Seq

As part of Claire’s project, there’s still some BS-Seq data that would be nice to have to complement the data she generated via microarray. It would be nice to make a decision about how to proceed with the samples. However, part of our decision on how to proceed is governed by the results we get from the two projects above. Why do those two projects impact the decision(s) regarding this project? They impact this project because in the two projects above, we produced our own BS-Seq libraries. This is extremely cost effective. However, if we can’t obtain usable data from doing the library preps in-house, then that means we have to use an external service provider. Using an external company to do this is significantly more expensive. Additionally, not all companies can perform bisulfite treatment, which limits our choices (and, in turn, pricing options) on where to go for sequencing.

 

Miscellany

When I have some down time, I’ll continue working on migrating my Wikispaces notebook to this notebook. I only have one year left to go and it’d be great if all my notebook entries were here so they’d all be tagged/categorized and, thus, be more searchable. I’d also like to work on adding README files to our plethora of electronic data folders. Having these in place will make it much easier for people to quickly figure out what these folders contain, file formats within those folders, etc. I also have a few computing tips/tricks that I’d like to add to our Github “Code” page. Oh, although this isn’t really lab related, I was asked to teach the Unix shell lesson (or, at least, part of it) at the next Software Carpentry Workshop that Ben Marwick is setting up at UW in early June. So, I’m thinking that I’ll try to incorporate some of the data handling stuff I’ve been tackling in lab into the lesson I end up teaching. Additionally, going through the Software Carpentry materials will help reinforce some of the “fundamental” tasks that I can do with the shell (like find, cut and grep).

In the lab, I plan on sealing up our nearly overflowing “Broken Glass” box and establishing a new one. I need to autoclave, and dispose of, a couple of very full biohazard bags. I’m also going to vow that I will get Jonathan to finally obtain a successful PCR from his sea pen RNA.

Quality Trimming – C.gigas Larvae OA BS-Seq Data

Jupyter (IPython) Notebook: 20150414_C_gigas_Larvae_OA_Trimmomatic_FASTQC.ipynb

NBviewer: 20150414_C_gigas_Larvae_OA_Trimmomatic_FASTQC.ipynb

 

Trimmed FASTQC

400ppm Index – GCCAAT

20150414_trimmed_2212_lane2_GCCAAT_L002_R1_001_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_002_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_003_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_004_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_005_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_006_fastqc.html

1000ppm Index – CTTGTA

20150414_trimmed_2212_lane2_CTTGTA_L002_R1_001_fastqc.html
20150414_trimmed_2212_lane2_CTTGTA_L002_R1_002_fastqc.html
20150414_trimmed_2212_lane2_CTTGTA_L002_R1_003_fastqc.html
20150414_trimmed_2212_lane2_CTTGTA_L002_R1_004_fastqc.html