Tag Archives: methylation

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline MethylKit Comparison

I previously ran two variations on the Bismark analysis for our Olympia oyster whole genome bisulfite sequencing data:

I followed this up using the MethylKit R package to identify differentially modified loci (DML), based on differing amounts of coverage (1x, 3x, 5x, & 10x) and percent methylation differences between the two groups of oysters (25%, 50%, & 75%).

See the project wiki for experimental design info).

Both sets of analyses were documented in R Projects:

Default Bismark settings:
“Relaxed” Bismark settings

RESULTS

BedGraphs (1x coverage, 25% diff in methylation):

Default Bismark settings:
“Relaxed” Bismark settings:

The BedGraph outputs from the least stringent coverage/percent difference in methylation for both Bismark pipelines yield suprisingly low numbers of DML.

They yield 22 and 21 DML, respectively. Of course, more stringent BedGraphs have fewer DML, but may be more believable due to having a more robust set of data.

Interestingly, the two analyses reveal that a single contig contains the majority of DML, all within a 1000bp range.

Will continue to examine this data by examining Bismark BedGraphs in IGV, and running some additional MethylKit analysis looking at differentially modified regions(DMRs) to see what we can gleen from this.

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline Comparison

Ran Bismark using our high performance computing (HPC) node, Mox, with two different bowtie2 settings:

  1. Default settings

  2. –score_min L,0,-0.6

The second setting is a bit less stringent than the default settings and should result in a higher percentage of reads mapping. However, not entirely sure what the actual implications will be (if any) for interpreting the resulting data.

Input data was previously trimmed per Bismark’s recommendation for Illumina TruSeq libraries (TrimGalore! 5′ 10bp):

List of input files and Bismark configurations can be seen in the SLURM scripts:


RESULTS

Output folders:

DNA Methylation Analysis – Olympia oyster BSseq MethylKit Analysis

NOTE: IMPORTANT CAVEATS – READ POST BEFORE USING DATA.
I’m posting this for posterity and to provide an overview of what (and whatnot) to do. Plus, this has a good R script for using MethylKit that can be used for subsequent analyses.

The goal of this analysis was to compare the methylation profiles of Olympia oysters originating from a common population (Fidalgo Bay) that were raised in two different locations (Fidalgo Bay & Oyster Bay).

An overview of the experiment can be viewed here:

I previously ran all of Olympia oyster DNA methylation sequencing data through the Bismark pipeline, and then processed them using the MethylKit R library.

First mistake (Bismark):

  • Trimmed FastQ files “incorrectly”.

Bismark provides an excellent user guide and provides a handy table on how to decide on trimming parameters, but I mistakenly trimmed these according to the recommendations for a different library preparation technique. I trimmed based on the Zymo Pico-Methyl Kit (which was used for the other group of data that I processed simultaneously), instead of the TruSeq library prep.

So, “incorrectly” isn’t necessarily the proper term here. The analysis can still be used, however, it’s likely that the excessive trimming results in reducing sequencing coverage, and, in turn, making the downstream analysis result in a highly conservative output. Thus, the data isn’t wrong or bad, it is just very limited.

And, this leads to the second mistake (Bismark):

  • Bowtie alignment score too strict

There’s a bit of a weird “battle” between Bismark and bowtie2. Bismark uses bowtie2 for generating alignments, but bowtie2’s default cutoff score overrides Bismark’s. So, to adjust the score value, you have to explicitly add the scoring parameters to your Bismark parameters during the alignment step. I did not do this.

Again, it’s not wrong, per se, but leads to a significantly limited set of data in the final analysis.

The data were analyzed based on a minimum of:

  • 3x coverage

  • 25% difference in methylation


RESULTS:

Methylkit analysis (R project; GitHub):

BedGraph file (BED):

The analysis resulted in a total of seven (yes, 7) differentially methylated loci (DML) between the two groups. It was this result that made Steven and me revisit the initial Bismark analysis. He has done this previously (but differently) and gotten significantly greater numbers of DML.

Knowing all of this, I will re-trim the data and adjust Bismark alignment score thresholds and then re-analyze with MethylKit.

Regardless here’re some plots to add some visual flair to this notebook entry (these, and more, are available in the GitHub repo):

CLUSTERING DENDROGRAM


PCA PLOT

DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets

Bismark analysis of all of our current Olympia oyster (Ostrea lurida) DNA methylation high-throughput sequencing data.

Analysis was run on Emu (Ubuntu 16.04LTS, Apple Xserve). The primary analysis took ~14 days to complete.

All operations are documented in a Jupyter notebook (GitHub):

Genome used:


Input files ( see Olympia oyster Genomic GitHub wiki for more info ):

WG BSseq of Fidalgo Bay offspring grown in Fidalgo Bay & Oyster Bay
  • 1_ATCACG_L001_R1_001.fastq.gz

  • 2_CGATGT_L001_R1_001.fastq.gz

  • 3_TTAGGC_L001_R1_001.fastq.gz

  • 4_TGACCA_L001_R1_001.fastq.gz

  • 5_ACAGTG_L001_R1_001.fastq.gz

  • 6_GCCAAT_L001_R1_001.fastq.gz

  • 7_CAGATC_L001_R1_001.fastq.gz

  • 8_ACTTGA_L001_R1_001.fastq.gz

MBDseq of two populations (Hood Canal & Oyster Bay) grown in Clam Bay
  • zr1394_10_s456.fastq.gz

  • zr1394_11_s456.fastq.gz

  • zr1394_12_s456.fastq.gz

  • zr1394_13_s456.fastq.gz

  • zr1394_14_s456.fastq.gz

  • zr1394_15_s456.fastq.gz

  • zr1394_16_s456.fastq.gz

  • zr1394_17_s456.fastq.gz

  • zr1394_18_s456.fastq.gz

  • zr1394_1_s456.fastq.gz

  • zr1394_2_s456.fastq.gz

  • zr1394_3_s456.fastq.gz

  • zr1394_4_s456.fastq.gz

  • zr1394_5_s456.fastq.gz

  • zr1394_6_s456.fastq.gz

  • zr1394_7_s456.fastq.gz

  • zr1394_8_s456.fastq.gz

  • zr1394_9_s456.fastq.gz


RESULTS:

With Bismark complete, these two sets of analyses can now be looked into further (and separately, as they are separate experiments) using things like MethylKit (R package) and
the Integrative Genomics Viewer (IGV).

Output folder:

Bismark Summary Report:

Individual Sample Reports:

DNA Methylation Quantification – Acropora cervicornis (Staghorn coral) DNA from Javier Casariego (FIU)

Used the MethylFlash Methylated DNA Quantification Kit (Colorimetric) from Epigentek to quantify methylation in these coral DNA samples.

All samples were run in duplicate <em>except</em> 2h Block 1 due to insufficient DNA.

The following samples were used in a 1:10 dilution (2uL DNA : 18uL NanoPure H2O), due to their relatively high concentrations, to ensure accurate pipetting:

  • 72h Block 4
  • D14 Block 1
  • D14 Block 2
  • D14 Block 3
  • D14 Block 4
  • D14 Block 5
  • D14 Block 6
  • D14 Block 8
  • D14 Block 10

All samples were diluted to a final concentration of 9.645ng/uL (154.24ng total; 17.6uL) in NanoPure water, which is equal to 77.12ng of DNA per assay replicate. These numbers were chosen based off of the sample with the lowest concentration.

The following samples were used in their entirety:

  • 2h Block 8
  • D35 Block 8

Calculations were added to the spreadsheet provided by Javier (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

The spreadsheet became overly complicated because I initially forgot to account for the need to run each sample in duplicate.

The kit reagent dilutions were as follows:

  • Diluted ME1: 52mL of ME1 + 464mL of <em>distilled</em> water
  • Diluted ME4: 10uL of ME4 + 10uL of TE Buffer (pH=8.0; made by me on 20130408).
  • Standard curve: Prepped per instruction manual, with double volumes for two plates.
  • Diluted ME5: 50uL/well x 152well = 7600uL; 7600uL/1000 = 7.6uL; 7.6uL ME5 + 7592.4uL Diluted ME1
  • Diluted ME6: 50uL/well x 152well = 7600uL; 7600uL/2000 = 3.8uL; 3.8uL ME6 + 7596.2uL Diluted ME1
  • Diluted ME7: 50uL/well x 152well = 7600uL; 7600uL/5000 = 1.52uL; 1.52uL ME7 + 7598.48uL Diluted ME1

All diluted solutions were stored on ice for duration of procedure.

The remaining Diluted ME1 solution was stored at 4C (FTR 209), and is stable for 6 months, per the manufacturer’s instructions.

See the Results section below for plate layouts.

Plates were read at 450nm on the Seeb Lab Victor 1420 Plate Reader (Perkin Elmer) and the amount of DNA methylation was determined.

Results:

Individual sample methylation quantification (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

Plate Reader Output File Plate #1 (Google Sheet): 20170511_coral_DNA_methylation_plate01.xls

Plate Reader Output File Plate #2 (Google Sheet): 20170511_coral_DNA_methylation_plate02.xls

 

I’m not familiar with the experimental design, so I’m not going to spend time handling any of the in-depth analysis at this point in time. However, here’s the background on how methylation quantification and percent methylation were determined.

  1. Mean absorbance (450nm) was determined for all samples and standard curve samples. It’s important to note that the standard deviation between replicates was not evaluated and there appears to be consistent variability between samples, but I’m not certain how much variation is “acceptable” with and assay of this nature.

  2. The mean absorbance of the standard curve samples were plotted against their corresponding DNA amounts and a linear trendline was fitted to the points.

  3. Per the manufacturer’s recommendations, the four points (including the zero point) that yielded the best linear fit (i.e. best R^2 value) were used and the slope of best fit line for those four points was determined.

  4. This slope was then utilized in the equation provided by the manufacturer (see pg. 8 of the MethylFlash Kit manual).

DNA Methylation Quantification – Coral DNA from Jose M. Eirin-Lopez (Florida International University)

Ran the coral DNA I quantified on 20160630 through the MethylFlash Methylated DNA Quantification Kit [Colorimetric] (Epigentek) kit to quantify global methylation.

Used 100ng of DNA per 8uL per replicate (x2 replicates = total 200ng in 16uL). Calcs are here (Google Sheet): 20160705_coral_DNA_methylation_calcs

Manufacturer’s protocol was followed.

Dilutions of kit reagents:

ME5 (1:1000) 2.6uL ME5 + 2597.4uL diluted ME1

ME6 (1:2000) 1.3uL ME6 + 2598.7uL diluted ME1

ME7 (1:5000) 0.52uL ME7 + 2599.48uL diluted ME1

Samples were quantified on the Seeb’s plate reader @ 450nm  (Wallac 1420 Victor 2  [Perkin Elmer])

Results:

Google Sheet: 20160707_coral_DNA_methylflash

sample treatment 5-mC(ng)
H1_1 nitrogen 0.8712248853
H1_10 nitrogen 0.6917168368
H1_12 control 0.2738478893
H1_5 nitrogen & phosphorous 0.9663585942
H1_6 control 0.6494783825
H1_8 nitrogen & phosphorous 0.4244913398
H24_1 nitrogen 0.372603297
H24_10 nitrogen 0.4237237786
H24_12 control 0.5350511937
H24_5 nitrogen & phosphorous 0.1495527697
H24_6 control 0.2291900804
H24_8 nitrogen & phosphorous 0.2213437801
H5_1 nitrogen -0.1233169902
H5_10 nitrogen 0.6997668774
H5_12 control 0.2307000493
H5_5 nitrogen & phosphorous -0.07790933048
H5_6 control 0.4562401662
H5_8 nitrogen & phosphorous 0.5949647121

 

Overall, it’s difficult to really interpret these results. I believe the data is a time course (e.g. H5 = hour 5, H24 = hour 24). Additionally, looking at treatments, there appear to be replicates, but it’s not clear what type of replicates they are (i.e. technical or biological). Generally, it seems like the control samples have lower quantities of methylated DNA than the treated samples. However, this doesn’t hold true for all three of the groups.

And, not that it really matters, but I don’t even know what species this is…

In any case, this was an attempt to gather some preliminary data for a grant that Steven is attempting to put together, so the original experiment and the subsequent data aren’t as robust as one would expect for a full-blown research project.

MeDIP – SB/WB Fragmented gDNA EtOH precipitation (continued from 20100702)

Finished EtOH precipitation of MeDIP gDNA. Samples were pelleted 16,000g, 4C, 30mins. Supe was discarded. Washed with 1mL 70% EtOH, pelleted 16,000g, 4C, 15mins. Supe discarded. MeDIP DNA was resuspended in 100uL of TE (pH = 8.5). Wash samples, containing unmethylated DNA, were resuspended/combined in a total of 100uL TE (pH = 8.5). Samples were spec’d:

Results:

R37: MeDIP DNA = 1.393ug recovery. This is ~13% of the input total gDNA (11.25ug) and is ~28% of the total DNA recovered in the procedure (4.935ug). Unmethylated DNA = 3.542ug total recovery. This is ~31% of the input total gDNA (11.25ug) and is ~72% of the total DNA recovered in the procedure (4.935ug). Total DNA recovery = ~44%.

R51: MeDIP DNA = 1.256ug recovery. This is ~14% of the input total gDNA (8.75ug) and is ~23% of the total DNA recovered in the procedure (5.462ug). Unmethylated DNA = 4.206ug total recovery. This is ~48% of the input total gDNA (8.75ug) and is ~77% of the total DNA recovered in the procedure (5.462ug). Total DNA recovery = ~62%.

There definitely seemed to be a high degree of salt carryover from the procedure, despite the phenol:chloroform treatment and EtOH precipitation. As such, I believe this is the reason that the 260/230 ratios are so out of whack. Possibly explains why the 260/280 ratios for the MeDIP DNA are so high, too?

These results demonstrate what we can expect to recover from this procedure, as well as how much DNA gets lost during processing. MeDIP DNA and unmethylated DNA were stored @ -20C.

DNA Methylation Test – Gigas site gDNA (BB & DH) from 20090515

Used BB & DH samples #11-17 for procedure. Followed Epigentek’s protocol. My calcs for dilutions/solutions needed are here. All solutions were made fresh before using, except for Diluted GU1 which was made at the beginning of the procedure and stored on ice in a 50mL conical wrapped in aluminum foil. Used 100ng total (50ng/uL) of each sample gDNA. No standards for a standard curve based on speaking with Mac.

WELL SAMPLE WELL SAMPLE
A01 BB11 A02 DH11
B01 BB12 B02 DH12
C01 BB13 C02 DH13
D01 BB14 D02 DH14
E01 BB15 E02 DH15
F01 BB16 F02 DH16
G01 BB17 G02 DH17
H01 Pos. Control H02 Blank

Results: Above is the graph of the results. Although it’s only a small difference between the two sites, it is statistically significant. The calcs for this graph can be found here (Excel file). It should be noted that this graph was generated using estimated values from the standard curve provided in the manufacturer’s protocol. This was done because 1) I did not run standards to generate my own curve and 2) calculating the “% methylation” not using the formula that utilizes the standard curve was giving ridiculously high values (e.g. 350%).

Here is the raw data generated by the plate reader for a 1s read (Excel file) and a 0.1s (Excel file) read. Both reads have nearly identical values.