Tag Archives: methylation

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline MethylKit Comparison

I previously ran two variations on the Bismark analysis for our Olympia oyster whole genome bisulfite sequencing data:

20180913 Bismark tweaks

I followed this up using the MethylKit R package to identify differentially modified loci (DML), based on differing amounts of coverage (1x, 3x, 5x, & 10x) and percent methylation differences between the two groups of oysters (25%, 50%, & 75%).

See the project wiki for experimental design info).

Both sets of analyses were documented in R Projects:

Default Bismark settings:

20180912_oly_methylkit (GitHub)

“Relaxed” Bismark settings

20180913_oly_methylkit

RESULTS

BedGraphs (1x coverage, 25% diff in methylation):

Default Bismark settings:

20180912_oly_methylkit/analyses/OlyFbOb_1x_cov_25percentDiff.bed

“Relaxed” Bismark settings:

20180913_oly_methylkit/analyses/OlyFbOb_1x_cov_25percentDiff.bed

The BedGraph outputs from the least stringent coverage/percent difference in methylation for both Bismark pipelines yield suprisingly low numbers of DML.

They yield 22 and 21 DML, respectively. Of course, more stringent BedGraphs have fewer DML, but may be more believable due to having a more robust set of data.

Interestingly, the two analyses reveal that a single contig contains the majority of DML, all within a 1000bp range.

Will continue to examine this data by examining Bismark BedGraphs in IGV, and running some additional MethylKit analysis looking at differentially modified regions(DMRs) to see what we can gleen from this.

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline Comparison

0000-0002-2747-368X

Ran Bismark using our high performance computing (HPC) node, Mox, with two different bowtie2 settings:

Default settings
–score_min L,0,-0.6

The second setting is a bit less stringent than the default settings and should result in a higher percentage of reads mapping. However, not entirely sure what the actual implications will be (if any) for interpreting the resulting data.

Input data was previously trimmed per Bismark’s recommendation for Illumina TruSeq libraries (TrimGalore! 5′ 10bp):

Sam’s Notebook 20180830

List of input files and Bismark configurations can be seen in the SLURM scripts:

RESULTS

Output folders:

DNA Methylation Analysis – Olympia oyster BSseq MethylKit Analysis

0000-0002-2747-368X

NOTE: IMPORTANT CAVEATS – READ POST BEFORE USING DATA.

I’m posting this for posterity and to provide an overview of what (and whatnot) to do. Plus, this has a good R script for using MethylKit that can be used for subsequent analyses.

The goal of this analysis was to compare the methylation profiles of Olympia oysters originating from a common population (Fidalgo Bay) that were raised in two different locations (Fidalgo Bay & Oyster Bay).

An overview of the experiment can be viewed here:

GitHub Wiki: Whole-genome-BSseq-December-2015

I previously ran all of Olympia oyster DNA methylation sequencing data through the Bismark pipeline, and then processed them using the MethylKit R library.

First mistake (Bismark):

Trimmed FastQ files “incorrectly”.

Bismark provides an excellent user guide and provides a handy table on how to decide on trimming parameters, but I mistakenly trimmed these according to the recommendations for a different library preparation technique. I trimmed based on the Zymo Pico-Methyl Kit (which was used for the other group of data that I processed simultaneously), instead of the TruSeq library prep.

So, “incorrectly” isn’t necessarily the proper term here. The analysis can still be used, however, it’s likely that the excessive trimming results in reducing sequencing coverage, and, in turn, making the downstream analysis result in a highly conservative output. Thus, the data isn’t wrong or bad, it is just very limited.

And, this leads to the second mistake (Bismark):

Bowtie alignment score too strict

There’s a bit of a weird “battle” between Bismark and bowtie2. Bismark uses bowtie2 for generating alignments, but bowtie2’s default cutoff score overrides Bismark’s. So, to adjust the score value, you have to explicitly add the scoring parameters to your Bismark parameters during the alignment step. I did not do this.

Again, it’s not wrong, per se, but leads to a significantly limited set of data in the final analysis.

The data were analyzed based on a minimum of:

3x coverage
25% difference in methylation

RESULTS:

Methylkit analysis (R project; GitHub):

20180827_oly_methylkit

BedGraph file (BED):

OlyFbOb_3xCov_25percentDiff.bed

The analysis resulted in a total of seven (yes, 7) differentially methylated loci (DML) between the two groups. It was this result that made Steven and me revisit the initial Bismark analysis. He has done this previously (but differently) and gotten significantly greater numbers of DML.

Knowing all of this, I will re-trim the data and adjust Bismark alignment score thresholds and then re-analyze with MethylKit.

Regardless here’re some plots to add some visual flair to this notebook entry (these, and more, are available in the GitHub repo):

CLUSTERING DENDROGRAM

PCA PLOT

DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets

0000-0002-2747-368X

Bismark analysis of all of our current Olympia oyster (Ostrea lurida) DNA methylation high-throughput sequencing data.

Analysis was run on Emu (Ubuntu 16.04LTS, Apple Xserve). The primary analysis took ~14 days to complete.

All operations are documented in a Jupyter notebook (GitHub):

20180709_emu_oly_methylation_mapping.ipynb

Genome used:

Olurida_v080.fa ( run was initiated prior to creation of v081; see Genomic Resources wiki for more info )

Input files ( see Olympia oyster Genomic GitHub wiki for more info ):

WG BSseq of Fidalgo Bay offspring grown in Fidalgo Bay & Oyster Bay

1_ATCACG_L001_R1_001.fastq.gz
2_CGATGT_L001_R1_001.fastq.gz
3_TTAGGC_L001_R1_001.fastq.gz
4_TGACCA_L001_R1_001.fastq.gz
5_ACAGTG_L001_R1_001.fastq.gz
6_GCCAAT_L001_R1_001.fastq.gz
7_CAGATC_L001_R1_001.fastq.gz
8_ACTTGA_L001_R1_001.fastq.gz

MBDseq of two populations (Hood Canal & Oyster Bay) grown in Clam Bay

zr1394_10_s456.fastq.gz
zr1394_11_s456.fastq.gz
zr1394_12_s456.fastq.gz
zr1394_13_s456.fastq.gz
zr1394_14_s456.fastq.gz
zr1394_15_s456.fastq.gz
zr1394_16_s456.fastq.gz
zr1394_17_s456.fastq.gz
zr1394_18_s456.fastq.gz
zr1394_1_s456.fastq.gz
zr1394_2_s456.fastq.gz
zr1394_3_s456.fastq.gz
zr1394_4_s456.fastq.gz
zr1394_5_s456.fastq.gz
zr1394_6_s456.fastq.gz
zr1394_7_s456.fastq.gz
zr1394_8_s456.fastq.gz
zr1394_9_s456.fastq.gz

RESULTS:

With Bismark complete, these two sets of analyses can now be looked into further (and separately, as they are separate experiments) using things like MethylKit (R package) and
the Integrative Genomics Viewer (IGV).

Output folder:

owl/Athaliana/20180709_oly_methylseq

Bismark Summary Report:

20180709_oly_methylseq/bismark_summary_report.html

Individual Sample Reports:

DNA Methylation Quantification – Acropora cervicornis (Staghorn coral) DNA from Javier Casariego (FIU)

0000-0002-2747-368X

Used the MethylFlash Methylated DNA Quantification Kit (Colorimetric) from Epigentek to quantify methylation in these coral DNA samples.

All samples were run in duplicate <em>except</em> 2h Block 1 due to insufficient DNA.

The following samples were used in a 1:10 dilution (2uL DNA : 18uL NanoPure H2O), due to their relatively high concentrations, to ensure accurate pipetting:

72h Block 4
D14 Block 1
D14 Block 2
D14 Block 3
D14 Block 4
D14 Block 5
D14 Block 6
D14 Block 8
D14 Block 10

All samples were diluted to a final concentration of 9.645ng/uL (154.24ng total; 17.6uL) in NanoPure water, which is equal to 77.12ng of DNA per assay replicate. These numbers were chosen based off of the sample with the lowest concentration.

The following samples were used in their entirety:

2h Block 8
D35 Block 8

Calculations were added to the spreadsheet provided by Javier (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

The spreadsheet became overly complicated because I initially forgot to account for the need to run each sample in duplicate.

The kit reagent dilutions were as follows:

Diluted ME1: 52mL of ME1 + 464mL of <em>distilled</em> water
Diluted ME4: 10uL of ME4 + 10uL of TE Buffer (pH=8.0; made by me on 20130408).
Standard curve: Prepped per instruction manual, with double volumes for two plates.
Diluted ME5: 50uL/well x 152well = 7600uL; 7600uL/1000 = 7.6uL; 7.6uL ME5 + 7592.4uL Diluted ME1
Diluted ME6: 50uL/well x 152well = 7600uL; 7600uL/2000 = 3.8uL; 3.8uL ME6 + 7596.2uL Diluted ME1
Diluted ME7: 50uL/well x 152well = 7600uL; 7600uL/5000 = 1.52uL; 1.52uL ME7 + 7598.48uL Diluted ME1

All diluted solutions were stored on ice for duration of procedure.

The remaining Diluted ME1 solution was stored at 4C (FTR 209), and is stable for 6 months, per the manufacturer’s instructions.

See the Results section below for plate layouts.

Plates were read at 450nm on the Seeb Lab Victor 1420 Plate Reader (Perkin Elmer) and the amount of DNA methylation was determined.

Results:

Individual sample methylation quantification (Google Sheet): A.cervicornis_DNA_Extractions(May_2017).xlsx

Plate Reader Output File Plate #1 (Google Sheet): 20170511_coral_DNA_methylation_plate01.xls

Plate Reader Output File Plate #2 (Google Sheet): 20170511_coral_DNA_methylation_plate02.xls

I’m not familiar with the experimental design, so I’m not going to spend time handling any of the in-depth analysis at this point in time. However, here’s the background on how methylation quantification and percent methylation were determined.

Mean absorbance (450nm) was determined for all samples and standard curve samples. It’s important to note that the standard deviation between replicates was not evaluated and there appears to be consistent variability between samples, but I’m not certain how much variation is “acceptable” with and assay of this nature.
The mean absorbance of the standard curve samples were plotted against their corresponding DNA amounts and a linear trendline was fitted to the points.
Per the manufacturer’s recommendations, the four points (including the zero point) that yielded the best linear fit (i.e. best R^2 value) were used and the slope of best fit line for those four points was determined.
This slope was then utilized in the equation provided by the manufacturer (see pg. 8 of the MethylFlash Kit manual).

DNA Methylation Quantification – Coral DNA from Jose M. Eirin-Lopez (Florida International University)

0000-0002-2747-368X

Ran the coral DNA I quantified on 20160630 through the MethylFlash Methylated DNA Quantification Kit [Colorimetric] (Epigentek) kit to quantify global methylation.

Used 100ng of DNA per 8uL per replicate (x2 replicates = total 200ng in 16uL). Calcs are here (Google Sheet): 20160705_coral_DNA_methylation_calcs

Manufacturer’s protocol was followed.

Dilutions of kit reagents:

ME5 (1:1000) 2.6uL ME5 + 2597.4uL diluted ME1

ME6 (1:2000) 1.3uL ME6 + 2598.7uL diluted ME1

ME7 (1:5000) 0.52uL ME7 + 2599.48uL diluted ME1

Samples were quantified on the Seeb’s plate reader @ 450nm (Wallac 1420 Victor 2 [Perkin Elmer])

Results:

Google Sheet: 20160707_coral_DNA_methylflash

sample	treatment	5-mC(ng)
H1_1	nitrogen	0.8712248853
H1_10	nitrogen	0.6917168368
H1_12	control	0.2738478893
H1_5	nitrogen & phosphorous	0.9663585942
H1_6	control	0.6494783825
H1_8	nitrogen & phosphorous	0.4244913398
H24_1	nitrogen	0.372603297
H24_10	nitrogen	0.4237237786
H24_12	control	0.5350511937
H24_5	nitrogen & phosphorous	0.1495527697
H24_6	control	0.2291900804
H24_8	nitrogen & phosphorous	0.2213437801
H5_1	nitrogen	-0.1233169902
H5_10	nitrogen	0.6997668774
H5_12	control	0.2307000493
H5_5	nitrogen & phosphorous	-0.07790933048
H5_6	control	0.4562401662
H5_8	nitrogen & phosphorous	0.5949647121

Overall, it’s difficult to really interpret these results. I believe the data is a time course (e.g. H5 = hour 5, H24 = hour 24). Additionally, looking at treatments, there appear to be replicates, but it’s not clear what type of replicates they are (i.e. technical or biological). Generally, it seems like the control samples have lower quantities of methylated DNA than the treated samples. However, this doesn’t hold true for all three of the groups.

And, not that it really matters, but I don’t even know what species this is…

In any case, this was an attempt to gather some preliminary data for a grant that Steven is attempting to put together, so the original experiment and the subsequent data aren’t as robust as one would expect for a full-blown research project.

MeDIP – SB/WB Fragmented gDNA EtOH precipitation (continued from 20100702)

0000-0002-2747-368X

Finished EtOH precipitation of MeDIP gDNA. Samples were pelleted 16,000g, 4C, 30mins. Supe was discarded. Washed with 1mL 70% EtOH, pelleted 16,000g, 4C, 15mins. Supe discarded. MeDIP DNA was resuspended in 100uL of TE (pH = 8.5). Wash samples, containing unmethylated DNA, were resuspended/combined in a total of 100uL TE (pH = 8.5). Samples were spec’d:

Results:

R37: MeDIP DNA = 1.393ug recovery. This is ~13% of the input total gDNA (11.25ug) and is ~28% of the total DNA recovered in the procedure (4.935ug). Unmethylated DNA = 3.542ug total recovery. This is ~31% of the input total gDNA (11.25ug) and is ~72% of the total DNA recovered in the procedure (4.935ug). Total DNA recovery = ~44%.

R51: MeDIP DNA = 1.256ug recovery. This is ~14% of the input total gDNA (8.75ug) and is ~23% of the total DNA recovered in the procedure (5.462ug). Unmethylated DNA = 4.206ug total recovery. This is ~48% of the input total gDNA (8.75ug) and is ~77% of the total DNA recovered in the procedure (5.462ug). Total DNA recovery = ~62%.

There definitely seemed to be a high degree of salt carryover from the procedure, despite the phenol:chloroform treatment and EtOH precipitation. As such, I believe this is the reason that the 260/230 ratios are so out of whack. Possibly explains why the 260/280 ratios for the MeDIP DNA are so high, too?

These results demonstrate what we can expect to recover from this procedure, as well as how much DNA gets lost during processing. MeDIP DNA and unmethylated DNA were stored @ -20C.

DNA Methylation Test – Gigas site gDNA (BB & DH) from 20090515

0000-0002-2747-368X

Used BB & DH samples #11-17 for procedure. Followed Epigentek’s protocol. My calcs for dilutions/solutions needed are here. All solutions were made fresh before using, except for Diluted GU1 which was made at the beginning of the procedure and stored on ice in a 50mL conical wrapped in aluminum foil. Used 100ng total (50ng/uL) of each sample gDNA. No standards for a standard curve based on speaking with Mac.

WELL	SAMPLE	WELL	SAMPLE
A01	BB11	A02	DH11
B01	BB12	B02	DH12
C01	BB13	C02	DH13
D01	BB14	D02	DH14
E01	BB15	E02	DH15
F01	BB16	F02	DH16
G01	BB17	G02	DH17
H01	Pos. Control	H02	Blank

Results: Above is the graph of the results. Although it’s only a small difference between the two sites, it is statistically significant. The calcs for this graph can be found here (Excel file). It should be noted that this graph was generated using estimated values from the standard curve provided in the manufacturer’s protocol. This was done because 1) I did not run standards to generate my own curve and 2) calculating the “% methylation” not using the formula that utilizes the standard curve was giving ridiculously high values (e.g. 350%).

Here is the raw data generated by the plate reader for a 1s read (Excel file) and a 0.1s (Excel file) read. Both reads have nearly identical values.

Sam's Notebook

University of Washington – Fishery Sciences – Roberts Lab

Tag Archives: methylation

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline MethylKit Comparison

Default Bismark settings:

“Relaxed” Bismark settings

RESULTS

Default Bismark settings:

“Relaxed” Bismark settings:

DNA Methylation Analysis – Olympia Oyster Whole Genome BSseq Bismark Pipeline Comparison

RESULTS

DNA Methylation Analysis – Olympia oyster BSseq MethylKit Analysis

NOTE: IMPORTANT CAVEATS – READ POST BEFORE USING DATA.

I’m posting this for posterity and to provide an overview of what (and whatnot) to do. Plus, this has a good R script for using MethylKit that can be used for subsequent analyses.

RESULTS:

CLUSTERING DENDROGRAM

PCA PLOT

DNA Methylation Analysis – Bismark Pipeline on All Olympia oyster BSseq Datasets

WG BSseq of Fidalgo Bay offspring grown in Fidalgo Bay & Oyster Bay

MBDseq of two populations (Hood Canal & Oyster Bay) grown in Clam Bay

RESULTS:

DNA Methylation Quantification – Acropora cervicornis (Staghorn coral) DNA from Javier Casariego (FIU)

DNA Methylation Quantification – Coral DNA from Jose M. Eirin-Lopez (Florida International University)

MeDIP – SB/WB Fragmented gDNA EtOH precipitation (continued from 20100702)

DNA Methylation Test – Gigas site gDNA (BB & DH) from 20090515