BS-seq and SNP Analysis - Nextflow EpiDiverse Pipelines Trials and Tribulations

Alrighty, this notebook entry has a lot to unpack, as getting these pipelines running and then dealing with the actual data we wanted to run through them was quite involved. However, the TL;DR is this:

  • Both EpiDiverse pipelines (wgbs and snp) are running properly on our computer, Raven.

  • The Ostrea lurida (Olympia oyster) data which Steven wanted to identify SNPs in is possibly screwy?

Anyway, now to the meat of everything! If you're not interested in all the ins/outs, skip down to the Results section to see the various comparisons which were run.

This analysis was spurred by this GitHub Issue. Steven wanted to run some Ostrea lurida (Olympia oyster) MBD BSseq data (from December 2015). These data are single-end, 50bp FastQs from ZymoResearch, with files named zr1394*. They were trimmed with TrimGalore on 20180503. Steven ran the trimmed data through Bismark and created deduplicated, sorted BAMs on 2020205. These BAMs were created using our Olurida_v081.fa genome.

Attempted to run the deduplicated, sorted BAMs through the EpiDiverse/snp Nextflow pipeline, but repeatedly encountered a memory error. Steven had also run into memory limitations when trying to do the same analysis using BS-Snper. So, Steven created a reduced genome and ran the data through Bismark using that genome.

Was finally able to run those BAMs through the EpiDiverse/snp Nextflow pipeline, but the “Substitutions” plot looked like this:

[Figure: bar plot of SNP substitutions showing a large bar for C>T SNPs only and nothing for the other substitution types]

These results didn’t look like what we’d expected (a more equal distribution of SNPs), which triggered a rabbit hole of exploration and testing. Performed the following runs to see if I could identify the source of this skewed SNP distribution:


RESULTS

Summary in table form might be the easiest way to present this:

| Run | Substitutions bar plot |
|---|---|
| Oly Bismark | Almost only C>T substitutions |
| Geoduck Bismark | Expected distribution of SNPs |
| Oly EpiDiverse Adaptor Trim | Better, but still skewed SNP distribution |
| Oly EpiDiverse Adaptor and 10bp 5’ Trim | Similar skewing to default EpiDiverse trimming |
| Oly EpiDiverse Adaptor and 10bp 5’/3’ Trim | Very similar skew to the original Oly Bismark plot |
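
For a quick sanity check of the numbers behind a "Substitutions" bar plot, the substitution classes can be tallied directly from variant calls. The following is only a minimal sketch, assuming the calls are available as plain-text VCF records; the function name `tally_substitutions` and the example records are made up for illustration and are not taken from the pipeline or from the Olympia oyster data.

```python
from collections import Counter


def tally_substitutions(vcf_lines):
    """Count single-base REF>ALT substitutions in VCF-formatted lines.

    Hypothetical helper for illustration; not part of EpiDiverse/snp.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        ref = fields[3].upper()
        # The ALT column may list several alleles separated by commas.
        for alt in fields[4].upper().split(","):
            is_snp = len(ref) == 1 and len(alt) == 1
            if is_snp and ref in "ACGT" and alt in "ACGT" and ref != alt:
                counts[f"{ref}>{alt}"] += 1
    return counts


# Invented toy records (NOT real data), just to show the input shape.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig1\t101\t.\tC\tT\t50\tPASS\t.",
    "contig1\t250\t.\tC\tT\t42\tPASS\t.",
    "contig2\t17\t.\tG\tA\t61\tPASS\t.",
    "contig2\t88\t.\tA\tG,T\t35\tPASS\t.",
    "contig3\t5\t.\tAT\tA\t30\tPASS\t.",  # indel, ignored by the tally
]

counts = tally_substitutions(example)
for substitution, n in counts.most_common():
    print(substitution, n)
```

To use this on a real, uncompressed VCF, the open file handle can be passed in place of `example`, since the function only needs an iterable of lines. A tally dominated by C>T (and its reverse-strand counterpart G>A) would match the kind of skew described in the table above.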