Tag Archives: MBD-Seq

TrimGalore/FastQC/MultiQC – Trim 10bp 5′/3′ ends C.virginica MBD BS-seq FASTQ data

Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. Since alignment with Bismark is the next step in our pipeline, we figured we should probably just follow their recommendations!
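
For reference, here is a minimal sketch of how a 10bp trim from both ends could be expressed with TrimGalore's clipping options. The flags are TrimGalore options, but the function name, file names, and output directory are placeholders I made up for illustration; they are not taken from the actual job script. The function only assembles and prints the command so it can be inspected before anything is run.

```bash
#!/bin/bash
# Build (but do not run) a TrimGalore command that hard-trims both ends of paired reads.
# build_trim_cmd and all paths are hypothetical placeholders.
build_trim_cmd() {
  local clip5="$1" clip3="$2" r1="$3" r2="$4" outdir="$5"
  local cmd=(trim_galore --paired --fastqc --output_dir "$outdir"
    --clip_R1 "$clip5" --clip_R2 "$clip5"
    --three_prime_clip_R1 "$clip3" --three_prime_clip_R2 "$clip3"
    "$r1" "$r2")
  # Print the assembled command as a single space-separated line.
  echo "${cmd[*]}"
}

build_trim_cmd 10 10 sample_R1.fastq.gz sample_R2.fastq.gz trimmed_out
```

Printing the command first makes it easy to confirm the clip values before submitting a long-running job.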

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).
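
The checksum steps can be sketched roughly as follows, assuming GNU md5sum is available. The files here are throwaway examples created in temporary directories; the real FASTQ names and the Owl destination are not shown.

```bash
#!/bin/bash
# Demonstrate generating MD5 checksums and verifying them after a copy.
# Uses invented files in temp directories; real FASTQ paths would differ.
src="$(mktemp -d)"
dest="$(mktemp -d)"

printf '@read1\nACGT\n+\nIIII\n' > "$src/example_R1_trimmed.fq"
printf '@read1\nTGCA\n+\nIIII\n' > "$src/example_R2_trimmed.fq"

# Generate checksums next to the source files.
(cd "$src" && md5sum *.fq > checksums.md5)

# Copy the data and the checksum file, then verify at the destination.
cp "$src"/*.fq "$src/checksums.md5" "$dest/"
(cd "$dest" && md5sum --check checksums.md5)
verify_status=$?
echo "verification exit status: $verify_status"
```

A non-zero exit status from the verification step would indicate that at least one file changed during the transfer.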

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Hey! Look at that! Everything is much better! Thanks for the excellent documentation and suggestions, Bismark!

TrimGalore/FastQC/MultiQC – 2bp 3′ end Read 1s Trim C.virginica MBD BS-seq FASTQ data

Earlier today, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch and hard trimmed the first 14bp from each read. Things looked better at the 5′ end, but the 3′ end of each of the READ1 seqs showed a wonky 2bp blip, so I decided to trim that off.

I ran TrimGalore (using the built-in FastQC option) with a hard trim of the last 2bp of each Read 1 file that had already received the 14bp hard trim, and followed up with MultiQC for a summary of the FastQC reports.
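
To make concrete what a 2bp hard trim at the 3′ end does to a FASTQ record, here is a small awk illustration. It is not how TrimGalore is implemented (there the corresponding option is --three_prime_clip_R1), and the function name and the read are invented for the demo.

```bash
#!/bin/bash
# Remove the last N bases (and the matching quality scores) from every FASTQ record.
# Illustration only; clip_3prime and the record below are invented.
clip_3prime() {
  local n="$1"
  # Line 2 of each record is the sequence, line 4 the quality string.
  awk -v n="$n" 'NR % 4 == 2 || NR % 4 == 0 { print substr($0, 1, length($0) - n); next }
                 { print }'
}

printf '@example_read/1\nACGTACGTAC\n+\nIIIIIIII##\n' | clip_3prime 2
```

The sequence and quality lines must be shortened together, otherwise the record is no longer valid FASTQ.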

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Well, this is a bit strange: the 2bp trimming on the read 1s looks fine, but now the read 2s are weird in the same region!

Regardless, while this was running, Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. So, maybe this was all moot. I’ll go ahead and re-run this following the Bismark recommendations.

TrimGalore/FastQC/MultiQC – 14bp Trim C.virginica MBD BS-seq FASTQ data

Yesterday, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch with the default settings (i.e. “auto-trim”). There was still some variability in the first ~15bp of the reads and Steven wanted to see how a hard trim would change things.

I ran TrimGalore (using the built-in FastQC option), with a hard trim of the first 14bp of each read and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

OK, this trimming definitely took care of the variability seen in the first ~15bp of all the reads.

However, I noticed that the last 2bp of each of the Read 1 seqs all have some wonky stuff going on. I’m guessing I should probably trim that stuff off, too…

TrimGalore/FastQC/MultiQC – Auto-trim C.virginica MBD BS-seq FASTQ data

Yesterday, I ran FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch. Steven wanted to trim it and see how things turned out.

I ran TrimGalore (using the built-in FastQC option) and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer.

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Overall, the auto-trim didn’t alter things too much. Specifically, Steven is concerned about the variability in the first 15bp (seen in the Per Base Sequence Content section of the MultiQC output). It was reduced, but not greatly. Will perform an independent run of TrimGalore and employ a hard trim of the first 14bp of each read and see how that looks.
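
As a rough stand-in for what the Per Base Sequence Content plot summarizes, base counts at the first few read positions can be tabulated with awk. This is only a sketch on invented reads; the function name is hypothetical, and FastQC itself reports percentages across the whole read length.

```bash
#!/bin/bash
# Tabulate base counts at each of the first N read positions of a FASTQ stream.
# per_base_counts and the reads below are invented for illustration.
per_base_counts() {
  local n="$1"
  awk -v n="$n" '
    NR % 4 == 2 {
      # Sequence line: tally the base seen at each position up to n.
      for (i = 1; i <= n && i <= length($0); i++) count[i, substr($0, i, 1)]++
    }
    END {
      print "pos\tA\tC\tG\tT"
      for (i = 1; i <= n; i++)
        printf "%d\t%d\t%d\t%d\t%d\n", i, count[i, "A"], count[i, "C"], count[i, "G"], count[i, "T"]
    }'
}

printf '@r1\nTTGA\n+\nIIII\n@r2\nTAGA\n+\nIIII\n@r3\nTTCA\n+\nIIII\n' | per_base_counts 3
```

Strongly skewed counts at the first positions, relative to the rest of the read, are the kind of variability a 5′ hard trim is meant to remove.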

FastQC/MultiQC – C. virginica MBD BS-seq Data

Per Steven’s GitHub Issues request, I ran FastQC on the Eastern oyster MBD bisulfite sequencing data we recently got back from ZymoResearch.

Ran FastQC locally with the following script: 20180409_fastqc_Cvirginica_MBD.sh


#!/bin/bash
# Run FastQC (18 threads) on all C. virginica MBD BS-seq R1/R2 FASTQ files.
/home/sam/software/FastQC/fastqc \
--threads 18 \
--outdir /home/sam/20180409_fastqc_Cvirginica_MBD \
/mnt/owl/nightingales/C_virginica/zr2096_10_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_10_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_1_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_1_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_2_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_2_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_3_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_3_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_4_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_4_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_5_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_5_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_6_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_6_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_7_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_7_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_8_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_8_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_9_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_9_s1_R2.fastq.gz
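
The explicit file list in the script above could also be built with a glob. The following is a sketch, assuming every wanted file in the directory matches zr2096_*_s1_R[12].fastq.gz; the function name is hypothetical, and it prints the FastQC command rather than running it.

```bash
#!/bin/bash
# Assemble the FastQC command from a glob instead of listing each file.
# Assumes all wanted files match zr2096_*_s1_R[12].fastq.gz in the given directory.
build_fastqc_cmd() {
  local dir="$1" outdir="$2"
  local files=("$dir"/zr2096_*_s1_R[12].fastq.gz)
  # With no match the pattern stays literal, so check that the first entry exists.
  [ -e "${files[0]}" ] || { echo "no FASTQ files found in $dir" >&2; return 1; }
  echo /home/sam/software/FastQC/fastqc --threads 18 --outdir "$outdir" "${files[@]}"
}

# Dry run: print the command, or a note if the input directory is not mounted.
build_fastqc_cmd /mnt/owl/nightingales/C_virginica /home/sam/20180409_fastqc_Cvirginica_MBD \
  || echo "skipped: input directory not available here"
```

The glob keeps the script short, but it will also pick up any other file matching the pattern, so the explicit list is the safer record of exactly what was analyzed.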

MultiQC was then run on the FastQC output files.

All files were moved to Owl after the jobs completed.

Results:

FastQC Output folder: 20180409_fastqc_Cvirginica_MBD/

MultiQC Output folder: 20180409_fastqc_Cvirginica_MBD/multiqc_data/

MultiQC report (HTML): 20180409_fastqc_Cvirginica_MBD/multiqc_data/multiqc_report.html

Everything looks good to me.

Steven’s interested in seeing what the trimmed output would look like (and how it would impact mapping efficiencies). Will initiate trimming.

See the GitHub issue linked above for the full discussion.

Data Management – Concatenate FASTQ files from Oly MBDseq Project

Steven requested I concatenate the MBDseq files we received for this project:

  • concatenate the s4, s5, s6 file sets for each individual

  • concatenate the full file sets for each individual

Ran the concatenations in the Jupyter (IPython) notebook below. All files were saved to Owl/nightingales/O_lurida/2016
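
For context, gzipped FASTQ files can be concatenated directly with cat, because a gzip stream may consist of several members back to back. A minimal sketch with invented file names (the actual commands are in the notebook):

```bash
#!/bin/bash
# Concatenate gzipped FASTQ files without decompressing them first.
# File names and reads are invented for the demo.
workdir="$(mktemp -d)"
printf '@read_s4\nACGT\n+\nIIII\n' | gzip > "$workdir/sample_s4.fastq.gz"
printf '@read_s5\nTTAA\n+\nIIII\n' | gzip > "$workdir/sample_s5.fastq.gz"
printf '@read_s6\nGGCC\n+\nIIII\n' | gzip > "$workdir/sample_s6.fastq.gz"

# The glob matches only the single-digit set names, not the combined output file.
cat "$workdir"/sample_s[456].fastq.gz > "$workdir/sample_s456.fastq.gz"

# Sanity check: the combined file should hold all three records (12 lines).
gzip -dc "$workdir/sample_s456.fastq.gz" | wc -l
```

Comparing line counts before and after is a cheap way to confirm that nothing was dropped during concatenation.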

Jupyter Notebook: 20160411_Concatenate_Oly_MBDseq.ipynb

NBviewer: 20160411_Concatenate_Oly_MBDseq

Data Received – Ostrea lurida MBD-enriched BS-seq

Received the Olympia oyster MBD-enriched BS-seq sequencing files (50bp, single read) from ZymoResearch (submitted 20151208). Here’s the sample list:

  • E1_hc1_2B
  • E1_hc1_4B
  • E1_hc2_15B
  • E1_hc2_17
  • E1_hc3_1
  • E1_hc3_5
  • E1_hc3_7
  • E1_hc3_10
  • E1_hc3_11
  • E1_ss2_9B
  • E1_ss2_14B
  • E1_ss2_18B
  • E1_ss3_3B
  • E1_ss3_14B
  • E1_ss3_15B
  • E1_ss3_16B
  • E1_ss3_20
  • E1_ss5_18

The 18 samples listed above had previously been MBD-enriched and then sent to ZymoResearch for bisulfite conversion, multiplex library construction, and subsequent sequencing. The library (multiplex of all samples) was sequenced in a single lane, three times. Thus, we would expect 54 FASTQ files. However, ZymoResearch was dissatisfied with the QC of the initial sequencing run (completed on 20160129), so they re-ran the samples (completed on 20160202). This created two sets of data, resulting in a total of 108 FASTQ files.
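
The expected file counts follow directly from the numbers above. A small sketch of the arithmetic, with a commented-out check whose folder path and file extension are assumptions:

```bash
#!/bin/bash
# Expected FASTQ file counts from the numbers stated in the text.
samples=18
runs_per_dataset=3   # the multiplexed library was sequenced three times
datasets=2           # original run plus ZymoResearch's re-run

per_dataset=$(( samples * runs_per_dataset ))
total=$(( per_dataset * datasets ))
echo "expected per dataset: $per_dataset"
echo "expected total:       $total"

# Hypothetical check against a download folder (path and extension are assumptions):
# found=$(find /path/to/downloads -name '*.fastq.gz' | wc -l)
# [ "$found" -eq "$total" ] || echo "expected $total files, found $found"
```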

The ZymoResearch data portal does not allow bulk download of files, so I used the Chrono Download Manager extension for Google Chrome to automate downloading each file (per ZymoResearch’s recommendation).

After download, the files were moved to their permanent storage location on Owl: http://owl.fish.washington.edu/nightingales/O_lurida/20160203_mbdseq

The readme.md file was updated to include project/file information.

The file manipulations were performed in a Jupyter notebook (see below).

Total reads generated for this project: 1,481,836,875
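
A common way to arrive at a read total is to count lines and divide by four, since each FASTQ record spans four lines. This is a sketch of that approach on an invented file, not necessarily the method used in the notebook; the function name is hypothetical.

```bash
#!/bin/bash
# Count reads in gzipped FASTQ files: each record is four lines.
# Assumes well-formed, non-wrapped FASTQ; count_reads and the demo file are invented.
count_reads() {
  local total_lines
  total_lines=$(gzip -dc "$@" | wc -l)
  echo $(( total_lines / 4 ))
}

demo="$(mktemp -d)/demo.fastq.gz"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTAA\n+\nIIII\n' | gzip > "$demo"
count_reads "$demo"
```

Passing several files to count_reads gives their combined read count, because gzip -dc decompresses them as one stream.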

Jupyter Notebook file: 20160203_Olurida_Zymo_Data_Handling.ipynb

Notebook Viewer: 20160203_Olurida_Zymo_Data_Handling.ipynb

DNA Quantification – MBD-enriched Olympia oyster DNA

Quantified the MBD-enriched samples prepped over the last two days: MBD enrichment, EtOH precipitation.

Samples were quantified using the QuantIT dsDNA BR Kit (Invitrogen) according to the manufacturer’s protocol.

Standards were run in triplicate, samples were run in duplicate.

96-well black (opaque) plate was used.

Fluorescence was measured on the Seeb Lab’s Victor 1420 plate reader (Perkin Elmer).

Results:

Google Sheet: 20151123_MBD_libraries_quantification

Standard curve looked good – R² = 0.999

MBD recovery ranged from ~250 – 600ng.

MBD percent recoveries ranged from ~2 – 20%. Input DNA quantities were taken from Katherine’s numbers (Google Sheet): Silliman-DNA-Samples
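
Percent recovery here is simply recovered DNA divided by input DNA, times 100. A quick sketch of the calculation; the function name and the quantities are made up and are not values from the spreadsheets.

```bash
#!/bin/bash
# Percent recovery = recovered DNA / input DNA * 100.
# percent_recovery and the example quantities are invented for illustration.
percent_recovery() {
  awk -v rec="$1" -v input="$2" 'BEGIN { printf "%.1f\n", rec / input * 100 }'
}

percent_recovery 400 4000   # 400 ng recovered from 4000 ng input
```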

Will contact services about getting bisulfite Illumina sequencing performed.

Quality Trimming – LSU C.virginica Oil Spill MBD BS-Seq Data

Jupyter (IPython) Notebook: 20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb

NBviewer: 20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb

Trimmed FASTQC

NB3 No oil Index – ACAGTG

20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_ACAGTG_L001_R1_002_fastqc.html

NB6 No oil Index – GCCAAT

20150414_trimmed_2112_lane1_GCCAAT_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_GCCAAT_L001_R1_002_fastqc.html

NB11 No oil Index – CAGATC

20150414_trimmed_2112_lane1_CAGATC_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_CAGATC_L001_R1_002_fastqc.html
20150414_trimmed_2112_lane1_CAGATC_L001_R1_003_fastqc.html

HB2 25,000ppm oil Index – ATCACG

20150414_trimmed_2112_lane1_ATCACG_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_ATCACG_L001_R1_002_fastqc.html
20150414_trimmed_2112_lane1_ATCACG_L001_R1_003_fastqc.html

HB16 25,000ppm oil Index – TTAGGC

20150414_trimmed_2112_lane1_TTAGGC_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_TTAGGC_L001_R1_002_fastqc.html

HB30 25,000ppm oil Index – TGACCA

20150414_trimmed_2112_lane1_TGACCA_L001_R1_001_fastqc.html