Sam's Notebook » ZymoResearch

TrimGalore/FastQC/MultiQC – Trim 10bp 5′/3′ ends C.virginica MBD BS-seq FASTQ data

kubu4 — Wed, 11 Apr 2018 21:26:58 +0000

Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. Since Bismark alignment is the next step in our pipeline, we figured we should probably just follow their recommendations!
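
For reference, here is a rough sketch of what a 10bp trim from both ends could look like with TrimGalore. It only prints the command (a dry run) and does not execute anything. The function name, output directory, and input file names are placeholders I made up for illustration and are not taken from the actual job script; the clip options shown are TrimGalore's hard-trim options for the 5′ and 3′ ends.

```shell
# Dry-run sketch: print (but do not execute) a paired-end TrimGalore command.
# build_trim_cmd and every path passed to it are hypothetical placeholders.
build_trim_cmd() {
  # $1 = bp to clip from the 5' ends, $2 = bp to clip from the 3' ends,
  # $3 = output directory, $4 = read 1 FASTQ, $5 = read 2 FASTQ
  printf 'trim_galore --paired --fastqc'
  printf ' --clip_R1 %s --clip_R2 %s' "$1" "$1"
  printf ' --three_prime_clip_R1 %s --three_prime_clip_R2 %s' "$2" "$2"
  printf ' --output_dir %s %s %s\n' "$3" "$4" "$5"
}

# Example: 10bp off both ends of both reads.
build_trim_cmd 10 10 trimmed_10bp sample_R1.fastq.gz sample_R2.fastq.gz
```

Printing the command first makes it easy to check the clip values before committing to a long trimming run.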

TrimGalore job script:

20180410_trimgalore_trim14bp_Cvirginica_MDB.sh

Standard error was redirected on the command line to this file:

20180411_trimgalore_10bp_Cvirginica_MBD/stderr.log

MD5 checksums were generated on the resulting trimmed FASTQ files:

20180411_trimgalore_10bp_Cvirginica_MBD/checksums.md5

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).
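
A minimal sketch of the generate-then-verify checksum step, assuming GNU md5sum is available. The temporary directories and tiny example files are made up for illustration and stand in for the real trimmed FASTQ files and the copy to Owl.

```shell
# Sketch: generate MD5 checksums, then verify them after a (simulated) transfer.
# The temporary directories and file names are made up for illustration.
src=$(mktemp -d)
dest=$(mktemp -d)
printf 'ACGT\n' > "$src/example_R1.fq"
printf 'TGCA\n' > "$src/example_R2.fq"

# Generate checksums alongside the files.
(cd "$src" && md5sum example_R1.fq example_R2.fq > checksums.md5)

# Simulate copying everything to the storage server.
cp "$src"/* "$dest"/

# Verify at the destination; md5sum -c exits non-zero if any file differs.
if (cd "$dest" && md5sum -c checksums.md5); then
  verify_status=ok
else
  verify_status=failed
fi
echo "verification: $verify_status"
```

Running the verification from inside the destination directory matters because the checksum file stores relative file names.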

Results:

Output folder:

20180411_trimgalore_10bp_Cvirginica_MBD

FastQC output folder:

20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD

MultiQC output folder:

20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/multiqc_data/

MultiQC HTML report:

20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/multiqc_data/multiqc_report.html

Hey! Look at that! Everything is much better! Thanks for the excellent documentation and suggestions, Bismark!

TrimGalore/FastQC/MultiQC – 2bp 3′ end Read 1s Trim C.virginica MBD BS-seq FASTQ data

kubu4 — Wed, 11 Apr 2018 00:00:01 +0000

0000-0002-2747-368X

Earlier today, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch and hard trimmed the first 14bp from each read. Things looked better at the 5′ end, but the 3′ end of each of the READ1 seqs showed a wonky 2bp blip, so I decided to trim that off.

I ran TrimGalore (using the built-in FastQC option) on the Read 1 files that had already had the 14bp hard trim, this time with a hard trim of the last 2bp of each read, and followed up with MultiQC for a summary of the FastQC reports.
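
A small sketch of how only the read 1 files might be picked out for the extra 2bp 3′ trim. It prints one TrimGalore command per read 1 file and runs nothing. The function name and file names are hypothetical, and matching on `_R1` in the file name is an assumption about how the files are named.

```shell
# Sketch: select only read 1 files and print a 2bp 3' hard-trim command for each.
# print_r1_trim_cmds and the file names are hypothetical; nothing is executed.
print_r1_trim_cmds() {
  for fq in "$@"; do
    case "$fq" in
      # Assumes read 1 files carry "_R1" somewhere in their name.
      *_R1*) printf 'trim_galore --fastqc --three_prime_clip_R1 2 %s\n' "$fq" ;;
    esac
  done
}

print_r1_trim_cmds a_R1_trimmed.fq.gz a_R2_trimmed.fq.gz b_R1_trimmed.fq.gz
```

Read 2 files are skipped silently, so the output lists exactly the files that would be re-trimmed.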

TrimGalore job script:

20180410_trimgalore_trim14bp_Cvirginica_MDB.sh

Standard error was redirected on the command line to this file:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/stderr.log

MD5 checksums were generated on the resulting trimmed FASTQ files:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/checksums.md5

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/

FastQC output folder:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/20180410_fastqc_trimgalore_14bp5prime_2bp3prime_Cvirginica_MBD/

MultiQC output folder:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/20180410_fastqc_trimgalore_14bp5prime_2bp3prime_Cvirginica_MBD/multiqc_data/

MultiQC HTML report:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/20180410_fastqc_trimgalore_14bp5prime_2bp3prime_Cvirginica_MBD/multiqc_data/multiqc_report.html

Well, this is a bit strange: the 2bp trimming on the read 1s looks fine, but now the read 2s are weird in the same region!

Regardless, while this was running, Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. So, maybe this was all moot. I’ll go ahead and re-run this following the Bismark recommendations.

TrimGalore/FastQC/MultiQC – 14bp Trim C.virginica MBD BS-seq FASTQ data

kubu4 — Tue, 10 Apr 2018 20:40:00 +0000

Yesterday, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch with the default settings (i.e. “auto-trim”). There was still some variability in the first ~15bp of the reads and Steven wanted to see how a hard trim would change things.

I ran TrimGalore (using the built-in FastQC option), with a hard trim of the first 14bp of each read and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

20180410_trimgalore_trim14bp_Cvirginica_MDB.sh

Standard error was redirected on the command line to this file:

20180410_trimgalore_trim14bp_Cvirginica_MBD/stderr.log
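
As an illustration of redirecting standard error on the command line, here is a minimal sketch. The `run_job` function is a stand-in for the real job script and the log location is a temporary placeholder, so none of the names below come from the actual run.

```shell
# Sketch: capture a job's standard error in a log file while stdout stays on the terminal.
# run_job stands in for the real job script; the log path is a temporary placeholder.
run_job() {
  echo "progress message on stdout"
  echo "diagnostic message on stderr" >&2
}

log_dir=$(mktemp -d)
run_job 2> "$log_dir/stderr.log"

# Only the stderr line ends up in the log.
cat "$log_dir/stderr.log"
```

Keeping stderr in its own file means the tool's diagnostics can be archived with the results without mixing them into any stdout output.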

MD5 checksums were generated on the resulting trimmed FASTQ files:

20180410_trimgalore_trim14bp_Cvirginica_MBD/checksums.md5

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

20180410_trimgalore_trim14bp_Cvirginica_MBD/

FastQC output folder:

20180410_trimgalore_trim14bp_Cvirginica_MBD/20180410_fastqc_trimgalore_trim14bp_Cvirginica_MBD/

MultiQC output folder:

20180410_trimgalore_trim14bp_Cvirginica_MBD/20180410_fastqc_trimgalore_trim14bp_Cvirginica_MBD/multiqc_data/

MultiQC HTML report:

20180410_trimgalore_trim14bp_Cvirginica_MBD/20180410_fastqc_trimgalore_trim14bp_Cvirginica_MBD/multiqc_data/multiqc_report.html

OK, this trimming definitely took care of the variability seen in the first ~15bp of all the reads.

However, I noticed that the last 2bp of each of the Read 1 seqs all have some wonky stuff going on. I’m guessing I should probably trim that stuff off, too…

FastQC/MultiQC – C. virginica MBD BS-seq Data

kubu4 — Mon, 09 Apr 2018 22:21:59 +0000

Per Steven’s GitHub Issues request, I ran FastQC on the Eastern oyster MBD bisulfite sequencing data we recently got back from ZymoResearch.

Ran FastQC locally with the following script: 20180409_fastqc_Cvirginica_MBD.sh


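#!/bin/bash
# Run FastQC on all 20 gzipped FASTQ files (10 samples, R1 and R2),
# using 18 threads and writing all reports to a single output directory.
/home/sam/software/FastQC/fastqc \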
--threads 18 \
--outdir /home/sam/20180409_fastqc_Cvirginica_MBD \
/mnt/owl/nightingales/C_virginica/zr2096_10_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_10_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_1_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_1_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_2_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_2_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_3_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_3_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_4_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_4_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_5_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_5_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_6_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_6_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_7_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_7_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_8_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_8_s1_R2.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_9_s1_R1.fastq.gz \
/mnt/owl/nightingales/C_virginica/zr2096_9_s1_R2.fastq.gz

MultiQC was then run on the FastQC output files.
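
The explicit file list in the script above could also be generated rather than typed out. The sketch below only prints the 20 paths (samples 1 through 10, R1 and R2) and does not run FastQC. The `list_fastqs` function name is made up for illustration; the directory and file naming pattern are taken from the script.

```shell
# Sketch: build the list of 20 FASTQ paths (samples 1-10, reads R1/R2)
# instead of typing each one. Prints the paths; does not run FastQC.
fastq_dir=/mnt/owl/nightingales/C_virginica

list_fastqs() {
  sample=1
  while [ "$sample" -le 10 ]; do
    for mate in R1 R2; do
      printf '%s/zr2096_%s_s1_%s.fastq.gz\n' "$fastq_dir" "$sample" "$mate"
    done
    sample=$((sample + 1))
  done
}

list_fastqs
```

The printed paths come out in numeric sample order, which differs from the alphabetical order in the script but covers the same files.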

All files were moved to Owl after the jobs completed.

Results:

FastQC Output folder: 20180409_fastqc_Cvirginica_MBD/

MultiQC Output folder: 20180409_fastqc_Cvirginica_MBD/multiqc_data/

MultiQC report (HTML): 20180409_fastqc_Cvirginica_MBD/multiqc_data/multiqc_report.html

Everything looks good to me.

Steven’s interested in seeing what the trimmed output would look like (and how it would impact mapping efficiencies). I will initiate trimming.

See the GitHub issue linked above for the full discussion.

Data Received – Crassostrea virginica MBD BS-seq from ZymoResearch

kubu4 — Thu, 29 Mar 2018 17:57:42 +0000

Received the sequencing data from ZymoResearch for the Crassostrea virginica gonad MBD DNA that was sent to them on 20180207 for bisulfite conversion, library construction, and sequencing.

The gzipped FASTQ files were downloaded to Owl/nightingales/C_virginica, after which:

MD5 checksums were verified
MD5 checksums were appended to the checksums.md5 file
the readme.md file was updated
the nightingales Google Sheet was updated

Here’s the list of files received:

zr2096_10_s1_R1.fastq.gz
zr2096_10_s1_R2.fastq.gz
zr2096_1_s1_R1.fastq.gz
zr2096_1_s1_R2.fastq.gz
zr2096_2_s1_R1.fastq.gz
zr2096_2_s1_R2.fastq.gz
zr2096_3_s1_R1.fastq.gz
zr2096_3_s1_R2.fastq.gz
zr2096_4_s1_R1.fastq.gz
zr2096_4_s1_R2.fastq.gz
zr2096_5_s1_R1.fastq.gz
zr2096_5_s1_R2.fastq.gz
zr2096_6_s1_R1.fastq.gz
zr2096_6_s1_R2.fastq.gz
zr2096_7_s1_R1.fastq.gz
zr2096_7_s1_R2.fastq.gz
zr2096_8_s1_R1.fastq.gz
zr2096_8_s1_R2.fastq.gz
zr2096_9_s1_R1.fastq.gz
zr2096_9_s1_R2.fastq.gz

Here’s the sample processing history:

Ethanol Precipitation & DNA Quantification – C. virginica MBD DNA from Yaamini

kubu4 — Wed, 07 Feb 2018 20:36:23 +0000

Finished the ethanol precipitation that Yaamini had previously initiated, as described in the MethylMiner (Invitrogen) manual: https://yaaminiv.github.io/Virginica-MBDSeq-Day4/

Samples were resuspended in 25μL of Buffer EB (Qiagen) and transferred to 0.5mL snap cap tubes. All tubes were labeled as: MBD CV #

Quantified the Crassostrea virginica MBD-enriched DNA with the Qubit 3.0 (ThermoFisher) and the Qubit dsDNA High Sensitivity (HS) Kit (ThermoFisher).

Used 1μL of template DNA.

Results:

Quantification Spreadsheet (Google Sheet): 20180207_qubit_DNA_HS_MBD_virginica

One sample (MBD CV 106) may not be usable due to low yield. However, the remainder should work fine.

I’ve sent them all to ZymoResearch for bisulfite treatment, library construction, and Illumina sequencing.

FedEx tracking: 771429590026

Data Received – Ostrea lurida MBD-enriched BS-seq

kubu4 — Wed, 03 Feb 2016 22:47:09 +0000

Received the Olympia oyster, MBD-enriched BS-seq sequencing files (50bp, single read) from ZymoResearch (submitted 20151208). Here’s the sample list:

E1_hc1_2B
E1_hc1_4B
E1_hc2_15B
E1_hc2_17
E1_hc3_1
E1_hc3_5
E1_hc3_7
E1_hc3_10
E1_hc3_11
E1_ss2_9B
E1_ss2_14B
E1_ss2_18B
E1_ss3_3B
E1_ss3_14B
E1_ss3_15B
E1_ss3_16B
E1_ss3_20
E1_ss5_18

The 18 samples listed above had previously been MBD-enriched and then sent to ZymoResearch for bisulfite conversion, multiplex library construction, and subsequent sequencing. The library (multiplex of all samples) was sequenced in a single lane, three times. Thus, we would expect 54 FASTQ files. However, ZymoResearch was dissatisfied with the QC of the initial sequencing run (completed on 20160129), so they re-ran the samples (completed on 20160202). This created two sets of data, resulting in a total of 108 FASTQ files.
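
The file count arithmetic above can be spelled out as a quick check. The variable names are just for illustration; the numbers come straight from the description.

```shell
# Sketch: expected FASTQ file counts from the sequencing design described above.
samples=18          # MBD-enriched samples in the multiplexed library
lane_runs=3         # the library was sequenced in a single lane, three times
sequencing_sets=2   # initial run plus ZymoResearch's re-run

expected_per_set=$((samples * lane_runs))
expected_total=$((expected_per_set * sequencing_sets))

echo "$expected_per_set files per set, $expected_total files in total"
```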

The ZymoResearch data portal does not allow bulk download of files, so I used the Chrono Download Manager extension for Google Chrome to automate downloading each file (per ZymoResearch’s recommendation).

After download, the files were moved to their permanent storage location on Owl: http://owl.fish.washington.edu/nightingales/O_lurida/20160203_mbdseq

The readme.md file was updated to include project/file information.

The file manipulations were performed in a Jupyter notebook (see below).

Total reads generated for this project: 1,481,836,875
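
One way a read total like this could be computed is by counting lines in each gzipped FASTQ file and dividing by four (one record is four lines). This is only a sketch and not necessarily how the number above was obtained in the Jupyter notebook. The `count_reads` function and the tiny demo file are made up for illustration.

```shell
# Sketch: count reads across gzipped FASTQ files (4 lines per record).
# count_reads is a hypothetical helper, not taken from the notebook.
count_reads() {
  total=0
  for fq in "$@"; do
    lines=$(gzip -dc "$fq" | wc -l)
    total=$((total + $lines / 4))
  done
  echo "$total"
}

# Tiny made-up FASTQ with two records, for demonstration only.
demo_dir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nTGCA\n+\nIIII\n' | gzip -c > "$demo_dir/demo.fastq.gz"

count_reads "$demo_dir/demo.fastq.gz"
```

For the real data, the function would be called with all of the downloaded FASTQ files as arguments.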

Jupyter Notebook file: 20160203_Olurida_Zymo_Data_Handling.ipynb

Notebook Viewer: 20160203_Olurida_Zymo_Data_Handling.ipynb