Tag Archives: trimming

TrimGalore/FastQC/MultiQC – C.virginica Oil Spill MBDseq Concatenated Sequences

I previously concatenated our Crassostrea virginica oil spill MBDseq data and analyzed it with FastQC.

We decided to try improving the data by running it through TrimGalore! to remove adapters and poor-quality sequence.

Processed the samples on Roadrunner (Apple Xserve; Ubuntu 16.04) using default TrimGalore! settings.
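With default settings, TrimGalore! trims low-quality 3′ ends at a Phred score cutoff of 20 (its -q default) and hands this step to Cutadapt, which documents a BWA-style algorithm for it. The function below is a minimal sketch of that algorithm for illustration only. It assumes uncompressed 4-line FASTQ records with Phred+33 qualities. The function name quality_trim_fastq is hypothetical, and this is not how TrimGalore! itself was run.

```bash
# Hypothetical helper: BWA-style 3′ quality trimming (the algorithm Cutadapt
# documents for its -q option). Illustration only, not TrimGalore! itself.
# Assumes 4-line FASTQ records and Phred+33 quality encoding.
# Usage: quality_trim_fastq CUTOFF < reads.fq > trimmed.fq
quality_trim_fastq() {
  local cutoff="${1:-20}"
  awk -v cutoff="$cutoff" '
    BEGIN {
      # awk has no ord(), so build a character -> ASCII code lookup table
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0
      n = length(qual)
      s = 0; max_s = 0; cut = n
      # Walk in from the 3-prime end summing (cutoff - quality); stop once
      # the running sum turns negative and cut where the sum peaked
      for (i = n; i >= 1; i--) {
        s += cutoff - (ord[substr(qual, i, 1)] - 33)
        if (s < 0) break
        if (s > max_s) { max_s = s; cut = i - 1 }
      }
      print header
      print substr(seq, 1, cut)
      print plus
      print substr(qual, 1, cut)
    }
  '
}
```

For example, printf '@r1\nACGTACGT\n+\nIIIII###\n' | quality_trim_fastq 20 keeps ACGTA, because the three trailing # (Phred 2) bases fall below the cutoff.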

After trimming, the TrimGalore! output was summarized with MultiQC. The trimmed FastQ files were then analyzed with FastQC, and those results were also summarized with MultiQC.

Documented in Jupyter Notebook (see below).

Jupyter Notebook (GitHub):

20180911_roadrunner_virginica_trimgalore.ipynb

RESULTS

TrimGalore! output folder:

20180911_virginica_oil_trimgalore_01/

TrimGalore! MultiQC report (HTML):

20180911_virginica_oil_trimgalore_01/multiqc_report.html

TrimGalore! FastQC output folder:

20180911_virginica_oil_trimgalore_01/20180911_virginica_oil_trimmed_fastqc/

FastQC MultiQC report (HTML):

20180911_virginica_oil_trimgalore_01/20180911_virginica_oil_trimmed_fastqc/multiqc_report.html

Overall, the data look somewhat better after trimming, but some issues remain (see the sequence content and short sequence contamination plots below). I will likely exclude sample 2112_lane_1_TGACCA from analysis and apply additional filtering based on sequence length.
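The length-based filtering has not been settled on yet. TrimGalore!'s own --length option discards reads that become shorter than a cutoff during trimming (20 bp by default), which may be the simpler route. As a standalone alternative, here is a minimal sketch that assumes uncompressed 4-line FASTQ records; the function name filter_fastq_by_length and any cutoff passed to it are hypothetical.

```bash
# Hypothetical helper: keep only FASTQ records whose sequence is at least
# MIN_LEN bases long. Assumes 4-line FASTQ records (no wrapped sequences).
# Usage: filter_fastq_by_length MIN_LEN < trimmed.fq > filtered.fq
filter_fastq_by_length() {
  local min_len="$1"
  awk -v min_len="$min_len" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      # Emit the whole record only if the read passes the length cutoff
      if (length(seq) >= min_len + 0) {
        print header; print seq; print plus; print $0
      }
    }
  '
}
```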

SEQUENCE CONTENT PLOT

SHORT SEQUENCE CONTAMINATION

FastQC/MultiQC/TrimGalore/MultiQC/FastQC/MultiQC – O.lurida WGBSseq for Methylation Analysis

0000-0002-2747-368X

I previously ran these data through the Bismark pipeline and followed up with a methylKit analysis. The methylKit analysis identified an extremely low number of differentially methylated loci (DML), which seemed odd.

Steven and I met to compare our different variations on the analysis and decided to try a few tweaks to evaluate how they affect the results.

I did the following tasks:

Looked at original sequence data quality with FastQC.
Summarized FastQC analysis with MultiQC.
Trimmed data using TrimGalore!, removing 10 bp from the 5′ end of reads (the Bismark documentation recommends 8 bp).
Summarized trimming stats with MultiQC.
Looked at trimmed sequence quality with FastQC.
Summarized FastQC analysis with MultiQC.
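TrimGalore! exposes fixed 5′ trimming through its --clip_R1 option (--clip_R2 for read 2 of paired-end data); the options actually used are in the SBATCH script linked below. Purely as an illustration of what 5′ clipping does to each record, here is a sketch that assumes uncompressed 4-line FASTQ records and uses a hypothetical function name, clip_5prime.

```bash
# Hypothetical helper illustrating 5′ clipping: drop the first N bases and
# the matching quality scores from every read. Assumes 4-line FASTQ records.
# Reads shorter than N end up empty; filter them afterwards if needed.
# Usage: clip_5prime N < reads.fq > clipped.fq
clip_5prime() {
  local n="$1"
  awk -v n="$n" '
    # Sequence (line 2) and quality (line 4) are clipped identically;
    # header and separator lines pass through unchanged
    NR % 4 == 2 || NR % 4 == 0 { print substr($0, n + 1); next }
    { print }
  '
}
```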

This was run on the Univ. of Washington High Performance Computing (HPC) cluster, Mox.

The Mox SBATCH submission script contains all the details of how the analyses were conducted:

20180830_oly_WGBSseq_trimming.sh

RESULTS

Output folder:

20180830_oly_WGBSseq_trimming/

Raw sequence FastQC output folder:

20180830_oly_WGBSseq_trimming/20180830_fastqc/

Raw sequence MultiQC report (HTML):

20180830_oly_WGBSseq_trimming/20180830_fastqc/multiqc_report.html

TrimGalore! output folder (trimmed FastQ files are here):

20180830_oly_WGBSseq_trimming/20180830_trimgalore/

Trimming MultiQC report (HTML):

20180830_oly_WGBSseq_trimming/20180830_trimgalore/multiqc_report.html

Trimmed FastQC output folder:

20180830_oly_WGBSseq_trimming/20180830_trimmed_fastqc/

Trimmed MultiQC report (HTML):

20180830_oly_WGBSseq_trimming/20180830_trimmed_fastqc/multiqc_report.html

BS-seq Mapping – Olympia oyster bisulfite sequencing: TrimGalore > FastQC > Bismark

0000-0002-2747-368X

Steven asked me to evaluate our methylation sequencing data sets for Olympia oyster.

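According to our Olympia oyster genome wiki, we have two sets of BS-seq data, both from 2015: whole genome BS-seq and MBD BS-seq (prep details and mapping results for each are listed at the end of this post).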

All computing was conducted on our Apple Xserve: emu.

All steps were documented in this Jupyter Notebook (GitHub): 20180503_emu_oly_methylation_mapping.ipynb

NOTE: The Jupyter Notebook linked above is very large, so it will not render on GitHub. To view it, download it to a computer that can run Jupyter Notebooks.

Here’s a brief overview of what was done.

Samples were trimmed with TrimGalore and then evaluated with FastQC. MultiQC was used to generate a nice visual summary report of all samples.

The Olympia oyster genome assembly, pbjelly_sjw_01, was used as the reference genome and was prepared for use in Bismark:


# Build bisulfite-converted (C->T and G->A) copies of the genome and index them with Bowtie 2
/home/shared/Bismark-0.19.1/bismark_genome_preparation \
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
--verbose \
/home/sam/data/oly_methylseq/oly_genome/ \
2> 20180507_bismark_genome_prep.err
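Conceptually, bismark_genome_preparation creates two in silico bisulfite-converted copies of the reference (C->T for the top strand, G->A for the bottom strand) and indexes both with Bowtie 2, so that bisulfite-converted reads can be aligned without C/T mismatches being penalized. The sketch below shows only the conversion step, with a hypothetical function name. It is not a substitute for Bismark's script, which also builds the indices and handles its own header naming.

```bash
# Hypothetical helper illustrating in silico bisulfite conversion of a FASTA.
# CT: every C becomes T (top strand); GA: every G becomes A (bottom strand).
# Header lines (starting with ">") are left untouched; this sketch also
# upper-cases sequence lines before converting.
# Usage: bisulfite_convert_fasta CT|GA < genome.fa > converted.fa
bisulfite_convert_fasta() {
  case "$1" in
    CT) awk '/^>/ { print; next } { $0 = toupper($0); gsub(/C/, "T"); print }' ;;
    GA) awk '/^>/ { print; next } { $0 = toupper($0); gsub(/G/, "A"); print }' ;;
    *) echo "usage: bisulfite_convert_fasta CT|GA" >&2; return 1 ;;
  esac
}
```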

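Bismark was run on the trimmed samples with the following command. Note that -u 1000000 limits alignment to the first 1,000,000 reads of each file, so the mapping percentages reported below are based on that subset: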


/home/shared/Bismark-0.19.1/bismark \
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
--genome /home/sam/data/oly_methylseq/oly_genome/ \
-u 1000000 \
-p 16 \
--non_directional \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/1_ATCACG_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/2_CGATGT_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/3_TTAGGC_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/4_TGACCA_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/5_ACAGTG_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/6_GCCAAT_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/7_CAGATC_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/8_ACTTGA_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_10_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_11_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_12_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_13_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_14_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_15_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_16_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_17_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_18_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_1_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_2_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_3_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_4_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_5_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_6_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_7_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_8_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_9_s456_trimmed.fq.gz \
2> 20180507_bismark_02.err

Results:

TrimGalore output folder:

20180503_oly_methylseq_trimgalore

FastQC output folder:

20180503_oly_methylseq_trimgalore/20180503_trim_fastqc/

MultiQC output folder:

20180503_oly_methylseq_trimgalore/20180503_trim_fastqc/multiqc_data/

MultiQC Report (HTML):

20180503_oly_methylseq_trimgalore/20180503_trim_fastqc/multiqc_data/multiqc_report.html

Bismark genome folder: 20180503_oly_genome_pbjelly_sjw_01_bismark/

Bismark output folder:

20180507_oly_methylseq_bismark

Whole genome BS-seq (2015)

Prep overview

Library prep: Roberts Lab
Sequencing: Genewiz

Bismark Report	Mapping Percentage
1_ATCACG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	40.3%
2_CGATGT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.9%
3_TTAGGC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	40.2%
4_TGACCA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	40.4%
5_ACAGTG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.9%
6_GCCAAT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.6%
7_CAGATC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.9%
8_ACTTGA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.7%
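A table like the one above can be regenerated from the report files. Here is a minimal sketch, assuming each Bismark single-end report contains a tab-separated line of the form "Mapping efficiency:<TAB>40.3%" (worth checking against the actual files); the function name mapping_table is hypothetical.

```bash
# Hypothetical helper: rebuild a "report<TAB>mapping percentage" table from
# Bismark SE report files. Assumes a "Mapping efficiency:<TAB>NN.N%" line.
# Usage: mapping_table /path/to/bismark_output/*_SE_report.txt
mapping_table() {
  local report pct
  printf 'Bismark Report\tMapping Percentage\n'
  for report in "$@"; do
    # Take the value after the tab on the first "Mapping efficiency:" line
    pct=$(awk -F '\t' '/^Mapping efficiency:/ { print $2; exit }' "$report")
    printf '%s\t%s\n' "$(basename "$report")" "$pct"
  done
}
```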

MBD BS-seq (2015)

Prep overview

MBD: Roberts Lab
Library prep: ZymoResearch
Sequencing: ZymoResearch

Bismark Report	Mapping Percentage
zr1394_1_s456_trimmed_bismark_bt2_SE_report.txt	33.0%
zr1394_2_s456_trimmed_bismark_bt2_SE_report.txt	34.1%
zr1394_3_s456_trimmed_bismark_bt2_SE_report.txt	32.5%
zr1394_4_s456_trimmed_bismark_bt2_SE_report.txt	32.8%
zr1394_5_s456_trimmed_bismark_bt2_SE_report.txt	35.2%
zr1394_6_s456_trimmed_bismark_bt2_SE_report.txt	35.5%
zr1394_7_s456_trimmed_bismark_bt2_SE_report.txt	32.8%
zr1394_8_s456_trimmed_bismark_bt2_SE_report.txt	33.0%
zr1394_9_s456_trimmed_bismark_bt2_SE_report.txt	34.7%
zr1394_10_s456_trimmed_bismark_bt2_SE_report.txt	34.9%
zr1394_11_s456_trimmed_bismark_bt2_SE_report.txt	30.5%
zr1394_12_s456_trimmed_bismark_bt2_SE_report.txt	35.8%
zr1394_13_s456_trimmed_bismark_bt2_SE_report.txt	32.5%
zr1394_14_s456_trimmed_bismark_bt2_SE_report.txt	30.8%
zr1394_15_s456_trimmed_bismark_bt2_SE_report.txt	31.3%
zr1394_16_s456_trimmed_bismark_bt2_SE_report.txt	30.7%
zr1394_17_s456_trimmed_bismark_bt2_SE_report.txt	32.4%
zr1394_18_s456_trimmed_bismark_bt2_SE_report.txt	34.9%
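To compare the two library types at a glance, the tables above can be summarized per data set. This is a minimal sketch, assuming each table is saved as tab-separated text with its header row; the function name summarize_mapping is hypothetical.

```bash
# Hypothetical helper: summarize a two-column "report<TAB>NN.N%" table
# (header row first) as sample count, mean, minimum and maximum.
# Usage: summarize_mapping < table.tsv
summarize_mapping() {
  awk -F '\t' '
    NR == 1 { next }  # skip the header row
    {
      pct = $2; sub(/%$/, "", pct); pct += 0
      n++; sum += pct
      if (n == 1 || pct < min) min = pct
      if (n == 1 || pct > max) max = pct
    }
    END {
      if (n == 0) exit 1
      printf "n=%d mean=%.2f min=%.1f max=%.1f\n", n, sum / n, min, max
    }
  '
}
```

Applied to the two tables above, this gives n=8 mean=39.99 min=39.6 max=40.4 for the whole genome BS-seq libraries and n=18 mean=33.19 min=30.5 max=35.8 for the MBD BS-seq libraries.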

Adapter Trimming and FASTQC – Illumina Geoduck Novaseq Data

0000-0002-2747-368X

We would like to assemble the geoduck NovaSeq data that Illumina provided us.

Steven previously ran the raw data through FASTQC, which revealed a substantial amount of adapter contamination (up to 44% in some libraries; see his FASTQC report here).

So, I trimmed the reads using TrimGalore and re-ran FASTQC on the trimmed output.

This required two rounds of trimming using the “auto-detect” feature of Trim Galore.

Round 1: remove NovaSeq adapters
Round 2: remove standard Illumina adapters
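Trim Galore's auto-detection scans the first 1 million reads of the first input file for short adapter stubs (Illumina AGATCGGAAGAGC, Nextera CTGTCTCTTATA, small RNA TGGAATTCTCGG) and trims the most prevalent one. Because a single run settles on one adapter, a second round may be needed to catch another adapter type, which would fit the two rounds above. The sketch below only illustrates that counting idea. It assumes uncompressed 4-line FASTQ records, uses a hypothetical function name, and is not Trim Galore's actual code.

```bash
# Hypothetical helper illustrating adapter auto-detection: count how many of
# the first N reads contain each adapter stub and report the most frequent.
# Assumes 4-line FASTQ records. Output order of the counts is unspecified.
# Usage: detect_adapter [N] < reads.fq
detect_adapter() {
  local max_reads="${1:-1000000}"
  awk -v max_reads="$max_reads" '
    BEGIN {
      stub["Illumina"] = "AGATCGGAAGAGC"
      stub["Nextera"]  = "CTGTCTCTTATA"
      stub["smallRNA"] = "TGGAATTCTCGG"
    }
    NR % 4 == 2 {
      reads++
      for (name in stub) if (index($0, stub[name]) > 0) count[name]++
      if (reads >= max_reads + 0) exit
    }
    END {
      best = "none"; best_n = 0
      for (name in stub) {
        printf "%s\t%d\n", name, count[name] + 0
        if (count[name] + 0 > best_n) { best = name; best_n = count[name] + 0 }
      }
      printf "best\t%s\n", best
    }
  '
}
```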

See Jupyter notebook below for the gritty details.

Results:

All data for this NovaSeq assembly project can be found here: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/.

Round 1 Trim Galore reports: 20180125_geoduck_novaseq/20180125_trim_galore_reports/
Round 1 FASTQC: 20180129_trimmed_multiqc_fastqc_01
Round 1 FASTQC MultiQC overview: 20180129_trimmed_multiqc_fastqc_01/multiqc_report.html

Round 2 Trim Galore reports: 20180125_geoduck_novaseq/20180205_trim_galore_reports/
Round 2 FASTQC: 20180205_trimmed_fastqc_02/
Round 2 FASTQC MultiQC overview: 20180205_trimmed_multiqc_fastqc_02/multiqc_report.html

Astute observers might notice that the “Per Base Sequence Content” module generates a “Fail” for all samples. Per the FASTQC help, this is likely expected (these NovaSeq libraries were prepared using transposases, which introduce a composition bias at the start of reads) and in most cases does not adversely affect downstream analyses.
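Per the FASTQC documentation, this module fails when the difference between A and T, or between G and C, exceeds 20% at any position. For a quick look at the underlying numbers outside FASTQC, here is a sketch that computes per-position base percentages. It assumes uncompressed 4-line FASTQ records, uses a hypothetical function name, and does not reproduce FASTQC's position binning or thresholds.

```bash
# Hypothetical helper approximating FASTQC's "Per Base Sequence Content":
# percentage of A, C, G and T at each of the first MAX_POS positions.
# Assumes 4-line FASTQ records; N and other characters are ignored.
# Usage: per_base_content [MAX_POS] < reads.fq
per_base_content() {
  local max_pos="${1:-15}"
  awk -v max_pos="$max_pos" '
    NR % 4 == 2 {
      len = length($0)
      if (len > max_pos + 0) len = max_pos + 0
      if (len > longest) longest = len
      for (i = 1; i <= len; i++) {
        b = toupper(substr($0, i, 1))
        if (b ~ /^[ACGT]$/) { count[i, b]++; total[i]++ }
      }
    }
    END {
      print "pos\tA\tC\tG\tT"
      for (i = 1; i <= longest; i++) {
        if (total[i] == 0) continue
        printf "%d\t%.1f\t%.1f\t%.1f\t%.1f\n", i,
          100 * count[i, "A"] / total[i], 100 * count[i, "C"] / total[i],
          100 * count[i, "G"] / total[i], 100 * count[i, "T"] / total[i]
      }
    }
  '
}
```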

Jupyter Notebook (GitHub): 20180125_roadrunner_trimming_geoduck_novaseq.ipynb

Sam's Notebook

University of Washington – Fishery Sciences – Roberts Lab
