Tag Archives: TrimGalore!

TrimGalore/FastQC/MultiQC – C.virginica Oil Spill MBDseq Concatenated Sequences

Previously concatenated and analyzed our Crassostrea virginica oil spill MBDseq data with FastQC.

We decided to try improving things by running them through TrimGalore! to remove adapters and poor quality sequences.

Processed the samples on Roadrunner (Apple Xserve; Ubuntu 16.04) using default TrimGalore! settings.

After trimming, TrimGalore! output was summarized using MultiQC. Trimmed FastQ files were then analyzed with FastQC and followed up with MultiQC.
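For reference, the trim-then-summarize sequence described above could be sketched roughly as follows. This is not the notebook's actual code: the function name and directory layout are made up for illustration, single-end reads are assumed, and trim_galore, fastqc and multiqc are assumed to be on the PATH.

```shell
# Sketch only: trim_and_qc and its directory layout are hypothetical.

# trim_and_qc RAW_DIR OUT_DIR
# Trims every *.fastq.gz in RAW_DIR with default Trim Galore! settings, then
# summarizes the trimming reports and the post-trimming FastQC reports.
trim_and_qc() {
    raw_dir=$1
    out_dir=$2
    mkdir -p "$out_dir/trimmed" "$out_dir/fastqc" || return 1

    # Default adapter and quality trimming (single-end reads assumed).
    trim_galore --output_dir "$out_dir/trimmed" "$raw_dir"/*.fastq.gz || return 1

    # Summarize the Trim Galore! trimming reports.
    multiqc "$out_dir/trimmed" --outdir "$out_dir/trimmed" || return 1

    # FastQC on the trimmed reads, followed by a MultiQC summary of those reports.
    fastqc --outdir "$out_dir/fastqc" "$out_dir/trimmed"/*.fq.gz || return 1
    multiqc "$out_dir/fastqc" --outdir "$out_dir/fastqc"
}

# Example call (hypothetical paths):
# trim_and_qc /path/to/concatenated_fastq /path/to/analysis_dir
```

Keeping the two MultiQC reports in separate folders mirrors the separate trimming and FastQC reports listed in the results.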

Documented in Jupyter Notebook (see below).

Jupyter Notebook (GitHub):


RESULTS

TrimGalore! output folder:

TrimGalore! MultiQC report (HTML):

TrimGalore! FastQC output folder:

FastQC MultiQC report (HTML):

Overall, things look a bit better, but there are still some issues. Will likely eliminate sample 2112_lane_1_TGACCA from analysis and apply some additional sequence filtering, based on sequence length.


SEQUENCE CONTENT PLOT


SHORT SEQUENCE CONTAMINATION


FastQC/MultiQC/TrimGalore/MultiQC/FastQC/MultiQC – O.lurida WGBSseq for Methylation Analysis

I previously ran this data through the Bismark pipeline and followed up with MethylKit analysis. MethylKit analysis revealed an extremely low number of differentially methylated loci (DML), which seemed odd.

Steven and I met to discuss and compare our different variations on the analysis and decided to try out some tweaks to evaluate how they affect the results.

I did the following tasks:

  1. Looked at original sequence data quality with FastQC.

  2. Summarized FastQC analysis with MultiQC.

  3. Trimmed data using TrimGalore!, trimming 10bp from 5′ end of reads (8bp is recommended by Bismark docs).

  4. Summarized trimming stats with MultiQC.

  5. Looked at trimmed sequence quality with FastQC.

  6. Summarized FastQC analysis with MultiQC.

This was run on the Univ. of Washington High Performance Computing (HPC) cluster, Mox.

Mox SBATCH submission script has all details on how the analyses were conducted:


RESULTS

Output folder:

Raw sequence FastQC output folder:

Raw sequence MultiQC report (HTML):

TrimGalore! output folder (trimmed FastQ files are here):

Trimming MultiQC report (HTML):

Trimmed FastQC output folder:

Trimmed MultiQC report (HTML):

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data (directional)

Earlier this week, I ran TrimGalore! on this data, but a copy/paste mistake left the --non_directional option set, which was incorrect for these libraries. I re-ran the trimming with the correct settings.

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

  1. Run TrimGalore! with --paired and --rrbs settings.

  2. Run FastQC and MultiQC on trimmed files.

  3. Copy all data to owl (see Results below for link).

  4. Confirm data integrity via MD5 checksums.
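As a rough sketch of the trimming step (step 1), something like the following would run TrimGalore! in paired-end RRBS mode on each read pair. The function name and the <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming are assumptions for illustration only; the actual commands are in the notebook.

```shell
# Sketch only: assumes read pairs are named <sample>_R1.fastq.gz and
# <sample>_R2.fastq.gz (a hypothetical convention) and that trim_galore
# is on the PATH.

# trim_rrbs_pairs IN_DIR OUT_DIR
# Runs Trim Galore! with --paired and --rrbs on each read pair in IN_DIR.
trim_rrbs_pairs() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for r1 in "$in_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # derive the mate's file name
        if [ ! -e "$r2" ]; then
            echo "missing mate for $r1" >&2
            return 1
        fi
        trim_galore --paired --rrbs --output_dir "$out_dir" "$r1" "$r2" || return 1
    done
}

# Example call (hypothetical paths):
# trim_rrbs_pairs /path/to/raw_rrbs /path/to/trimmed_rrbs
```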

Jupyter Notebook:


Results:
TrimGalore! output folder:
FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data


20180516 – UPDATE!!

THIS WAS RUN WITH AN INCORRECT TRIMGALORE! SETTING: --non_directional

WILL RE-RUN


Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

  1. Copy EPI* FastQ files from owl/P_generosa to roadrunner.

  2. Confirm data integrity via MD5 checksums.

  3. Run TrimGalore! with --paired, --rrbs, and --non_directional settings.

  4. Run FastQC and MultiQC on trimmed files.

  5. Copy all data to owl (see Results below for link).

  6. Confirm data integrity via MD5 checksums.
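The copy-then-verify steps could look something like the sketch below. This shows the general approach, not the notebook's actual commands: cp stands in for whatever transfer tool was used, and verify_copy and checksums.md5 are illustrative names.

```shell
# Sketch only: verify_copy and checksums.md5 are hypothetical names, and cp
# stands in for the real transfer method. Requires md5sum (GNU coreutils).

# verify_copy SRC_DIR DEST_DIR
# Records MD5 checksums at the source, copies the FASTQ files and the checksum
# list, then re-checks every copied file against the recorded checksums.
verify_copy() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # Use bare file names so the list can be checked from any directory.
    ( cd "$src" && md5sum -- *.fastq.gz > checksums.md5 ) || return 1
    cp "$src"/*.fastq.gz "$src/checksums.md5" "$dest"/ || return 1
    # md5sum -c exits non-zero if any file is missing or does not match.
    ( cd "$dest" && md5sum -c checksums.md5 )
}

# Example call (hypothetical paths):
# verify_copy /path/to/source_fastq /path/to/working_copy
```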

Jupyter Notebook:


Results:
TrimGalore! output folder:
FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

BS-seq Mapping – Olympia oyster bisulfite sequencing: TrimGalore > FastQC > Bismark

Steven asked me to evaluate our methylation sequencing data sets for Olympia oyster.

According to our Olympia oyster genome wiki, we have the following two sets of BS-seq data:

All computing was conducted on our Apple Xserve: emu.

All steps were documented in this Jupyter Notebook (GitHub): 20180503_emu_oly_methylation_mapping.ipynb

NOTE: The Jupyter Notebook linked above is too large to render on GitHub. To view it, download it to a computer that can run Jupyter Notebooks.

Here’s a brief overview of what was done.

Samples were trimmed with TrimGalore and then evaluated with FastQC. MultiQC was used to generate a nice visual summary report of all samples.

The Olympia oyster genome assembly, pbjelly_sjw_01, was used as the reference genome and was prepared for use in Bismark:


/home/shared/Bismark-0.19.1/bismark_genome_preparation \
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
--verbose /home/sam/data/oly_methylseq/oly_genome/ \
2> 20180507_bismark_genome_prep.err

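Bismark was run on the trimmed samples with the following command. Note that -u 1000000 limits alignment to the first 1,000,000 reads of each file, so the mapping percentages reported in the results reflect that subset, and -p 16 sets the number of Bowtie 2 threads: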


/home/shared/Bismark-0.19.1/bismark \
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
--genome /home/sam/data/oly_methylseq/oly_genome/ \
-u 1000000 \
-p 16 \
--non_directional \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/1_ATCACG_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/2_CGATGT_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/3_TTAGGC_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/4_TGACCA_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/5_ACAGTG_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/6_GCCAAT_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/7_CAGATC_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/8_ACTTGA_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_10_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_11_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_12_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_13_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_14_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_15_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_16_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_17_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_18_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_1_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_2_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_3_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_4_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_5_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_6_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_7_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_8_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_9_s456_trimmed.fq.gz \
2> 20180507_bismark_02.err
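
Listing every file by hand gets unwieldy. As a sketch (not the command that was actually run), the same call could collect the trimmed files with a glob. The function name is hypothetical, and bismark and Bowtie 2 are assumed to be on the PATH rather than given by full path as above.

```shell
# Sketch only: run_bismark_on_trimmed is a hypothetical helper; bismark and
# bowtie2 are assumed to be on the PATH.

# run_bismark_on_trimmed GENOME_DIR TRIMMED_DIR
# Aligns every single-end *_trimmed.fq.gz file in TRIMMED_DIR with the same
# options as the command above.
run_bismark_on_trimmed() {
    genome_dir=$1
    trimmed_dir=$2
    # Replace the function's positional parameters with the trimmed files.
    set -- "$trimmed_dir"/*_trimmed.fq.gz
    if [ ! -e "$1" ]; then
        echo "no trimmed files found in $trimmed_dir" >&2
        return 1
    fi
    bismark --genome "$genome_dir" -u 1000000 -p 16 --non_directional "$@"
}

# Example call (paths taken from the command above):
# run_bismark_on_trimmed /home/sam/data/oly_methylseq/oly_genome/ \
#     /home/sam/analyses/20180503_oly_methylseq_trimgalore
```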

Results:

TrimGalore output folder:

FastQC output folder:

MultiQC output folder:

MultiQC Report (HTML):

Bismark genome folder: 20180503_oly_genome_pbjelly_sjw_01_bismark/

Bismark output folder:


Whole genome BS-seq (2015)

Prep overview
  • Library prep: Roberts Lab
  • Sequencing: Genewiz
Bismark Report                                            Mapping Percentage
1_ATCACG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    40.3%
2_CGATGT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    39.9%
3_TTAGGC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    40.2%
4_TGACCA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    40.4%
5_ACAGTG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    39.9%
6_GCCAAT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    39.6%
7_CAGATC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    39.9%
8_ACTTGA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt    39.7%

MBD BS-seq (2015)

Prep overview
  • MBD: Roberts Lab
  • Library prep: ZymoResearch
  • Sequencing: ZymoResearch
Bismark Report                                      Mapping Percentage
zr1394_1_s456_trimmed_bismark_bt2_SE_report.txt     33.0%
zr1394_2_s456_trimmed_bismark_bt2_SE_report.txt     34.1%
zr1394_3_s456_trimmed_bismark_bt2_SE_report.txt     32.5%
zr1394_4_s456_trimmed_bismark_bt2_SE_report.txt     32.8%
zr1394_5_s456_trimmed_bismark_bt2_SE_report.txt     35.2%
zr1394_6_s456_trimmed_bismark_bt2_SE_report.txt     35.5%
zr1394_7_s456_trimmed_bismark_bt2_SE_report.txt     32.8%
zr1394_8_s456_trimmed_bismark_bt2_SE_report.txt     33.0%
zr1394_9_s456_trimmed_bismark_bt2_SE_report.txt     34.7%
zr1394_10_s456_trimmed_bismark_bt2_SE_report.txt    34.9%
zr1394_11_s456_trimmed_bismark_bt2_SE_report.txt    30.5%
zr1394_12_s456_trimmed_bismark_bt2_SE_report.txt    35.8%
zr1394_13_s456_trimmed_bismark_bt2_SE_report.txt    32.5%
zr1394_14_s456_trimmed_bismark_bt2_SE_report.txt    30.8%
zr1394_15_s456_trimmed_bismark_bt2_SE_report.txt    31.3%
zr1394_16_s456_trimmed_bismark_bt2_SE_report.txt    30.7%
zr1394_17_s456_trimmed_bismark_bt2_SE_report.txt    32.4%
zr1394_18_s456_trimmed_bismark_bt2_SE_report.txt    34.9%
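
A summary like the two lists above could be pulled from the Bismark reports with a small helper along these lines. This is a sketch, not how the numbers were actually gathered: the function name is hypothetical, and it assumes each report contains a tab-separated line beginning with "Mapping efficiency:".

```shell
# Sketch only: mapping_summary is a hypothetical helper.

# mapping_summary REPORT_DIR
# Prints "<report file><TAB><mapping efficiency>" for each single-end
# Bismark report (*_SE_report.txt) in REPORT_DIR.
mapping_summary() {
    for report in "$1"/*_SE_report.txt; do
        [ -e "$report" ] || continue    # glob matched nothing
        # Assumed report line format: "Mapping efficiency:<TAB>40.3%"
        pct=$(awk -F'\t' '/^Mapping efficiency:/ { gsub(/ /, "", $2); print $2; exit }' "$report")
        printf '%s\t%s\n' "$(basename "$report")" "$pct"
    done
}

# Example call (hypothetical path):
# mapping_summary /path/to/bismark_output
```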

TrimGalore/FastQC/MultiQC – Trim 10bp 5′/3′ ends C.virginica MBD BS-seq FASTQ data

Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. Since this is the next step in our pipeline, we figured we should probably just follow their recommendations!
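
A hard trim of both ends of paired reads might be expressed as in the sketch below. The function name, argument order and file names are hypothetical; the actual job script is linked in this post. The clip length defaults to the 10bp mentioned above.

```shell
# Sketch only: trim_both_ends is a hypothetical wrapper; trim_galore is
# assumed to be on the PATH.

# trim_both_ends R1 R2 OUT_DIR [N]
# Removes N bases (default 10) from the 5' and 3' ends of both reads in a pair.
trim_both_ends() {
    r1=$1
    r2=$2
    out_dir=$3
    n=${4:-10}
    # --clip_R1/--clip_R2 trim the 5' ends;
    # --three_prime_clip_R1/--three_prime_clip_R2 trim the 3' ends.
    trim_galore --paired \
        --clip_R1 "$n" --clip_R2 "$n" \
        --three_prime_clip_R1 "$n" --three_prime_clip_R2 "$n" \
        --output_dir "$out_dir" \
        "$r1" "$r2"
}

# Example call (hypothetical file names):
# trim_both_ends sample_R1.fastq.gz sample_R2.fastq.gz trimmed_10bp
```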

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Hey! Look at that! Everything is much better! Thanks for the excellent documentation and suggestions, Bismark!

TrimGalore/FastQC/MultiQC – 2bp 3′ end Read 1s Trim C.virginica MBD BS-seq FASTQ data

Earlier today, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch and hard trimmed the first 14bp from each read. Things looked better at the 5′ end, but the 3′ end of each of the READ1 seqs showed a wonky 2bp blip, so I decided to trim that off.

I ran TrimGalore (using the built-in FastQC option), with a hard trim of the last 2bp of each first read set that had previously had the 14bp hard trim and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Well, this is a bit strange: the 2bp trimming on the read 1s looks fine, but now the read 2s are weird in the same region!

Regardless, while this was running, Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. So, maybe this was all moot. I’ll go ahead and re-run this following the Bismark recommendations.

TrimGalore/FastQC/MultiQC – 14bp Trim C.virginica MBD BS-seq FASTQ data

Yesterday, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch with the default settings (i.e. “auto-trim”). There was still some variability in the first ~15bp of the reads and Steven wanted to see how a hard trim would change things.

I ran TrimGalore (using the built-in FastQC option), with a hard trim of the first 14bp of each read and followed up with MultiQC for a summary of the FastQC reports.
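
Roughly, that combination of a 14bp hard trim, the built-in FastQC option and a MultiQC summary could be sketched like this. The function name and file names are hypothetical and paired-end input is assumed; the real job script is linked in this post.

```shell
# Sketch only: hard_trim_with_fastqc is a hypothetical wrapper; trim_galore
# and multiqc are assumed to be on the PATH.

# hard_trim_with_fastqc R1 R2 OUT_DIR
# Clips the first 14bp from both reads, lets Trim Galore! run FastQC on the
# trimmed output (--fastqc), then summarizes the reports with MultiQC.
hard_trim_with_fastqc() {
    r1=$1
    r2=$2
    out_dir=$3
    mkdir -p "$out_dir" || return 1
    trim_galore --paired --clip_R1 14 --clip_R2 14 --fastqc \
        --output_dir "$out_dir" "$r1" "$r2" || return 1
    multiqc "$out_dir" --outdir "$out_dir"
}

# Example call (hypothetical file names):
# hard_trim_with_fastqc sample_R1.fastq.gz sample_R2.fastq.gz trimmed_14bp
```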

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

OK, this trimming definitely took care of the variability seen in the first ~15bp of all the reads.

However, I noticed that the last 2bp of each of the Read 1 seqs all have some wonky stuff going on. I’m guessing I should probably trim that stuff off, too…

TrimGalore/FastQC/MultiQC – Auto-trim C.virginica MBD BS-seq FASTQ data

Yesterday, I ran FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch. Steven wanted to trim it and see how things turned out.

I ran TrimGalore (using the built-in FastQC option) and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer.

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Overall, the auto-trim didn’t alter things too much. Specifically, Steven is concerned about the variability in the first 15bp (seen in the Per Base Sequence Content section of the MultiQC output). It was reduced, but not greatly. Will perform an independent run of TrimGalore and employ a hard trim of the first 14bp of each read and see how that looks.

TrimGalore!/FastQC/MultiQC – Illumina HiSeq Genome Sequencing Data Continued

The previous attempt at this was interrupted by a random glitch with our Mox HPC node.

I removed the last files processed by TrimGalore!, just in case they were incomplete. I updated the slurm script to process only the remaining files that had not been processed when the Mox glitch happened (including the files I deemed “incomplete”).
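
One way to work out which files still need processing after an interrupted run is sketched below. This is not the approach used in the slurm script: the function name is hypothetical, and it assumes single-end style output naming (<name>_trimmed.fq.gz), so it would need adjusting for paired-end output.

```shell
# Sketch only: list_untrimmed is a hypothetical helper, and the
# <name>_trimmed.fq.gz output naming is an assumption.

# list_untrimmed RAW_DIR TRIMMED_DIR
# Prints each raw FASTQ file in RAW_DIR that has no trimmed output in
# TRIMMED_DIR yet, one path per line.
list_untrimmed() {
    raw_dir=$1
    trimmed_dir=$2
    for fq in "$raw_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue    # glob matched nothing
        base=$(basename "$fq" .fastq.gz)
        [ -e "$trimmed_dir/${base}_trimmed.fq.gz" ] || printf '%s\n' "$fq"
    done
}

# Example call (hypothetical paths):
# list_untrimmed /path/to/raw_hiseq /path/to/trimmed_hiseq
```

Deleting any possibly incomplete outputs first, as described above, means those files show up in the list again.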

As in the initial run, I kept the option in TrimGalore! to automatically run FastQC on the trimmed output files.

TrimGalore! slurm script: 20180401_trim_galore_illumina_geoduck_hiseq_slurm.sh

MultiQC was run locally once the files were copied to Owl.

Results:

Job completed on 20180404.

Trimmed FASTQs: 20180328_trim_galore_illumina_hiseq_geoduck/

MD5 checksums: 20180328_trim_galore_illumina_hiseq_geoduck/checksums.md5

  • MD5 checksums were generated on Mox node and verified after copying to Owl.

Slurm output file: 20180401_trim_galore_illumina_geoduck_hiseq_slurm.sh

TrimGalore! output: 20180328_trim_galore_illumina_hiseq_geoduck/20180404_trimgalore_reports/

FastQC output: 20180328_trim_galore_illumina_hiseq_geoduck/20180328_fastqc_trimmed_hiseq_geoduck/

MultiQC output: 20180328_trim_galore_illumina_hiseq_geoduck/20180328_fastqc_trimmed_hiseq_geoduck/multiqc_data/

MultiQC HTML report: 20180328_trim_galore_illumina_hiseq_geoduck/20180328_fastqc_trimmed_hiseq_geoduck/multiqc_data/multiqc_report.html

Trimming completed and the FastQC results look much better than before.

Will proceed with full-blown assembly!