Tag Archives: FASTQC

Data Received – Chionoecetes bairdi RNAseq & FastQC Analysis

We received Grace’s 100bp PE NovaSeq (Illumina) RNAseq data from the Northwest Genomics Center today.

Data was downloaded via their Aspera browser plugin and rsynced to:

Owl/nightingales/C_bairdi

MD5 checksums were generated (md5sum on Ubuntu):

checksums.md5


321ec408ba7e0f0be1929ca44871f963  304428_S1_L001_R1_001.fastq.gz
b95c69f755c9c42d9203429119d4234d  304428_S1_L001_R2_001.fastq.gz
a0fd8db312057dedd480231d4d125fd3  304428_S1_L002_R1_001.fastq.gz
c6e70ef7f3c8a866851a1b9453aef36a  304428_S1_L002_R2_001.fastq.gz
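
A minimal sketch of how a checksum file like the one above could be generated and re-checked later, assuming GNU `md5sum`. The function names are hypothetical and do not come from the original work.

```shell
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not from the original notebook.

# Write MD5 checksums for every gzipped FastQ file in a directory to checksums.md5.
make_checksums() {
  local dir="$1"
  ( cd "$dir" && md5sum -- *.fastq.gz > checksums.md5 )
}

# Re-check the files against checksums.md5; returns non-zero on any mismatch.
verify_checksums() {
  local dir="$1"
  ( cd "$dir" && md5sum --check --quiet checksums.md5 )
}
```

Calling `make_checksums` on the download folder and `verify_checksums` on the rsynced copy (after copying `checksums.md5` along with the data) would flag any file that changed in transit.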

FastQC analysis was run, followed by MultiQC.
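
The FastQC-then-MultiQC step could look roughly like the sketch below. The function name, the `THREADS` default, and the overridable `FASTQC`/`MULTIQC` variables are my own assumptions, added so the function can be dry-run with `echo`. Only the `--threads` and `--outdir` options of the two programs are used.

```shell
#!/usr/bin/env bash
# Sketch only. FASTQC and MULTIQC default to the programs on PATH; they are
# overridable so the function can be dry-run (e.g. FASTQC=echo MULTIQC=echo).

run_fastqc_multiqc() {
  local out_dir="$1"; shift
  mkdir -p "$out_dir"
  # One FastQC report per input file, written to out_dir.
  "${FASTQC:-fastqc}" --threads "${THREADS:-4}" --outdir "$out_dir" "$@"
  # MultiQC scans out_dir for FastQC results and writes multiqc_report.html there.
  "${MULTIQC:-multiqc}" "$out_dir" --outdir "$out_dir"
}

# Example (not executed here):
# run_fastqc_multiqc 20181015_Cbairdi_fastqc 304428_S1_L00*_R*_001.fastq.gz
```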

Output folder (gannet/Atumefaciens):

20181015_Cbairdi_fastqc/

MultiQC Report (HTML):

20181015_Cbairdi_fastqc/multiqc_report.html

Nightingales spreadsheet was updated with file info and FastQC info:

Nightingales (Google Sheet)

TrimGalore/FastQC/MultiQC – C.virginica Oil Spill MBDseq Concatenated Sequences

0000-0002-2747-368X

Previously concatenated and analyzed our Crassostrea virginica oil spill MBDseq data with FastQC.

We decided to try improving things by running them through TrimGalore! to remove adapters and poor quality sequences.

Processed the samples on Roadrunner (Apple Xserve; Ubuntu 16.04) using default TrimGalore! settings.

After trimming, TrimGalore! output was summarized using MultiQC. Trimmed FastQ files were then analyzed with FastQC and followed up with MultiQC.

Documented in Jupyter Notebook (see below).

Jupyter Notebook (GitHub):

20180911_roadrunner_virginica_trimgalore.ipynb

RESULTS

TrimGalore! output folder:

20180911_virginica_oil_trimgalore_01/

TrimGalore! MultiQC report (HTML):

20180911_virginica_oil_trimgalore_01/multiqc_report.html

TrimGalore! FastQC output folder:

20180911_virginica_oil_trimgalore_01/20180911_virginica_oil_trimmed_fastqc/

FastQC MultiQC report (HTML):

20180911_virginica_oil_trimgalore_01/20180911_virginica_oil_trimmed_fastqc/multiqc_report.html

Overall, things look a bit better, but there are still some issues. Will likely eliminate sample 2112_lane_1_TGACCA from analysis and apply some additional sequence filtering, based on sequence length.
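
One possible way to do the length-based filtering mentioned above is sketched here. It is an assumption about how such a filter might be written, not what was actually run; the function name and any cutoff value are hypothetical. TrimGalore!'s own `--length` option is another route.

```shell
#!/usr/bin/env bash
# Sketch only: keep FastQ records whose sequence is at least min_len bases long.
# Assumes standard 4-line FastQ records (no wrapped sequence lines).
# Filtered records are written uncompressed to stdout.

filter_fastq_by_length() {
  local min_len="$1" in_fq="$2"
  gzip -dc "$in_fq" \
    | awk -v min="$min_len" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 { if (length(seq) >= min) print header "\n" seq "\n" plus "\n" $0 }
      '
}

# Example (not executed here; the cutoff of 25 is an arbitrary placeholder):
# filter_fastq_by_length 25 sample_trimmed.fq.gz | gzip > sample_filtered.fq.gz
```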

SEQUENCE CONTENT PLOT

SHORT SEQUENCE CONTAMINATION

Sequencing Data Analysis – C.virginica Oil Spill MBDseq Concatenation & FastQC

Per Steven’s request, I concatenated our Crassostrea virginica LSU oil spill MBDseq sequencing data and ran FastQC on the concatenated files.

Here’s the list of input files:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

All commands were run on roadrunner (Apple Xserve; Ubuntu 16.04). See Jupyter notebook below for details.
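
For illustration, the concatenation of per-sample chunks could be sketched as below. This relies on the fact that gzip files can be appended to one another without decompressing. The function name and the output file naming are assumptions; the actual commands are in the notebook.

```shell
#!/usr/bin/env bash
# Sketch only: concatenate all gzipped FastQ chunks sharing a sample prefix.
# Gzip members can be appended directly; no decompression is needed.

concat_sample() {
  local in_dir="$1" prefix="$2" out_dir="$3"
  mkdir -p "$out_dir"
  # The glob expands in sorted order, so _001 precedes _002, and so on.
  cat "$in_dir/${prefix}"_L001_R1_*.fastq.gz > "$out_dir/${prefix}.fastq.gz"
}

# Example (not executed here):
# concat_sample . 2112_lane1_ACAGTG concatenated
```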

Jupyter notebook (GitHub):

20180910_roadrunner_virginica_fastqc.ipynb

RESULTS:

The concatenated gzip files and FastQC/MultiQC files are in the output folder linked below.

Output folder:

20180910_Cvirginica_oil_fastqc/

MultiQC report (HTML):

20180910_Cvirginica_oil_fastqc/multiqc_report.html

FastQC/MultiQC/TrimGalore/MultiQC/FastQC/MultiQC – O.lurida WGBSseq for Methylation Analysis

I previously ran this data through the Bismark pipeline and followed up with MethylKit analysis. MethylKit analysis revealed an extremely low number of differentially methylated loci (DML), which seemed odd.

Steven and I met to discuss and compare our different variations on the analysis and decided to try out different tweaks to evaluate how they affect analysis.

I did the following tasks:

Looked at original sequence data quality with FastQC.
Summarized FastQC analysis with MultiQC.
Trimmed data using TrimGalore!, trimming 10bp from 5′ end of reads (8bp is recommended by Bismark docs).
Summarized trimming stats with MultiQC.
Looked at trimmed sequence quality with FastQC.
Summarized FastQC analysis with MultiQC.
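
The six tasks above can be sketched as a single function. This is a simplified outline, not the contents of the SBATCH script: it assumes single-end input, uses hypothetical sub-folder names, and makes the three program names overridable so the sequence can be dry-run with `echo`.

```shell
#!/usr/bin/env bash
# Sketch only: FastQC > MultiQC > TrimGalore! > MultiQC > FastQC > MultiQC.
# Assumes single-end reads; folder names below are illustrative.

qc_trim_qc() {
  local work_dir="$1"; shift
  local raw_qc="$work_dir/fastqc"
  local trim_dir="$work_dir/trimgalore"
  local trim_qc="$work_dir/trimmed_fastqc"
  mkdir -p "$raw_qc" "$trim_dir" "$trim_qc"

  # Steps 1-2: FastQC on the raw reads, summarized with MultiQC.
  "${FASTQC:-fastqc}" --outdir "$raw_qc" "$@"
  "${MULTIQC:-multiqc}" "$raw_qc" --outdir "$raw_qc"

  # Steps 3-4: trim, hard-clipping 10 bp from the 5' end, then summarize.
  "${TRIM_GALORE:-trim_galore}" --clip_R1 10 --output_dir "$trim_dir" "$@"
  "${MULTIQC:-multiqc}" "$trim_dir" --outdir "$trim_dir"

  # Steps 5-6: FastQC on the trimmed reads, summarized with MultiQC.
  "${FASTQC:-fastqc}" --outdir "$trim_qc" "$trim_dir"/*.fq.gz
  "${MULTIQC:-multiqc}" "$trim_qc" --outdir "$trim_qc"
}
```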

This was run on the Univ. of Washington High Performance Computing (HPC) cluster, Mox.

Mox SBATCH submission script has all details on how the analyses were conducted:

20180830_oly_WGBSseq_trimming.sh

RESULTS

Output folder:

20180830_oly_WGBSseq_trimming/

Raw sequence FastQC output folder:

20180830_oly_WGBSseq_trimming/20180830_fastqc/

Raw sequence MultiQC report (HTML):

20180830_oly_WGBSseq_trimming/20180830_fastqc/multiqc_report.html

TrimGalore! output folder (trimmed FastQ files are here):

20180830_oly_WGBSseq_trimming/20180830_trimgalore/

Trimming MultiQC report (HTML):

20180830_oly_WGBSseq_trimming/20180830_trimgalore/multiqc_report.html

Trimmed FastQC output folder:

20180830_oly_WGBSseq_trimming/20180830_trimmed_fastqc/

Trimmed MultiQC report (HTML):

20180830_oly_WGBSseq_trimming/20180830_trimmed_fastqc/multiqc_report.html

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data (directional)

Earlier this week, I ran TrimGalore! but, due to a copy/paste mistake, incorrectly set it to run with --non-directional, so I re-ran it with the correct settings.

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

project_juvenile_geoduck_OA/Sample_Processing (GitHub)

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

Run TrimGalore! with --paired and --rrbs settings.
Run FastQC and MultiQC on trimmed files.
Copy all data to owl (see Results below for link).
Confirm data integrity via MD5 checksums.
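
As a rough sketch of the trimming step in the overview above, the function below pairs each read 1 file with its read 2 mate and runs TrimGalore! with `--paired` and `--rrbs`. The `_R1_`/`_R2_` file naming, the function name, and the overridable `TRIM_GALORE` variable are assumptions; the notebook has the actual commands.

```shell
#!/usr/bin/env bash
# Sketch only: paired-end RRBS trimming, one TrimGalore! call per read pair.
# Assumes mates differ only by _R1_ versus _R2_ in the file name.

trim_rrbs_pairs() {
  local in_dir="$1" out_dir="$2"
  mkdir -p "$out_dir"
  local r1 r2 base
  for r1 in "$in_dir"/*_R1_*.fastq.gz; do
    [ -e "$r1" ] || continue          # glob matched nothing
    base="$(basename "$r1")"
    r2="$in_dir/${base/_R1_/_R2_}"
    if [ ! -e "$r2" ]; then
      echo "No mate for $r1; skipping" >&2
      continue
    fi
    "${TRIM_GALORE:-trim_galore}" --paired --rrbs --output_dir "$out_dir" "$r1" "$r2"
  done
}
```

Skipping unpaired files, rather than aborting, is a design choice made for the sketch; a stricter script might treat a missing mate as an error.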

Jupyter Notebook:

20180516_roadrunner_geoduck_RRBS_trimming.ipynb (GitHub)

Results:

FastQC – RRBS Geoduck BS-seq FASTQ data

Earlier today I finished trimming Hollie’s RRBS BS-seq FastQ data.

However, the original files were never analyzed with FastQC, so I ran it on the original files.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

project_juvenile_geoduck_OA/Sample_Processing (GitHub)

FastQC was run, followed by MultiQC. Analysis was run on Roadrunner.

All analysis is documented in a Jupyter Notebook; see link below.

Jupyter Notebook:

20180516_roadrunner_geoduck_EPI_fastqc

Results:

FastQC output folder:

20180516_geoduck_EPI_fastqc/

MultiQC output folder:

20180516_geoduck_EPI_fastqc/multiqc_data

MultiQC report (HTML):

multiqc_report.html

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data

20180516 – UPDATE!!

THIS WAS RUN WITH THE INCORRECT SETTING IN TRIMGALORE! `--non-directional`

WILL RE-RUN

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

project_juvenile_geoduck_OA/Sample_Processing (GitHub)

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

Copy EPI* FastQ files from owl/P_generosa to roadrunner.
Confirm data integrity via MD5 checksums.
Run TrimGalore! with --paired, --rrbs, and --non-directional settings.
Run FastQC and MultiQC on trimmed files.
Copy all data to owl (see Results below for link).
Confirm data integrity via MD5 checksums.

Jupyter Notebook:

20180514_roadrunner_geoduck_RRBS_trimming.ipynb (GitHub)

Results:

TrimGalore! output folder:

20180514_geoduck_trimgalore_rrbs

FastQC output folder:

20180514_geoduck_trimgalore_rrbs/20180514_geoduck_trimmed_fastqc/

MultiQC output folder:

20180514_geoduck_trimgalore_rrbs/20180514_geoduck_trimmed_fastqc/multiqc_data

MultiQC report (HTML):

multiqc_report.html

BS-seq Mapping – Olympia oyster bisulfite sequencing: TrimGalore > FastQC > Bismark

Steven asked me to evaluate our methylation sequencing data sets for Olympia oyster.

According to our Olympia oyster genome wiki, we have the following two sets of BS-seq data:

Whole genome BS-seq (2015)
MBD BS-seq (2015)

All computing was conducted on our Apple Xserve: emu.

All steps were documented in this Jupyter Notebook (GitHub): 20180503_emu_oly_methylation_mapping.ipynb

NOTE: The Jupyter Notebook linked above is very large and will not render on GitHub. Download it and open it on a computer that can run Jupyter Notebooks.

Here’s a brief overview of what was done.

Samples were trimmed with TrimGalore and then evaluated with FastQC. MultiQC was used to generate a nice visual summary report of all samples.

The Olympia oyster genome assembly, pbjelly_sjw_01, was used as the reference genome and was prepared for use in Bismark:


/home/shared/Bismark-0.19.1/bismark_genome_preparation \
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
--verbose /home/sam/data/oly_methylseq/oly_genome/ \
2> 20180507_bismark_genome_prep.err

Bismark was run on trimmed samples with the following command:


/home/shared/Bismark-0.19.1/bismark \
--path_to_bowtie /home/shared/bowtie2-2.3.4.1-linux-x86_64/ \
--genome /home/sam/data/oly_methylseq/oly_genome/ \
-u 1000000 \
-p 16 \
--non_directional \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/1_ATCACG_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/2_CGATGT_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/3_TTAGGC_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/4_TGACCA_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/5_ACAGTG_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/6_GCCAAT_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/7_CAGATC_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/8_ACTTGA_L001_R1_001_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_10_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_11_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_12_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_13_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_14_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_15_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_16_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_17_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_18_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_1_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_2_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_3_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_4_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_5_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_6_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_7_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_8_s456_trimmed.fq.gz \
/home/sam/analyses/20180503_oly_methylseq_trimgalore/zr1394_9_s456_trimmed.fq.gz \
2> 20180507_bismark_02.err
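
The long file list in the command above could also be built from a glob, as in the sketch below. This is an alternative I am suggesting for illustration, not what was run. The function name is hypothetical, `--path_to_bowtie` is omitted on the assumption that Bowtie2 is on the PATH, and the `BISMARK` variable is overridable so the command can be dry-run with `echo`.

```shell
#!/usr/bin/env bash
# Sketch only: run Bismark in single-end mode on every trimmed file in a folder.
# Uses the same -u, -p, and --non_directional settings as the command above.

run_bismark_se() {
  local genome_dir="$1" trimmed_dir="$2"
  # Collect the trimmed reads into an array (sorted glob order).
  local reads=( "$trimmed_dir"/*_trimmed.fq.gz )
  if [ ! -e "${reads[0]}" ]; then
    echo "No trimmed reads in $trimmed_dir" >&2
    return 1
  fi
  "${BISMARK:-bismark}" \
    --genome "$genome_dir" \
    -u 1000000 \
    -p 16 \
    --non_directional \
    "${reads[@]}"
}
```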

Results:

TrimGalore output folder:

20180503_oly_methylseq_trimgalore

FastQC output folder:

20180503_oly_methylseq_trimgalore/20180503_trim_fastqc/

MultiQC output folder:

20180503_oly_methylseq_trimgalore/20180503_trim_fastqc/multiqc_data/

MultiQC Report (HTML):

20180503_oly_methylseq_trimgalore/20180503_trim_fastqc/multiqc_data/multiqc_report.html

Bismark genome folder: 20180503_oly_genome_pbjelly_sjw_01_bismark/

Bismark output folder:

20180507_oly_methylseq_bismark

Whole genome BS-seq (2015)

Prep overview

Library prep: Roberts Lab
Sequencing: Genewiz

Bismark Report	Mapping Percentage
1_ATCACG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	40.3%
2_CGATGT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.9%
3_TTAGGC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	40.2%
4_TGACCA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	40.4%
5_ACAGTG_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.9%
6_GCCAAT_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.6%
7_CAGATC_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.9%
8_ACTTGA_L001_R1_001_trimmed_bismark_bt2_SE_report.txt	39.7%

MBD BS-seq (2015)

Prep overview

MBD: Roberts Lab
Library prep: ZymoResearch
Sequencing: ZymoResearch

Bismark Report	Mapping Percentage
zr1394_1_s456_trimmed_bismark_bt2_SE_report.txt	33.0%
zr1394_2_s456_trimmed_bismark_bt2_SE_report.txt	34.1%
zr1394_3_s456_trimmed_bismark_bt2_SE_report.txt	32.5%
zr1394_4_s456_trimmed_bismark_bt2_SE_report.txt	32.8%
zr1394_5_s456_trimmed_bismark_bt2_SE_report.txt	35.2%
zr1394_6_s456_trimmed_bismark_bt2_SE_report.txt	35.5%
zr1394_7_s456_trimmed_bismark_bt2_SE_report.txt	32.8%
zr1394_8_s456_trimmed_bismark_bt2_SE_report.txt	33.0%
zr1394_9_s456_trimmed_bismark_bt2_SE_report.txt	34.7%
zr1394_10_s456_trimmed_bismark_bt2_SE_report.txt	34.9%
zr1394_11_s456_trimmed_bismark_bt2_SE_report.txt	30.5%
zr1394_12_s456_trimmed_bismark_bt2_SE_report.txt	35.8%
zr1394_13_s456_trimmed_bismark_bt2_SE_report.txt	32.5%
zr1394_14_s456_trimmed_bismark_bt2_SE_report.txt	30.8%
zr1394_15_s456_trimmed_bismark_bt2_SE_report.txt	31.3%
zr1394_16_s456_trimmed_bismark_bt2_SE_report.txt	30.7%
zr1394_17_s456_trimmed_bismark_bt2_SE_report.txt	32.4%
zr1394_18_s456_trimmed_bismark_bt2_SE_report.txt	34.9%
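
Tables like the two above could be pulled from the Bismark reports with something like the sketch below. It assumes each report contains a tab-separated line beginning with `Mapping efficiency:`; that line format and the function name are assumptions on my part, so check a report before relying on it.

```shell
#!/usr/bin/env bash
# Sketch only: tabulate mapping percentages from Bismark SE report files.
# Assumes a report line of the form "Mapping efficiency:<TAB>40.3%".

mapping_table() {
  printf 'Bismark Report\tMapping Percentage\n'
  local report
  for report in "$@"; do
    awk -F'\t' -v name="$(basename "$report")" '
      /^Mapping efficiency:/ { gsub(/[[:space:]]/, "", $2); print name "\t" $2 }
    ' "$report"
  done
}

# Example (not executed here):
# mapping_table 20180507_oly_methylseq_bismark/*_SE_report.txt
```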

TrimGalore/FastQC/MultiQC – Trim 10bp 5’/3′ ends C.virginica MBD BS-seq FASTQ data

Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. Since this is the next step in our pipeline, we figured we should probably just follow their recommendations!
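
A sketch of what trimming 10bp from both ends of a read pair might look like with TrimGalore!'s clipping options. It assumes paired-end input; the function name and the overridable `TRIM_GALORE` variable are hypothetical, and the job script linked below holds the settings actually used.

```shell
#!/usr/bin/env bash
# Sketch only: hard-clip 10 bp from the 5' and 3' ends of both reads in a pair.

trim_10bp_both_ends() {
  local out_dir="$1" r1="$2" r2="$3"
  mkdir -p "$out_dir"
  "${TRIM_GALORE:-trim_galore}" \
    --paired \
    --clip_R1 10 --clip_R2 10 \
    --three_prime_clip_R1 10 --three_prime_clip_R2 10 \
    --output_dir "$out_dir" \
    "$r1" "$r2"
}
```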

TrimGalore job script:

20180410_trimgalore_trim14bp_Cvirginica_MDB.sh

Standard error was redirected on the command line to this file:

20180411_trimgalore_10bp_Cvirginica_MBD/stderr.log

MD5 checksums were generated on the resulting trimmed FASTQ files:

20180411_trimgalore_10bp_Cvirginica_MBD/checksums.md5

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

20180411_trimgalore_10bp_Cvirginica_MBD

FastQC output folder:

20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD

MultiQC output folder:

20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/multiqc_data/

MultiQC HTML report:

20180411_trimgalore_10bp_Cvirginica_MBD/20180411_fastqc_trim_10bp_Cvirginica_MBD/multiqc_data/multiqc_report.html

Hey! Look at that! Everything is much better! Thanks for the excellent documentation and suggestions, Bismark!

TrimGalore/FastQC/MultiQC – 2bp 3′ end Read 1s Trim C.virginica MBD BS-seq FASTQ data

Earlier today, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch and hard trimmed the first 14bp from each read. Things looked better at the 5′ end, but the 3′ end of each of the READ1 seqs showed a wonky 2bp blip, so I decided to trim that off.

I ran TrimGalore (using the built-in FastQC option) on the read 1 files that had already had the 14bp hard trim, hard trimming the last 2bp of each read, and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

20180410_trimgalore_trim14bp_Cvirginica_MDB.sh

Standard error was redirected on the command line to this file:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/stderr.log

MD5 checksums were generated on the resulting trimmed FASTQ files:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/checksums.md5

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/

FastQC output folder:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/20180410_fastqc_trimgalore_14bp5prime_2bp3prime_Cvirginica_MBD/

MultiQC output folder:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/20180410_fastqc_trimgalore_14bp5prime_2bp3prime_Cvirginica_MBD/multiqc_data/

MultiQC HTML report:

20180410_trimgalore_trim14bp5prim_2bp3prime_Cvirginica_MBD/20180410_fastqc_trimgalore_14bp5prime_2bp3prime_Cvirginica_MBD/multiqc_data/multiqc_report.html

Well, this is a bit strange: the 2bp trimming on the read 1s looks fine, but now the read 2s are weird in the same region!

Regardless, while this was running, Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. So, maybe this was all moot. I’ll go ahead and re-run this following the Bismark recommendations.
