
Software Installation – RepeatMasker v4.0.7 on Emu/Roadrunner Continued

After yesterday’s difficulties getting RMBlast to compile, I deleted the folder and went through the build process again.

This time it worked, but it did not put rmblastn in the specified location (/home/shared/rmblast).

This fact took me a fair amount of time to figure out. Finally, after a couple of different re-builds, I ran find to see if rmblastn existed somewhere I wasn’t looking:
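The exact `find` command isn't preserved in this post. As a rough illustration of what a filename search like that does, here is a minimal Python sketch that walks a directory tree and reports every file named `rmblastn`. The function name and the starting directory are assumptions for illustration; on the servers themselves, plain `find` is the simpler tool.

```python
import os
from pathlib import Path


def find_named_files(root, name):
    """Walk `root` recursively and return every path whose final component is `name`.

    Roughly what `find <root> -name <name>` does; unreadable directories are skipped.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        if name in filenames:
            hits.append(Path(dirpath) / name)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical starting point; the post only says rmblastn was not in /home/shared/rmblast.
    for hit in find_named_files("/home/shared", "rmblastn"):
        print(hit)
```

In a BLAST+ source build, a hit like this would be expected under the `c++/ReleaseMT/bin/` directory described next.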

Additionally, I couldn’t find the location of the various BLAST executables. Some internet sleuthing led me to the NCBI page on installing BLAST+ from source, which indicates that the executables are stored in:

ncbi-blast-VERSION+-src/c++/ReleaseMT/bin/

How intuitive! /s

In order to improve the readability and usability of the /home/shared/ directory, I renamed the /home/shared/rmblast directory to reflect the BLAST version and created a symbolic link in that directory to the rmblastn executable:

Symbolic link to RMBLAST
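The actual commands aren't shown (on the command line this is typically `ln -s`). A minimal Python sketch of the same idea follows; the helper name, the renamed directory, and the link target are hypothetical. Using a relative target is my own design choice: the link keeps working if the parent directories are moved together.

```python
from pathlib import Path


def link_executable(install_dir, target_relpath, link_name):
    """Create install_dir/link_name as a symlink pointing at target_relpath.

    target_relpath is interpreted relative to install_dir, the directory holding the link.
    """
    link = Path(install_dir) / link_name
    if link.is_symlink():
        link.unlink()  # replace a stale link from an earlier build
    link.symlink_to(target_relpath)  # raises FileExistsError if a real file is in the way
    return link


# Hypothetical usage; the renamed directory's actual name isn't given in the post:
# link_executable("/home/shared/rmblast-2.6.0",
#                 "../ncbi-blast-2.6.0+-src/c++/ReleaseMT/bin/rmblastn", "rmblastn")
```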

Initiate RepeatMasker configuration


Confirm perl install location:


Confirm RepeatMasker install location:


Specify TRF install location:


Hmmm, TRF error. It's looking for a file called trf:


Renamed TRF file to trf and now it’s automatically found:
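RepeatMasker's configure script is Perl and does its own lookup, so this is only an illustration of why the rename mattered, assuming the lookup is by exact filename in the PATH directories (which matches the behavior described here). A minimal Python sketch using `shutil.which`; the helper name is hypothetical.

```python
import shutil


def trf_lookup(search_path=None):
    """Report how 'trf' and the versioned download name resolve on a search path.

    search_path defaults to $PATH; pass e.g. "/home/shared/bin" to check one directory.
    A lookup only succeeds for an executable whose filename matches exactly.
    """
    return {name: shutil.which(name, path=search_path) for name in ("trf", "trf409.linux64")}


if __name__ == "__main__":
    print(trf_lookup("/home/shared/bin"))
```

Before the rename, only `trf409.linux64` would resolve; afterwards, `trf` does.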


Set RMBlast as search engine:


Set RMBlast install location:


Set RMBlast as default search engine:


Confirmation of RMBlast as default search engine and successful installation of RepeatMasker:


Software Installation – RepeatMasker v4.0.7 on Emu/Roadrunner

Steven asked that I re-run some Olympia oyster transposable element analyses using RepeatMasker and a newer version of our Olympia oyster genome assembly.

Installed the software on both of the Apple Xserves (Emu and Roadrunner) running Ubuntu 16.04.

Followed the instructions outlined here:

Starting with the prerequisites:

1. Download and install RMBlast

  • NCBI BLAST 2.6.0 source

  • isb 2.6.0 patch

Unfortunately, the make command continually failed:

cd /home/shared/ncbi-blast-2.6.0+-src/c++
make

While troubleshooting this issue, I continued with the other prerequisites:

2. Downloaded Tandem Repeats Finder v4.09

  • Saved file (trf409.linux64) to /home/shared/bin. NOTE: /home/shared/bin is part of the system PATH. See the /etc/environment file.

  • Changed permissions to be executable:

sudo chmod 775 trf409.linux64

3. Downloaded RepBase RepeatMasker Edition 20170127 (NOTE: This requires registration in order to obtain a username/password to download the file).

Installed RepeatMasker:

4. Downloaded RepeatMasker 4.0.7

  • Saved to /home/shared/RepeatMasker-4.0.7

5. Installed RepBase RepeatMasker Edition 20170127 in /home/shared/RepeatMasker-4.0.7/Libraries

Currently re-building RMBlast and it takes forever… Will report back when I have it running.
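For reference, the `chmod 775` step above gives the owner and group read/write/execute permission and everyone else read/execute. A minimal sketch that renders an octal mode the way `ls -l` shows it for a regular file, using the standard library's `stat.filemode` (the helper name is mine):

```python
import stat


def describe_mode(octal_mode):
    """Render a permission number such as 0o775 as `ls -l` would for a regular file."""
    return stat.filemode(stat.S_IFREG | octal_mode)


if __name__ == "__main__":
    print(describe_mode(0o775))  # -rwxrwxr-x
```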

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data (directional)

Earlier this week, I ran TrimGalore!, but, due to a copy/paste mistake, I incorrectly set the trimming as --non-directional, so I re-ran it with the correct settings.

Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

  1. Run TrimGalore! with --paired and --rrbs settings.

  2. Run FastQC and MultiQC on trimmed files.

  3. Copy all data to owl (see Results below for link).

  4. Confirm data integrity via MD5 checksums.
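The notebook's exact checksum commands aren't reproduced here; on the command line this is typically `md5sum -c`. A minimal Python sketch of the same check follows, assuming a checksum file in md5sum's two-column format (`<md5>  <filename>`) stored alongside the FASTQ files. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks so large FASTQs don't fill memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_list(checksum_file):
    """Check every '<md5>  <filename>' line of an md5sum-style file.

    Filenames are resolved relative to the checksum file's directory.
    Returns {filename: True if the digest matches, else False}.
    """
    checksum_file = Path(checksum_file)
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with a leading '*'
        results[name] = md5_of(checksum_file.parent / name) == expected.lower()
    return results
```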

Jupyter Notebook:


Results:
TrimGalore! output folder:
FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

FastQC – RRBS Geoduck BS-seq FASTQ data

Earlier today I finished trimming Hollie’s RRBS BS-seq FastQ data.

However, the original (untrimmed) files had never been analyzed with FastQC, so I ran it on them.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

FastQC was run, followed by MultiQC. Analysis was run on Roadrunner.

All analysis is documented in a Jupyter Notebook; see link below.

Jupyter Notebook:

Results:
FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data


20180516 – UPDATE!!

THIS WAS RUN WITH THE INCORRECT SETTING IN TRIMGALORE! --non-directional

WILL RE-RUN


Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

  1. Copy EPI* FastQ files from owl/P_generosa to roadrunner.

  2. Confirm data integrity via MD5 checksums.

  3. Run TrimGalore! with --paired, --rrbs, and --non-directional settings.

  4. Run FastQC and MultiQC on trimmed files.

  5. Copy all data to owl (see Results below for link).

  6. Confirm data integrity via MD5 checksums.

Jupyter Notebook:


Results:
TrimGalore! output folder:
FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

DNA Isolation & Quantification – Metagenomics Water Filters

After discussing the preliminary DNA isolation attempt with Steven & Emma, we decided to proceed with DNA isolations on the remaining 0.22μm filters.

Isolated DNA from the following five filters:

DNA was isolated with the DNeasy Blood & Tissue Kit (Qiagen), following a modified version of the Gram-Positive Bacteria protocol:

  • filters were unfolded and unceremoniously stuffed into 1.7mL snap cap tubes
  • did not perform enzymatic lysis step
  • filters were incubated with 400μL of Buffer AL and 50μL of Proteinase K (both are double the volumes listed in the kit and are necessary to fully coat the filter in a 1.7mL snap cap tube)
  • 56°C incubations were performed overnight
  • 400μL of 100% ethanol was added to each after the 56°C incubation
  • samples were eluted in 50μL of Buffer AE
  • all spins were performed at 20,000g

Samples were quantified with the Roberts Lab Qubit 3.0 and the Qubit 1x dsDNA HS Assay Kit.

Used 5μL of each sample for measurement (see Results for update).

Results:

Raw data (Google Sheet): 20180426_qubit_metagenomics_filters

| Sample | Concentration (ng/μL) | Initial volume (μL) | Yield (ng) |
| --- | --- | --- | --- |
| Filter #10 pH 7.1 5/15/17 | 0.296 | 50 | 14.65 |
| Filter #7 pH 8.2 5/15/17 | 8.44 | 50 | 422 |
| Filter #7 pH 8.2 5/19/17 | 2.52 | 50 | 126 |
| Filter #10 pH 7.1 5/22/17 | 2.0 | 50 | 100 |
| Filter #10 pH 7.1 5/26/17 | 11.9 | 50 | 595 |
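The yield column is the Qubit concentration multiplied by the volume the DNA is in (here, the 50μL elution). A minimal sketch of that arithmetic, using two rows from the table above:

```python
def dna_yield_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA (ng) = Qubit concentration (ng/μL) x sample volume (μL)."""
    return concentration_ng_per_ul * volume_ul


if __name__ == "__main__":
    # Two rows from the table above (both eluted in 50μL of Buffer AE).
    for sample, conc in [("Filter #7 pH 8.2 5/15/17", 8.44), ("Filter #10 pH 7.1 5/26/17", 11.9)]:
        print(f"{sample}: {dna_yield_ng(conc, 50):.0f} ng")
```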

Samples were stored in Sam's gDNA Box #2, positions G8-H3 (FTR 213, #27, small -20°C freezer).

Total Alkalinity Calculations – Yaamini’s Ocean Chemistry Samples

I ran a subset of Yaamini’s ocean chemistry samples on our T5 Excellence titrator (Mettler Toledo) at the beginning of April. The subset consisted of samples taken from the beginning, middle, and end of the experiment. The rationale was to assess whether total alkalinity (TA) varied across the experiment. If there was little variation, there’d likely be no need to run all of the samples; if there were temporal differences, all samples should be processed.

Data analysis was performed in the following R Project:

The R Project above was initially copied from the R Project for our titrator on GitHub:

Three separate, data-file-specific versions of the TA_calculations.R script were created and run:

Salinity values (PSU) were collected from the following spreadsheet (Google Sheet) and manually entered in each of the R scripts:

Specifically, the TA calculations were performed using the seacarb library, with the at() function.

Results:
| Sample | TA (μmol/kg) |
| --- | --- |
| H1 A 2/20/17 | 2390.88423 |
| H2 A 2/20/17 | 2393.39207 |
| T1 A 2/20/17 | 2367.78791 |
| T2 A 2/20/17 | 2319.39360 |
| T3 A 2/20/17 | 2309.88602 |
| T4 A 2/20/17 | 2287.72108 |
| T5 A 2/20/17 | 2336.14773 |
| T6 A 2/20/17 | 2298.36327 |
| H1 A 3/20/17 | 2870.73309 |
| H2 A 3/20/17 | 2760.49972 |
| T1 A 3/20/17 | 2930.29308 |
| T2 A 3/20/17 | 2925.95472 |
| T3 A 3/20/17 | 2896.55123 |
| T4 A 3/20/17 | 2769.72514 |
| T5 A 3/20/17 | 2743.33934 |
| T6 A 3/20/17 | 2727.94064 |
| H1 A 4/4/17 | 2770.20971 |
| H2 A 4/4/17 | 2656.27437 |
| T1 A 4/4/17 | 2801.77913 |
| T2 A 4/4/17 | 2822.51611 |
| T3 A 4/4/17 | 2800.87387 |
| T4 A 4/4/17 | 2584.60933 |
| T5 A 4/4/17 | 2645.37017 |
| T6 A 4/4/17 | 2604.22677 |
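To put a rough number on the variation, here is a minimal sketch that groups the TA values above by sampling date and reports the mean and range for each date. It assumes the date is always the third space-separated field of the sample name, as it is in this table. This is a quick summary, not a statistical test.

```python
from collections import defaultdict
from statistics import mean

# TA values (μmol/kg) copied from the table above.
RAW = """\
H1 A 2/20/17 2390.88423
H2 A 2/20/17 2393.39207
T1 A 2/20/17 2367.78791
T2 A 2/20/17 2319.39360
T3 A 2/20/17 2309.88602
T4 A 2/20/17 2287.72108
T5 A 2/20/17 2336.14773
T6 A 2/20/17 2298.36327
H1 A 3/20/17 2870.73309
H2 A 3/20/17 2760.49972
T1 A 3/20/17 2930.29308
T2 A 3/20/17 2925.95472
T3 A 3/20/17 2896.55123
T4 A 3/20/17 2769.72514
T5 A 3/20/17 2743.33934
T6 A 3/20/17 2727.94064
H1 A 4/4/17 2770.20971
H2 A 4/4/17 2656.27437
T1 A 4/4/17 2801.77913
T2 A 4/4/17 2822.51611
T3 A 4/4/17 2800.87387
T4 A 4/4/17 2584.60933
T5 A 4/4/17 2645.37017
T6 A 4/4/17 2604.22677
"""


def ta_by_date(raw):
    """Group TA values by the date field (third whitespace-separated token of each line)."""
    groups = defaultdict(list)
    for line in raw.strip().splitlines():
        _sample, _letter, date, ta = line.split()
        groups[date].append(float(ta))
    return dict(groups)


if __name__ == "__main__":
    for date, values in ta_by_date(RAW).items():
        print(f"{date}: mean {mean(values):.1f}, range {min(values):.1f}-{max(values):.1f} (n={len(values)})")
```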

Well, it certainly looks like there’s some variation across the experiment. It’s likely that all remaining samples will need to be processed. Will pass along data to Yaamini for her to evaluate.

TrimGalore/FastQC/MultiQC – Trim 10bp 5′/3′ ends C.virginica MBD BS-seq FASTQ data

Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. Since alignment with Bismark is the next step in our pipeline, we figured we should probably just follow their recommendations!
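The job script itself isn't shown here. Conceptually, a hard trim drops a fixed number of bases from each end of every read, and the quality string must be cut identically so the two stay in register. A minimal Python sketch of that per-read operation follows; the function name is mine, and TrimGalore performs the equivalent internally.

```python
def hard_trim(seq, qual, five_prime=10, three_prime=10):
    """Remove a fixed number of bases from each end of one read, keeping SEQ and QUAL in sync."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be the same length")
    if five_prime + three_prime >= len(seq):
        return "", ""  # nothing left after trimming
    end = len(seq) - three_prime
    return seq[five_prime:end], qual[five_prime:end]


if __name__ == "__main__":
    seq = "A" * 10 + "CGTACGTACG" + "T" * 10
    qual = "#" * 10 + "I" * 10 + "#" * 10
    print(hard_trim(seq, qual))  # 10bp removed from each end
```

The earlier 2bp read 1 fix corresponds to calling the same function with `five_prime=0, three_prime=2`.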

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Hey! Look at that! Everything is much better! Thanks for the excellent documentation and suggestions, Bismark!

DNA Isolation & Quantification – Metagenomics Water Filters

Isolated DNA from the following two filters:

DNA was isolated with the DNeasy Blood & Tissue Kit (Qiagen), following a modified version of the Gram-Positive Bacteria protocol:

  • filters were unfolded and unceremoniously stuffed into 1.7mL snap cap tubes
  • did not perform enzymatic lysis step
  • filters were incubated with 400μL of Buffer AL and 50μL of Proteinase K (both are double the volumes listed in the kit and are necessary to fully coat the filter in a 1.7mL snap cap tube)
  • 56°C incubations were performed overnight
  • 400μL of 100% ethanol was added to each after the 56°C incubation
  • samples were eluted in 50μL of Buffer AE
  • all spins were performed at 20,000g

Samples were quantified with the Roberts Lab Qubit 3.0 and the Qubit 1x dsDNA HS Assay Kit.

Used 10μL of each sample for measurement (see Results for update).

Results:

Raw data (Google Sheet): 20180411_qubit_metagenomics_filters

| Sample | Concentration (ng/μL) | Initial volume (μL) | Yield (ng) |
| --- | --- | --- | --- |
| filter 5/22 #7 pH8.2 | 20.8 | 50 | 1040 |
| filter 5/26 #7 pH8.2 | 11.6 | 50 | 580 |

NOTE: For “filter 5/22 #7 pH8.2”, the initial quantification using 10μL was too concentrated. Re-ran using 5μL.

Both samples have yielded DNA. This is, obviously, an improvement over the previous attempts to isolate DNA from ammonium bicarbonate filter rinses that Emma supplied me with.

Will discuss with Steven and get an idea of which filters to isolate additional DNA from.

Samples were stored in Sam's gDNA Box #2, positions G6 & G7 (FTR 213, #27, small -20°C freezer).

TrimGalore/FastQC/MultiQC – 2bp 3′ end Read 1s Trim C.virginica MBD BS-seq FASTQ data

Earlier today, I ran TrimGalore/FastQC/MultiQC on the Crassostrea virginica MBD BS-seq data from ZymoResearch and hard-trimmed the first 14bp from each read. Things looked better at the 5′ end, but the 3′ end of each of the read 1 sequences showed a wonky 2bp blip, so I decided to trim that off.

I ran TrimGalore (using its built-in FastQC option) with a hard trim of the last 2bp of each read 1 file that had previously received the 14bp hard trim, and followed up with MultiQC for a summary of the FastQC reports.

TrimGalore job script:

Standard error was redirected on the command line to this file:

MD5 checksums were generated on the resulting trimmed FASTQ files:

All data was copied to my folder on Owl.

Checksums for FASTQ files were verified post-data transfer (data not shown).

Results:

Output folder:

FastQC output folder:

MultiQC output folder:

MultiQC HTML report:

Well, this is a bit strange: the 2bp trimming on the read 1s looks fine, but now the read 2s are weird in the same region!

Regardless, while this was running, Steven found out that the Bismark documentation (Bismark is the bisulfite aligner we use in our BS-seq pipeline) suggests trimming 10bp from both the 5′ and 3′ ends. So, maybe this was all moot. I’ll go ahead and re-run this following the Bismark recommendations.