Tag Archives: roadrunner

FastQC – RRBS Geoduck BS-seq FASTQ data

Earlier today I finished trimming Hollie’s RRBS BS-seq FastQ data.

However, the original files were never analyzed with FastQC, so I ran it on the original files.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

FastQC was run, followed by MultiQC. Analysis was run on Roadrunner.
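For reference, that two-step QC run boils down to something like this (directory names are illustrative, not the actual ones used on Roadrunner):

```shell
# Run FastQC on all gzipped FASTQ files, writing reports to fastqc_out
mkdir -p fastqc_out
fastqc --threads 8 --outdir fastqc_out *.fastq.gz

# Aggregate the individual FastQC reports into a single MultiQC report
multiqc --outdir multiqc_out fastqc_out
```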

All analysis is documented in a Jupyter Notebook; see link below.

Jupyter Notebook:

FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

TrimGalore/FastQC/MultiQC – TrimGalore! RRBS Geoduck BS-seq FASTQ data

20180516 – UPDATE!!



Steven requested that I trim the Geoduck RRBS libraries that we have, in preparation to run them through Bismark.

These libraries were originally created by Hollie Putnam using the TruSeq DNA Methylation Kit (Illumina):

All analysis is documented in a Jupyter Notebook; see link below.

Overview of process:

  1. Copy EPI* FastQ files from owl/P_generosa to roadrunner.

  2. Confirm data integrity via MD5 checksums.

  3. Run TrimGalore! with --paired, --rrbs, and --non-directional settings.

  4. Run FastQC and MultiQC on trimmed files.

  5. Copy all data to owl (see Results below for link).

  6. Confirm data integrity via MD5 checksums.
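A condensed sketch of the steps above (hostnames, paths, and filenames are illustrative; only one read pair is shown):

```shell
# Step 1: copy EPI* FastQ files from owl to roadrunner (path hypothetical)
rsync -av owl:/path/to/P_generosa/EPI\*.fastq.gz .

# Step 2: verify the copies against checksums generated on owl
md5sum -c checksums.md5

# Step 3: trim one read pair with the RRBS settings
trim_galore --paired --rrbs --non-directional \
  EPI-sample_R1.fastq.gz EPI-sample_R2.fastq.gz

# Step 4: QC the trimmed output (TrimGalore! names trimmed files *_val_*.fq.gz)
fastqc --outdir fastqc_out *_val_*.fq.gz
multiqc fastqc_out

# Steps 5-6: generate checksums, copy everything back to owl, and re-verify there
md5sum *_val_*.fq.gz > trimmed_checksums.md5
rsync -av . owl:/path/to/results/
```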

Jupyter Notebook:

TrimGalore! output folder:
FastQC output folder:
MultiQC output folder:
MultiQC report (HTML):

NovaSeq Assembly – Trimmed Geoduck NovaSeq with Meraculous

Attempted to use Meraculous to assemble the trimmed geoduck NovaSeq data.

Here’s the Meraculous manual (PDF).

After a number of issues (running out of hard drive space – multiple times – config file issues, typos), I’ve finally given up on running Meraculous. It failed again, saying it couldn’t find a file in a directory that Meraculous itself created! I’ve emailed the authors and, if they have an easy fix, I’ll implement it and see what happens.

Anyway, it’s all documented in the Jupyter Notebook below.

One good thing that came out of all of this is that I had to run kmergenie to identify an appropriate kmer size to use for assembly, as well as an estimated genome size (this info is needed for both Meraculous and SOAPdenovo, which I’ll be trying next):

kmergenie output folder: http://owl.fish.washington.edu/Athaliana/20180125_geoduck_novaseq/20180206_kmergenie/
kmergenie HTML report (doesn’t display histograms for some reason): 20180206_kmergenie/histograms_report.html
kmer size: 117
Est. genome size: 2.17Gbp
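For reference, the kmergenie run boils down to something like this (the file-of-filenames name, thread count, and output prefix are illustrative):

```shell
# kmergenie takes a single file listing one read file per line
ls *.fastq.gz > reads.fofn

# Estimate the best k-mer size and genome size; writes histograms + HTML report
kmergenie reads.fofn -t 16 -o histograms
```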

Jupyter Notebook (GitHub): 20180205_roadrunner_meraculous_geoduck_novaseq.ipynb

Software Installation – ALPACA on Roadrunner

List of software that needed installing to run ALPACA:

Installed all software in:


Had to change ownership on /home/shared/. Used the following to recursively (-R) change the group to sudo, so that all admin (i.e. sudo group) users can read/write in this directory:

$sudo chown -R :sudo /home/shared
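Note that changing the group only helps if the group write bit is also set; if it isn't, it can be added with chmod. Demonstrated here on a scratch directory rather than on /home/shared itself:

```shell
# Create a scratch directory and grant group write on it recursively
mkdir -p /tmp/shared_demo
chmod -R g+w /tmp/shared_demo
ls -ld /tmp/shared_demo   # group permissions now include "w" (e.g. drwxrwxr-x)
```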

Compiled Celera Assembler from source (per the ALPACA requirements). This is the source file that I used: https://sourceforge.net/projects/wgs-assembler/files/wgs-assembler/wgs-8.3/wgs-8.3rc2.tar.bz2/download

Added all software to my system PATH by adding the following to my ~/.bashrc file:

## Add bioinformatics softwares to PATH

export PATH=${PATH}:

After adding that info to the bottom of my ~/.bashrc file, I re-loaded it into system memory by sourcing the file:

$source ~/.bashrc
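For illustration, a filled-in version of that ~/.bashrc snippet might look like this (the install paths are hypothetical, not the actual ones on Roadrunner):

```shell
## Add bioinformatics softwares to PATH
# Hypothetical install locations under /home/shared
export PATH=${PATH}:/home/shared/wgs-assembler/Linux-amd64/bin:/home/shared/alpaca/scripts
```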

Followed the ALPACA test instructions to confirm proper installation. More specific test instructions are actually located at the top of this file: /home/shared/alpaca/scripts/run_example.sh

Changed Celera Assembler directory name:

$mv /home/shared/wgs-8.3rc2 /home/shared/wgs-assembler
Step 1.
$mkdir /home/shared/test
Step 2.
$cd /home/shared/test/
Step 3.

Step three failed (which executes the run_example.sh script) due to permission problems.

Realized the script file didn’t have execute permissions, so I added them with the following command:

$sudo chmod +x /home/shared/alpaca/scripts/run_example.sh
Step 4. Continued with ALPACA Tests 2 & 3.

Everything tested successfully. Will try to get an assembly running with our PacBio and Illumina data.

Computer Management – Additional Configurations for Reformatted Xserves

Sean got the remaining Xserves configured to run independently from the master node of the cluster they belonged to and installed OS X 10.11 (El Capitan).

The new computer names are Ostrich (formerly node004) and Emu (formerly node002).


He enabled remote screen sharing and remote access for them.

Sean also installed a working hard drive on Roadrunner and got that back up and running.

I went through this morning and configured the computers with some other changes (some for my user account, others for the entire computer):

  • Renamed computers to reflect just the corresponding bird name (hostnames had been labeled as “bird name’s Xserve”)

  • Created srlab user accounts

  • Changed srlab user accounts to Standard instead of Administrative

  • Created steven user account

  • Turned on Firewalls

  • Granted remote login access to all users (instead of just Administrators)

  • Installed Docker Toolbox

  • Changed power settings to start automatically after power failure

  • Added computer name to login screen via Terminal:

sudo defaults write /Library/Preferences/com.apple.loginwindow LoginwindowText "TEXT GOES HERE"
  • Changed computer HostName via Terminal so that Terminal displays computer name:
sudo scutil --set HostName "TEXT GOES HERE"
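Both settings can be verified afterwards from Terminal with standard macOS commands:

```shell
# Confirm the HostName took effect
scutil --get HostName

# Confirm the login screen text was set
defaults read /Library/Preferences/com.apple.loginwindow LoginwindowText
```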
  • Installed Mac Homebrew (I don’t know if installation of Homebrew is “global” – i.e. installs for all users)

  • Used Mac Homebrew to install wget

  • Used Mac Homebrew to install tmux

Docker – VirtualBox Defaults on OS X

I noticed a discrepancy between what system info is detected natively on Roadrunner (Apple Xserve) and what was being shown when I started a Docker container.

Here’s what Roadrunner’s system info looks like outside of a Docker container:


However, here’s what is seen when running a Docker container:



It’s important to notice that the Docker container only sees 2 CPUs. Ideally, the Docker container would see that this system has 8 cores available. By default, however, it does not. To remedy this, the user has to adjust settings in VirtualBox, the virtual machine manager that gets installed with the Docker Toolbox for OS X. Docker actually runs inside a VirtualBox virtual machine, but this is not really transparent to a beginner Docker user on OS X.

To change the way VirtualBox (and, in turn, Docker) can access the full system hardware, you must launch the VirtualBox application (if you installed Docker using Docker Toolbox, you should be able to find this in your Applications folder). Once you’ve launched VirtualBox, you’ll have to turn off the virtual machine that’s currently running. Once that’s been accomplished, you can make changes and then restart the virtual machine.
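The same changes can also be made from the command line; a sketch, assuming the Docker Toolbox VM has the default name "default":

```shell
# The VM must be powered off before VirtualBox will accept changes
docker-machine stop default

# Give the VM 8 CPUs and 24GB of RAM (memory is specified in MB)
VBoxManage modifyvm default --cpus 8 --memory 24576

# Restart the VM so Docker picks up the new settings
docker-machine start default
```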


Shutdown VirtualBox machine before you can make changes:


Here are the default CPU settings that VirtualBox is using:



Maxed out the CPU slider:




Here are the default RAM settings that VirtualBox is using:




Changed RAM slider to 24GB:




Now, let’s see what the Docker container reports for system info after making these changes:


Looking at the CPUs now, we see it has 8 listed (as opposed to only 2 initially). I think this means that Docker now has full access to the hardware on this machine.

This situation is a weird shortcoming of Docker (and/or VirtualBox). Additionally, I think this issue might only exist on the OS X and Windows versions of Docker, since they require the installation of the Docker Toolbox (which installs VirtualBox). I don’t think Linux installations suffer from this issue.
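A quick way to confirm what a container actually sees (any small Linux image with coreutils works; the image choice here is illustrative):

```shell
# CPU count visible inside the container
docker run --rm ubuntu nproc

# Total memory visible inside the container
docker run --rm ubuntu grep MemTotal /proc/meminfo
```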

Docker – One liner to create Docker container

One liner to create Docker container for Jupyter notebook usage and data analysis on roadrunner (Xserve):

docker run -p 8888:8888 -v /Users/sam/gitrepos/LabDocs/jupyter_nbs/sam/:/notebooks -v /Users/sam/data/:/data -v /Users/sam/analysis/:/analysis -it kubu4/bioinformatics:v11 /bin/bash

This does the following:

  • Maps roadrunner port 8888 to Docker container port 8888 (for Jupyter notebook access outside of the Docker container)
  • Mounts my local Jupyter notebooks directory to the /notebooks directory in the Docker container

  • Mounts my local data directory to the /data directory in the Docker container

  • Mounts my local analysis directory to the /analysis directory in the Docker container

These commands allow me to interact with data outside of the Docker container.
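Once inside the container, the notebook server can then be started so that it is reachable through the mapped port; a sketch using standard Jupyter flags:

```shell
# Listen on all interfaces on port 8888 (matches the -p 8888:8888 mapping)
# and serve notebooks from the mounted /notebooks directory
jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser --notebook-dir=/notebooks
```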

Computer Setup – Cluster Node003 Conversion

Here’s an overview of some of the struggles getting node003 converted/upgraded to function as an independent computer (as opposed to a slave node in the Apple computer cluster).

  • 6TB HDD
  • Only 2.2TB recognized when connected to Hummingbird via Firewire (internet suggests that is the max for Xserve; USB might recognize the full drive) – Hummingbird is a converted Xserve running Mavericks
  • Reformatted on different Mac and full drive size recognized
  • Connected to Hummingbird (via USB) and full 6TB recognized
  • Connected to Mac Mini to install OS X
  • Tried installing OS X 10.8.5 (Mountain Lion) via CMD+r at boot, but failed partway through installation
  • Tried and couldn’t reformat drive through CMD+r at boot with Disk Utility
  • Broken partition tables identified on Linux, used GParted to establish partition table, back to Mac Mini and OS X (Mountain Lion) install worked
  • Upgraded to OS X 10.11.5 (El Capitan)
  • Inserted drive to Mac cluster node003 – wouldn’t boot all the way – Apple icon, progress bar > Do Not Enter symbol
  • Removed drive, put original back in, connected 6TB HDD via USB, but booting from USB not an option (when booting and holding Option key)
  • Probably due to node003 being part of cluster – reformatted original node003 drive with clean install of OS X Server.
  • Booting from USB now an option and worked with 6TB HDD!
  • Put 6TB HDD w/El Capitan in internal sled and won’t boot! Apple icon, progress bar > Do Not Enter symbol
  • Installed OS X 10.11.5 (El Capitan) on old 1TB drive and inserted into node003 – worked perfectly!
  • Will just use 1TB boot drive and figure out another use for 6TB HDD
  • Renamed node003 to roadrunner
  • Current plan is to upgrade from 12GB to 48GB of RAM and then automate moving data off this drive to long-term storage on Owl (Synology server).