DNA Isolation – Ostrea lurida DNA for PacBio Sequencing

In an attempt to improve upon the partial genome assembly we received from BGI, we will be sending DNA to the UW PacBio core facility for additional sequencing.

Isolated DNA from mantle tissue from the same Ostrea lurida individual used for the BGI sequencing efforts. Tissue was collected by Brent & Steven on 20150812.

Used the E.Z.N.A. Mollusc Kit (Omega) to isolate DNA from two separate 50mg pieces of mantle tissue according to the manufacturer’s protocol, with the following changes:

  • Samples were homogenized with a disposable plastic pestle in 350μL of ML1 Buffer
  • Incubated homogenate at 60°C for 1.5hrs
  • No optional steps were used
  • Performed three rounds of 24:1 chloroform:IAA treatment
  • Eluted each in 50μL of Elution Buffer and pooled into a single sample

Quantified the DNA using the Qubit dsDNA BR Kit (Invitrogen). Used 1μL of DNA sample.

Concentration = 326ng/μL (Quant data is here [Google Sheet]: 20161214_gDNA_Olurida_qubit_quant)

Yield is good and we have more than enough (~5μg is required for sequencing) to proceed with sequencing.

Evaluated gDNA quality (i.e. integrity) by running ~500ng (1.5μL) of sample on 0.8% agarose, low-TAE gel stained with ethidium bromide.
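The yield and gel-loading figures above follow directly from the Qubit concentration. A minimal sketch of that arithmetic is below; it assumes the full 50μL was recovered from each elution and that the only volume consumed so far was the 1μL used for quantification (both are assumptions, since columns usually retain a little liquid).

```python
# Values reported in this post.
concentration_ng_per_ul = 326.0   # Qubit dsDNA BR reading
pooled_volume_ul = 2 * 50.0       # two 50 uL elutions, pooled
qubit_input_ul = 1.0              # volume used for quantification

# Assumption: no other volume losses before sequencing submission.
remaining_volume_ul = pooled_volume_ul - qubit_input_ul
total_yield_ug = concentration_ng_per_ul * remaining_volume_ul / 1000.0

# Volume needed to load a target mass of DNA on the gel.
gel_target_ng = 500.0
gel_load_ul = gel_target_ng / concentration_ng_per_ul

print(f"Remaining yield: {total_yield_ug:.1f} ug")                   # 32.3 ug
print(f"Volume for {gel_target_ng:.0f} ng: {gel_load_ul:.2f} uL")    # 1.53 uL
```

Under those assumptions there is roughly six times the ~5μg needed, and 1.5μL on the gel corresponds to just under 500ng.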

Used 5μL of O’GeneRuler DNA Ladder Mix (ThermoFisher).

Results:

 

 

Overall, the gel looks OK. A fair amount of smearing, but a strong, high molecular weight band is present. The intensity of the smearing is likely due to the fact that the gel is overloaded for this particular well size. If I had used a broader comb and/or loaded less DNA, the band would be more defined and the smearing would be less prominent.

Will submit sample to the UW PacBio facility tomorrow!

Data Management – Trim Output Cells from Jupyter Notebook

Last week I downloaded the final BGI data files and assemblies for Olympia oyster and geoduck genome sequencing projects. However, the output from the download command made the Jupyter Notebook files too large to view and/or upload to GitHub. So, I had to trim the output cells from that notebook in order to render it usable/viewable.

The notebook below details how I did that and also examines the original version of that jumbo notebook to give some idea of what the command outputs were, for posterity.

Jupyter Notebook: 20161214_docker_notebook_trimming.ipynb
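For reference, here is one way output cells can be stripped from a notebook file. This is a minimal sketch, not necessarily the approach used in the notebook linked above. It assumes the notebook uses the nbformat 4 JSON layout (a top-level "cells" list), and the function names are my own for illustration.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs from every code cell; modifies the dict in place and returns it."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop stored stdout/stderr/display data
            cell["execution_count"] = None  # reset the In[ ] counter
    return notebook


def strip_file(src_path: str, dst_path: str) -> None:
    """Read a .ipynb file, strip its outputs, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    with open(dst_path, "w", encoding="utf-8") as handle:
        json.dump(strip_outputs(notebook), handle, indent=1)
        handle.write("\n")
```

Writing to a separate destination path keeps the original jumbo notebook intact, so its outputs can still be examined later. The ClearOutputPreprocessor in jupyter nbconvert does a similar job from the command line.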

Data Management – Download Final BGI Genome & Assembly Files

We received info to download the final data and genome assembly files for geoduck and Olympia oyster from BGI.

In total, the downloads took a little over three days to complete!

The notebook detailing how the files were downloaded is below. Note that I had to strip the output cells: the output from the download command made the file too large to upload to GitHub, and the notebook would constantly crash the browser/computer that it was opened in. So, the notebook below is here for posterity.

Jupyter Notebook: 20161206_docker_BGI_genome_downloads.ipynb

 

Goals – December 2016

Well, I’ve finally progressed with the Oly GBS analysis!

I’m nearly finished with the analysis of the de novo PyRad data. Next, I’ll run PyRad using our Oly partial genome that we have from BGI. This will allow a more descriptive evaluation of SNP loci, since we’ll actually be able to associate the SNPs with various gene annotations, thus providing more meaningful insight.

On the Oly genome front, I also need to submit samples for PacBio sequencing. This will be an attempt to fill in the gaps of the Oly genome scaffold we currently have.

Finally, if all goes well, I’ll get something written up and submitted to Scientific Data.

Data Analysis – Continued O.lurida Fst Analysis from GBS Data

Continued the analysis I started the other day. Still following Katherine Silliman’s notebook for guidance.

Quick summary of this analysis:

  • Mean Fst comparing all populations = 0.139539326187024
  • Mean Fst HL vs NF = 0.143075552548742
  • Mean Fst HL vs SN = 0.155234939276722
  • Mean Fst NF vs SN = 0.117889300124951

NOTE: Mean Fst values were calculated after replacing negative Fst values with 0. Thus, the means are higher than they would be had the raw data been used. I followed Katherine’s notebook and she doesn’t explicitly explain why she does this, nor what the potential implications are for interpreting the data. Will have to discuss her rationale behind this with her.
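To make the effect of that replacement concrete, here is a small sketch comparing the mean with and without setting negative values to 0. The per-locus values are hypothetical and chosen only for illustration; they are not from the GBS data, and the function name is my own.

```python
from statistics import mean


def mean_fst(values, clamp_negative=True):
    """Mean of per-locus Fst values, optionally replacing negatives with 0 first."""
    if clamp_negative:
        values = [max(0.0, v) for v in values]
    return mean(values)


# Hypothetical per-locus Fst estimates (NOT real data from this analysis).
example_fst = [-0.02, 0.10, -0.01, 0.25, 0.05]

raw_mean = mean_fst(example_fst, clamp_negative=False)
clamped_mean = mean_fst(example_fst, clamp_negative=True)
print(f"raw mean = {raw_mean:.3f}, clamped mean = {clamped_mean:.3f}")
```

Since clamping only ever raises individual values, the clamped mean can never be lower than the raw mean. How much higher it ends up depends on how many loci have negative estimates and how negative they are.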

Jupyter notebook: 20161201_docker_oly_vcf_analysis_R.ipynb

Computing – An Exercise in Futility

Trying to continue my Oly GBS analysis from the other day and follow along with Katherine Silliman’s notebook.

However, I hit a major snag: I can’t seem to run R in my Jupyter notebook! This is a major pain, since the notebook needs to be able to switch between languages; that’s the beauty of using these notebooks.

Below is documentation of my torments today.

Currently, I’m creating a new Docker image that uses the Debian Apt repository to install R version 3.1.1. Going through Apt instead of installing from source (as I had previously done in the Dockerfile) should install all the necessary dependencies for R and resolve some of the error messages I’m seeing.

Otherwise, the last resort will be to use R outside of the notebook and document that process separately.

Anyway, this is the kind of immensely time-consuming and frustrating work that most people don’t realize goes on with all of this computing stuff…

Notebook: 20161129_docker_R_magics_failure.ipynb

Data Analysis – Initial O.lurida Fst Determination from GBS Data

Finally running some analysis on the output from my PyRad analysis on 20160727.

I’m following Katherine Silliman’s Jupyter notebook (2bRAD Subset Population Structure Analysis.ipynb) as a guide.

The initial analysis (which isn’t much) is in the Jupyter notebook below. The analysis will be continued on a later date.

Jupyter notebook: 20161117_docker_oly_vcf_analysis.ipynb

I’ve embedded the notebook below, but it’s much easier to view the actual file linked above (there are many lengthy commands/filenames that wrap lines in the embedded version).

Data Management – Tracking O.lurida FASTQ File Corruption

UPDATE 20170104 – These two corrupt files have been replaced with non-corrupt files.


 

Sean identified an issue with one of the original FASTQ files provided to us by BGI. Additionally, Steven had (unknowingly) identified the same corrupt file, as well as a second corrupt file in the set of FASTQ files. The issue is discussed here: https://github.com/sr320/LabDocs/issues/334

Steven noticed the two files when he ran the program FASTQC: both files generated no output (but also no error message!).

The two files in question are:

  • 151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz
  • 151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz
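Since a tool can fail silently on a damaged file, it can help to test the gzip stream directly. Below is a minimal sketch of such a check, similar in spirit to running gzip -t. It is not taken from the linked notebook, the function name is my own, and it only detects problems at the compression level (truncation, bad CRC, damaged deflate data), not malformed FASTQ records inside a valid gzip stream.

```python
import gzip
import zlib


def gzip_is_readable(path: str, chunk_size: int = 1024 * 1024) -> bool:
    """Return True if the entire gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end in chunks so large FASTQ files are never held in memory.
            while handle.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers/CRC (and a missing file);
        # EOFError covers a stream that ends early (truncated file).
        return False
    return True
```

Looping this over every .fq.gz file in a directory would flag any file that cannot be fully decompressed, regardless of whether a downstream program reports an error.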

This post is an attempt to document where things went wrong, but having glanced through this data a bit already, it won’t provide any answers.

I originally downloaded the data on 20160127 to my home folder on Owl (this is detailed in the Jupyter notebook in that post) and generated/compared MD5 checksum values. The values matched at that time.
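For reference, the checksum comparison described above can be sketched as follows. This is an illustration rather than the code from the original notebook; the function names are my own, and the expected checksum is assumed to be supplied as a hex string (e.g. from a checksum list provided with the data).

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 against an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

One caveat worth keeping in mind: a matching checksum only shows that the local copy is identical to whatever file the checksum was generated from. If a file was already damaged when its checksum was generated, the values would still match.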

So, let’s investigate a bit further…

Launch Docker container

docker run -p 8888:8888 -v /Users/sam/data/:/data -v /Users/sam/owl_home/:/owl_home -v /Users/sam/owl_web/:/owl_web -v /Users/sam/gitrepos/LabDocs/jupyter_nbs/sam/:/jupyter_nbs -it 0ba43904567e

The command allows access to Jupyter Notebook over port 8888 and makes my Jupyter Notebook GitHub repo and my data files accessible to the Docker container.
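With this many volume mounts, the command is easy to mistype. As a sketch, the same command can be assembled from a mapping of host paths to container paths. The helper function is hypothetical (my own, for illustration); the paths, port, and image ID are the ones used in this post. Docker requires the container-side path of each -v mount to be absolute, so the helper checks for that.

```python
import shlex


def docker_run_command(image: str, port: int, volumes: dict) -> list:
    """Build the argument list for an interactive `docker run` with port and volume mappings."""
    args = ["docker", "run", "-p", f"{port}:{port}"]
    for host_path, container_path in volumes.items():
        if not container_path.startswith("/"):
            raise ValueError(f"container path must be absolute: {container_path!r}")
        args += ["-v", f"{host_path}:{container_path}"]
    args += ["-it", image]
    return args


volumes = {
    "/Users/sam/data/": "/data",
    "/Users/sam/owl_home/": "/owl_home",
    "/Users/sam/owl_web/": "/owl_web",
    "/Users/sam/gitrepos/LabDocs/jupyter_nbs/sam/": "/jupyter_nbs",
}

command = docker_run_command("0ba43904567e", 8888, volumes)
print(shlex.join(command))  # shlex.join needs Python 3.8+
```

The printed string can be pasted into a terminal, or the list can be passed to subprocess.run to launch the container directly.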

Once the container was running, I started Jupyter Notebook with the following command inside the Docker container:

jupyter notebook

This command is configured in the Docker container to launch a Jupyter Notebook without a browser on port 8888.

Jupyter notebook file: 20161117_docker_oly_genome_fastq_corruption.ipynb

I’ve embedded the notebook below, but it’s much easier to view the actual file linked above (there are many lengthy commands/filenames that wrap lines in the embedded version).

Computer Management – Additional Configurations for Reformatted Xserves

Sean got the remaining Xserves configured to run independently from the master node of the cluster they belonged to and installed OS X 10.11 (El Capitan).

The new computer names are Ostrich (formerly node004) and Emu (formerly node002).

 

He enabled remote screen sharing and remote access for them.

Sean also installed a working hard drive on Roadrunner and got that back up and running.

I went through this morning and configured the computers with some other changes (some for my user account, others for the entire computer):

  • Renamed computers to reflect just the corresponding bird name (hostnames had been labeled as “bird name’s Xserve”)

  • Created srlab user accounts

  • Changed srlab user accounts to Standard instead of Administrative

  • Created steven user account

  • Turned on Firewalls

  • Granted remote login access to all users (instead of just Administrators)

  • Installed Docker Toolbox

  • Changed power settings to start automatically after power failure

  • Added computer name to login screen via Terminal:

sudo defaults write /Library/Preferences/com.apple.loginwindow LoginwindowText "TEXT GOES HERE"

  • Changed computer HostName via Terminal so that Terminal displays computer name:

sudo scutil --set HostName "TEXT GOES HERE"

  • Installed Mac Homebrew (I don’t know if installation of Homebrew is “global” – i.e. installs for all users)

  • Used Mac Homebrew to install wget

  • Used Mac Homebrew to install tmux

Data Management – Modify Eagle/Owl Cloud Sync Account

Re-examining our backup options for our two Synology servers (Eagle & Owl), I realized that they were both backing up to just my account on UW’s unlimited Google Drive storage.

The desired backup was to go to our shared UW account, so that others in the lab would have access to the backups.

Strangely, I could not add the shared UW account (srlab) to my list of Google accounts. In order to verify the shared UW account with Google, I had to connect to the servers’ web interfaces in a private browsing session and then I was able to provide the correct user account info/permissions.

Anyway, it’s all going to our shared UW account now.

 

SELECT GOOGLE DRIVE AS THE SYNC PROVIDER:

 

 

 

 

SHARED UW ACCOUNT IS NOT A CHOICE:

 

 

TRY “ADD ACCOUNT”:

 

BUT ADD ACCOUNT DOESN’T WORK (DROP-DOWN MENU DOESN’T OFFER SRLAB AS A CHOICE):

 

 

 

REPEAT STEPS, BUT CONNECT TO SYNOLOGY VIA PRIVATE BROWSING SESSION AND IT’S GOOD TO GO:

 

 

SET LOCAL AND REMOTE FOLDERS:

 

 

CONFIRMATION THAT IT’S SET UP:

 

 

AND, IT’S RUNNING: