Whoops! It’s already September 6th! The 1st of the month came and went without me noticing.
One goal for this month: Write up and submit Olympia oyster genotype-by-sequencing (GBS) data to Scientific Data for publication.
Unfortunately, most of this month’s goals are the same as last month’s!
The computer became completely unresponsive (for the second time in less than 24hrs). Maybe the problem is Docker. Maybe it’s creating a remote tunnel to a Docker container. Maybe it’s running Jupyter Notebook through a remote tunnel into a Docker container? I don’t know. At this point, I’ll just install PyRad directly on Roadrunner and try to get the analysis done that way. It certainly isn’t convenient because it means I have to be physically present at Roadrunner to execute commands and check on things…
Quantify coral DNA methylation. This should be straightforward and completed on Tuesday, 20160705.
Current goals are as follows:
Complete Oly GBS data analysis. This is getting closer to actually being done. Had some issues with an external hard drive crashing. I’ve since replaced that and the analysis is running (it takes multiple days per stage of the analysis on Hummingbird).
Configure computing instances on Amazon AWS to improve our ability to handle these large data sets in a more timely fashion.
Begin using the UW’s Hyak computing cluster to improve our ability to handle these large data sets in a more timely fashion.
Well, I guess the first goal is to remember to be more consistent about writing monthly goals…
Anyway, here they are – short and sweet. Most of them are really part of a to-do list, as opposed to goals, but I’ll still put them down.
I’d review last month’s goals, but I completely forgot to post them!
However, I did accomplish the two most important goals that I needed to get done last month:
Prep geoduck (Panopea generosa) gDNA for genome sequencing
For this month, I’m looking at tackling the following:
RNA has been isolated and DNased from Jake’s mechanically stressed samples.
Improved organization of -80C, added frequently used protocols to the Roberts Lab GitHub Wiki, connected BGI rep with purchasing to get the PO situation figured out for Olympia oyster and geoduck genome sequencing, sent samples to BGI for Olympia oyster genotype-by-sequencing (GBS).
Need to quantify the DNased RNA from Jake’s mechanically stressed oysters and then verify that the DNase treatment worked. Will then proceed with reverse transcription.
Continue work on -80C organization, continue creating “readme” files for folders on our server(s), continue migration from Wikispaces to GitHub, attempt to combine our PrimerDatabase and our Primer Stocks spreadsheets into a single document (and create a SQL database from that), fix the shortfalls from our EH&S lab inspection.
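For the spreadsheet merge, here is a minimal sketch of one way it could go, assuming both spreadsheets are exported to CSV and share an ID column. The column names (primer_id, name, sequence, stock_location), the table name, and the function name are all hypothetical; the real spreadsheets may be laid out differently.

```python
import csv
import sqlite3


def build_primer_db(database_csv, stocks_csv, conn):
    """Combine two CSV exports into a single SQLite table.

    database_csv and stocks_csv are open text file objects.
    All column names used here are assumed, not taken from the real sheets.
    """
    conn.execute(
        "CREATE TABLE primers ("
        "primer_id TEXT PRIMARY KEY, name TEXT, sequence TEXT, stock_location TEXT)"
    )
    # Index the stock sheet by primer ID so each primer can look up its location.
    stock_by_id = {
        row["primer_id"]: row["stock_location"]
        for row in csv.DictReader(stocks_csv)
    }
    for row in csv.DictReader(database_csv):
        conn.execute(
            "INSERT INTO primers VALUES (?, ?, ?, ?)",
            (
                row["primer_id"],
                row["name"],
                row["sequence"],
                # Primers with no stock entry get NULL rather than being dropped.
                stock_by_id.get(row["primer_id"]),
            ),
        )
    conn.commit()
```

Primers that are missing from the stocks sheet are kept with an empty location, which makes it easy to query for primers that have no recorded stock.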
Before we check out this month’s goals, let’s have a quick review of last month’s goals and which, if any, I was able to accomplish:
Done. Data from this was received 20150629.
Little progress. No response from Univ. of Oregon. However, it seems that Mac has been having a similar issue with libraries constructed using Epigentek kits. She has contacted Illumina and Epigentek for help.
No progress. Purchasing personnel (both departmental and university) appear to have had difficulty contacting BGI. I have put both purchasing personnel in contact with the BGI rep (Frank Hu), so things are starting to progress. Still not certain why they were unable to accomplish this.
Goal(s):
Status:
wget command for offline notebook backups
Need to isolate RNA from a set of Jake’s mechanically-stressed oyster tissue. Need to clarify with Steven if we want the 1hr post-mechanically stressed to “match” with the 1hr heat shock samples that were previously processed, or if we want the 24hr post-mechanically stressed samples.
Need to address shortfalls from the EH&S annual lab inspection. Most are minor, easily addressed issues and won’t take much time (e.g. print lab safety signs in color). The various computer tasks still stand: notebook migration from Wikispaces to this notebook, transitioning lab resources from Wikispaces to GitHub, creating/updating “readme” files for directories on our servers, etc. Lab organization goals: improve -80C organization and create an online inventory of the -80C, establish an online inventory of lab supply locations (e.g. thermometers can be found in FTR209, Drawer #02), general clean up.
Before we check out this month’s goals, let’s have a quick review of last month’s goals and which, if any, I was able to accomplish.
Goal(s): Isolate RNA from geoduck histology blocks
Status: Accomplished!
Goal(s): Glean additional info about this data set and our ability/inability to create our own BS-seq libraries.
Status: Still a mystery. Currently reaching out to Doug Turnbull at the Univ. of Oregon Genomics Core Facility to see if he can provide any insight as to why our data looks the way it does, which might help us figure out why we’re having such difficulty mapping our reads to the C.gigas genome…
Goal(s): BS-seq Claire’s samples.
Status: Untouched. Is dependent upon whether or not we can successfully create our own high-throughput sequencing libraries (see above).
Goal(s):
Status:
wget command for offline notebook backups
This project is progressing relatively smoothly. Finished RNA isolations from all samples and checked their qualities via Bioanalyzer. Steven and Brent selected samples of males and females to pool for RNA-seq. Goal is to have these two pools sent off to GENEWIZ, Inc. for RNA-seq. Currently awaiting a quote adjustment as well as an answer regarding sample quantity requirements. Hope to have these sent off later today and data back by the end of the month. This data will be used alongside proteomics data that Emma is currently generating.
The troubleshooting for the data from these “homemade” libraries continues. We’ve tried various approaches to trimming the data, but Steven’s mapping attempts are still not yielding great results. I’ve contacted Univ. of Oregon Genomics Core Facility to see if they can provide insight, but haven’t gotten a response. Will hit them up again to see if I can get a response (and some help).
We have quotes from BGI Americas for genome sequencing for both of these organisms. Currently, we’re waiting for funding to be processed, but expect it to be available this month. Hope to send out samples this month.
This is still dependent upon our ability to make our own BS-seq libraries. Until then, this project will likely be on the back burner for a while.
I’d like to continue to contribute to our GitHub code repository with various command line tips and tricks. Additionally, I do need to actually spend some time creating/updating README files for our servers. We have a ton of folders that need some sort of descriptor file in them so users know what to expect to find in those folders. Additionally, we have a ton of data that needs descriptions and/or links to the projects from which the data was generated to serve as a means for people to know how/why/from what the data was generated. This has been done for newer data sets, but there’s a tremendous amount of data sets that have no information about them available in the README files. Also along the data management front, I’d like to tackle a bit of a reorganization, particularly re-establishing the go-to resource for lab members to find “stuff.” For example, Jake recently needed to know where/if we had some software and had to ask about it. Better organization on our part would eliminate him wasting time trying to track down this sort of thing. Part of the organizational issue is that we’ve partially transitioned over to using GitHub instead of Wikispaces. However, the transition hasn’t been fully realized/implemented and the result is fragmentation and confusion on where to find lab info. Oh, one last “digital” note. I’ll be teaching the Unix Shell lesson at Software Carpentry on June 25 – 26, so I have to get prepped for that (not on work time, of course).
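As a starting point for the README cleanup, a rough sketch like the following could list which folders still lack a descriptor file. The function name and the list of accepted README filenames are my own assumptions, not an existing lab convention.

```python
import os


def dirs_missing_readme(root, names=("readme", "readme.md", "readme.txt")):
    """Return a sorted list of directories under root (root included)
    that contain no file whose lowercased name appears in `names`."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        lowered = {f.lower() for f in filenames}
        # A directory counts as documented if any accepted README name is present.
        if not lowered.intersection(names):
            missing.append(dirpath)
    return sorted(missing)
```

Calling dirs_missing_readme on a server's top-level data directory would give a to-do list of folders to describe; on very large directory trees the walk may take a while.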
In the lab, I still need to tackle some lab cleanup tasks that I neglected to deal with last month (autoclaving, glass disposal). Additionally, I need to continue helping Jonathan with his Capstone project, but I need to manage my time with him better.
Here are the things I plan to tackle throughout the month of May:
My primary goal for this project is to successfully isolate RNA from the remaining, troublesome paraffin blocks that have yet to yield any usable RNA. The next approach to obtain usable quantities of RNA is to directly gouge tissue from the blocks instead of sectioning the blocks (as recommended in the PAXgene Tissue RNA Kit protocol). Hopefully this approach will eliminate excess paraffin, while increasing the amount of input tissue. Once I have RNA from the entire suite of samples, I’ll check the RNA integrity via Bioanalyzer and then we’ll decide on a facility to use for high-throughput sequencing.
Currently, there are two projects that we have performed BS-Seq with (Crassostrea gigas larvae OA (2011) bisulfite sequencing and LSU C.virginica Oil Spill MBD BS Sequencing) and we’re struggling to align sequences to the C.gigas genome. Granted, the LSU samples are C.virginica, but the C.gigas larvae libraries are not aligning to the C.gigas genome via standard BLASTn or using a dedicated bisulfite mapper (e.g. BS-Map). I’m currently BLASTing a de-novo assembly of the C.gigas larvae OA 400ppm sequencing that Steven made against the NCBI nt DB in an attempt to assess the taxonomic distribution of the sequences we received back. I’ll also try using a different bisulfite mapper, bismark, that Mackenzie Gavery has previously used and has had better results with than BS-Map.
As part of Claire’s project, there’s still some BS-Seq data that would be nice to have to complement the data she generated via microarray. It would be nice to make a decision about how to proceed with the samples. However, part of our decision on how to proceed is governed by the results we get from the two projects above. Why do those two projects impact the decision(s) regarding this project? They impact this project because in the two projects above, we produced our own BS-Seq libraries. This is extremely cost effective. However, if we can’t obtain usable data from doing the library preps in-house, then that means we have to use an external service provider. Using an external company to do this is significantly more expensive. Additionally, not all companies can perform bisulfite treatment, which limits our choices (and, in turn, pricing options) on where to go for sequencing.
When I have some down time, I’ll continue working on migrating my Wikispaces notebook to this notebook. I only have one year left to go and it’d be great if all my notebook entries were here so they’d all be tagged/categorized and, thus, be more searchable. I’d also like to work on adding README files to our plethora of electronic data folders. Having these in place will greatly facilitate the ability of people to quickly and more easily figure out what these folders contain, file formats within those folders, etc. I also have a few computing tips/tricks that I’d like to add to our GitHub “Code” page. Oh, although this isn’t really lab related, I was asked to teach the Unix shell lesson (or, at least, part of it) at the next Software Carpentry Workshop that Ben Marwick is setting up at UW in early June. So, I’m thinking that I’ll try to incorporate some of the data handling stuff I’ve been tackling in lab into the lesson I end up teaching. Additionally, going through the Software Carpentry materials will help reinforce some of the “fundamental” tasks that I can do with the shell (like find, cut and grep).
In the lab, I plan on sealing up our nearly overflowing “Broken Glass” box and establishing a new one. I need to autoclave, and dispose of, a couple of very full biohazard bags. I’m also going to vow that I will get Jonathan to finally obtain a successful PCR from his sea pen RNA.