Category Archives: 2bRAD Library Tests for Sequencing at Genewiz

Data Management – O.lurida 2bRAD Dec2015 Undetermined FASTQ files

An astute observation by Katherine Silliman revealed that the FASTQ files I had moved to our high-throughput sequencing archive on our server Owl, only had two FASTQ files labeled as “undetermined”. Based on the number of lanes we had sequenced, we should have had many more. Turns out that the undetermined FASTQ files that were present in different sub-folders of the Genewiz project data were not uniquely named. Thus, when I moved them over (via a bash script), the undetermined files were continually overwritten, until we were left with only two FASTQ files labeled as undetermined.

So, I re-downloaded the entire project folder from Genewiz servers and renamed the FASTQ files labeled as undetermined and then copied them over to the archive on Owl:

I also zipped up the raw data project from Genewiz and moved it to the same archive location and updated the checksum.md5 and files.

Details can be found in the Jupyter (iPython) notebook below.

Jupyter Notebook file: 20160308_find_rename_2bRAD_undetermined_fastqs.ipynb

Notebook Viewer: 20160308_find_rename_2bRAD_undetermined_fastqs.ipynb



Data Received – Oly 2bRAD Illumina Sequencing from Genewiz

The data was made available to use on 20151224 and took two days to download.

The full list of samples (and the individual samples/libraries/indexes) submitted to Genewiz for this project by Katherine Silliman & me can be seen here (Google Sheet): White_BS1511196_R2_barcodes

The data supplied were all of the Illumina output files (currently not entirely sure where/how we want to store all of this, but we’ll probably want to use them for attempting our own demultiplexing since there were a significant amount of reads that Genewiz was unable to demultiplex), in addition to demultiplexed FASTQ files. The FASTQ files were buried in inconvenient locations, and there are over 300 of them, so I used the power of the command line to find them and copy them to a single location:

Find and copy all FASTQ files:

find /run/user/1000/gvfs/smb-share\:server\\,share\=home/ -name '*.fastq.*' -exec cp -n '{}' /run/user/1000/gvfs/smb-share\:server\\,share\=web/nightingales/O_lurida/ \;

Code explanation:

  • Command line program used for searching for files
  • Location of the files I wanted to search through. The path looks a little crazy because I was working remotely and had the server share mounted.
-name '*.fastq.*'
  • The name argument tells the find command to look for filenames that have “.fastq” in them.
-exec cp -n '{}'
  • The exec option tells the find command to execute a subsequent action upon finding a match. In this case, I’m using the copy command (cp) and telling the program not to overwrite (clobber, -n) any duplicate files.
/run/user/1000/gvfs/smb-share\:server\\,share\=web/nightingales/O_lurida/2bRAD_Dec2015 \;
  • The location where I want the matched files copied.

I created a readme file in the directory directory with these files:

I wanted to add some information about the project to the readme file, like total number of sequencing reads generated and the number of reads in each FASTQ file.

Here’s how to count the total of all reads generated in this project

totalreads=0; for i in *.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads

Total Reads: 588,396,334

Code explanation:

  • Creates variable called “totalreads” and initializes value to 0.
for i in *.gz;
  • Initiates a for loop to process any filenames that end with “.gz”. The FASTQ files have been compressed with gzip and end with the .gz extension.
do linecount=
  • Creates variable called “linecount” that stores the results of the following command:
`gunzip -c "$i" | wc -l`;
  • Unzips the files ($i) to stdout (-c) instead of actually uncompressing them. This is piped to the word count command, with the line flag (wc -l) to count the number of lines in the files.
  • Divides the value stored in linecount by 4. This is because an entry for a single Illumina read comprises four lines. This value is stored in the “readcount” variable.
  • Adds the readcount for the current file and adds the value to totalreads.
  • End the for loop.
echo $totalreads
  • Prints the value of totalreads to the screen.

Next, I wanted to generate list of the FASTQ files and corresponding read counts, and append this information to the readme file.

for i in *.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4)); printf "%s\t%s\n%s\t\t\n" "$i" "$readcount" >>; done

Code explanation:

for i in *.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4));
  • Same for loop as above that calculates the number of reads in each FASTQ file.
printf "%s\t%s\n\n" "$i" "$readcount" >>;
  • This formats the the printed output. The “%s\t%s\n\n” portion prints the value in $i as a string (%s), followed by a tab (\t), followed by the value in $readcount as a string (%s), followed by two consecutive newlines (\n\n) to provide an empty line between the entries. See the readme file linked above to see how the output looks.
>>; done
  • This appends the result from each loop to the file and ends the for loop (done).

qPCR – Oly RAD-Seq Library Quantification

After yesterday’s attempt at quantification revealed insufficient dilution of the libraries, I repeated the qPCRs using 1:100000 dilutions of each of the libraries. Used the KAPA Illumina Quantification Kit (KAPA Biosystems) according to the manufacturer’s protocol.

Made 1:100000 dilutions of each library were made with NanoPure H2O.

Ran all samples, including standards, in triplicate on the Roberts Lab Opticon2 (BioRad).

Plate set up and master mix can be found here: 20151117_qPCR_plate_layout_Oly_RAD.JPG



qPCR Data File (Opticon2): Sam_20151117_100745.tad

qPCR Data (Google Sheet): 20151117_RAD_qPCR_data

Overall, the new dilutions worked well, with all the library samples coming up between Ct 9 – 15, which is well within the range of the standard curve.

Manually adjusted the baseline threshold to be above any background fluorescence (see images below).

All samples, except Oly RAD 30, exhibit two peaks in the melt curve indicating contaminating primer dimers. Additionally, the peak heights appear to be roughly equivalent. Can we use this fact to effectively “halve” the concentration of our sample to make a rough estimate of library-only PCR products?


Here are the calculated library concentrations, based on the KAPA Biosystems formulas

Library Library Stock Conc. (nM) Stock Halved (nM)
Oly RAD 02 46.70 23.35
Oly RAD 03 79.35 39.67
Oly RAD 04 61.35 30.67
Oly RAD 06 30.61 15.30
Oly RAD 07 477.05 238.53
Oly RAD 08 46.32 23.16
Oly RAD 14 224.91 112.46
Oly RAD 17 24.56 12.28
Oly RAD 23 49.56 24.78
Oly RAD 30 11.19  NA


Amplification plots of standard curve samples:



Melt curve plots of standard curve samples. Shows expected “shoulder” to the left of the primary peak:




Amplification plots of RAD library samples:



Melt curve plots of RAD library samples. Peak on the right corresponds to primer dimer. Peak heights between primer dimer and desired PCR product are nearly equivalent for each respective sample, suggesting that each product is contributing equally to the fluorescence generated in the reactions:



Melt curve plot of Oly RAD library 30. Notice there’s only a single peak due to the lack of primer dimers in this sample:

qPCR – Oly RAD-Seq Library Quantification

The final step before sequencing these 2bRAD libraries is to quantify them. Used the KAPA Illumina Quantification Kit (KAPA Biosystems) according to the manufacturer’s protocol.

Made 1:4 dilutions of each library to use as template.

Ran all samples, including standards, in triplicate on the Roberts Lab Opticon2 (BioRad).

Plate set up and master mix can be found here: 20151116_qPCR_plate_layout_Oly_RAD.JPG



qPCR Data File (TAD): Sam_20151116_144718.tad

The take home messages from this qPCR are this:

  • The amplification plots that are pushed up against the left side of the graph (essentially at ~ cycle 1) are all of the libraries. A 1:4 dilution was insufficient to have the libraries amplify within the range of the standard curve.
  • All libraries except one (Oly RAD Library 30) have detectable levels of primer dimer. This confounds library quantification (because both the intended PCR product and the primer dimers contribute to the fluorescence accumulation), as well as potentially interfering with the subsequent Illumina sequencing (primer dimers will be sequenced and contain no insert sequence).

Will repeat the qPCR with more appropriately diluted libraries.

See the info below for more deets on this run.



Default analysis settings need to be adjusted to account for how early the standard curve comes up. Otherwise, the Opticon software sets the baseline incorrectly:




The KAPA Quantification Kit indicates that the baseline calculations need to be extended to cycles 1 through 3. This allows the software to set the baseline threshold correctly:




Melt curve analysis of the standard curve shows the expected profile – slight hump leading into the peak:




Melt curve analysis of the libraries. Dual peaks indicate primer dimer contamination:



Melt curve analysis of Oly RAD Library 30. Shows the desired single peak, suggesting library is free of primer dimers:

Gel Extraction – Oly RAD-Seq Prep Scale PCR

Extracted the PCR products from the gel slices from 20151113 using the QIAQuick Gel Extraction Kit (Qiagen) according to the manufacturer’s protocol. Substituted MiniElute columns so that I could elute with a smaller volume than what is used in the QIAQuick standard protocol.

Samples were eluted with 20μL of Buffer EB.

Will quantify these libraries via qPCR.

PCR – Oly RAD-seq Prep Scale PCR

Continuing with the RAD-seq library prep. Following the Meyer Lab 2bRAD protocol.
After determining the minimum number of PCR cycles to run to generate a visible, 166bp band on a gel yesterday, ran a full library “prep scale” PCR.


Template 40 NA
ILL-HT1 (1μM) 5 55
ILL-BC# (1μM) 5 NA
NanoPure H2O 5 55
dNTPs (1mM) 20 220
ILL-LIB1 (10μM) 2 22
ILL-LIB2 (10μM) 2 22
5x Q5 Reaction Buffer 20 220
Q5 DNA Polymerase 1 11
TOTAL 100 550


Combined the following for PCR reactions:

  • 55μL PCR master mix
  • 40μL ligation mix
  • 5μL of ILL-BC# (1μM) – The barcode number and the respective sample are listed below.


Oly RAD 02  1  CGTGAT
Oly RAD 03  2  ACATCG
Oly RAD 04  3  GCCTAA
Oly RAD 06  4  TGGTCA
Oly RAD 07  5  CACTGT
Oly RAD 08  6  ATTGGC
Oly RAD 14  7  GATCTG
Oly RAD 17  8  TCAAGT
Oly RAD 23  9  CTGATC
Oly RAD 30 10 AAGCTA


Cycling was performed on a PTC-200 (MJ Research) with a heated lid:

Initial Denaturation
  • 98
  • 30
17 cycles
  • 98
  • 60
  • 72
  • 5
  • 20
  • 10


After cycling, added 16μL of 6x loading dye to each sample.

Loaded 10μL of ladder on each of the two gels.



Things looked fine. Excised the bands from each sample indicated by the green arrow. Before and after gel images show regions excised. Will purify the bands and quantify library yields.

PCR – Oly RAD-seq Test-scale PCR

Continuing with the RAD-seq library prep. Following the Meyer Lab 2bRAD protocol.

Prior to generating full-blown libraries, we needed to run a “test-scale” PCR to identify the minimum number of cycles needed to produce the intended product size (166bp).

I ran PCR reactions on a subset (Sample #: 2, 3, 17, & 30) of the 10 samples that I performed adaptor ligations on 20151029.

PCR reactions were set up on ice in 0.5mL PCR tubes.

Template 8 NA
NanoPure H2O 1 4.4
dNTPs (1mM) 4 17.6
ILL-LIB1 (10μM) 0.4 1.76
ILL-LIB2 (10μM) 0.4 1.76
ILL-HT1 (1μM) 1 4.4
ILL-BC1 (1μM) 1 4.4
5x Q5 Reaction Buffer 4 17.6
Q5 DNA Polymerase 0.2 0.88
TOTAL 20 52.8


Combined 12μL of master mix with 8μL of the ligation reaction from earlier today.

Cycling was performed on a PTC-200 (MJ Research) with a heated lid:

Initial Denaturation
  • 98
  • 30
27 cycles
  • 98
  • 60
  • 72
  • 5
  • 20
  • 10

We’re following the “1/4 reduced representation” aspect of the protocol. As such, 5μL of each reaction was pulled immediately after the extension (72C – machine was paused) of cycles 12, 17, 22, & 27 in order to determine the ideal number of cycles to use. Also ran the ligation reactions (labeled “Ligations” on the gel below) of the samples as a pre-PCR comparison. Treated them the same as the PCR reactions: mixed 8μL of the ligation with 12μL of H2O, used 5μL of that mix to load on gel.

These samples were run on a 1x modified TAE 1.2% agarose gel (w/EtBr).



Gel image denoting sample numbers within each cycle number. Green arrow indicates the expected migration of our target band size of 166bp.

Looks like cycle 17 is the minimum cycle number with which we begin to see a consistent ~166bp band. Will continue on with the “prep-scale” PCR using 17 cycles.

Adaptor Ligation – Oly AlfI-Digested gDNA for RAD-seq

Continued to follow the 2bRAD protocol (PDF) developed by Eli Meyer’s lab.

Digested DNA from yesterday was heat inactivated for 10mins @ 65C and was not run out on a gel due to the fact that the input gDNA was degraded and a shift in the high molecular weight band (indicating the digestion was successful) would not exist because a high molecular weight band is absent in these samples.


Anneal Adaptors

After preparing the two adaptors below, they were incubated for 10mins @ RT:

  • Adaptor 1 (2μM final concentration of each oligo): 1.5μL of 5ILL-NR (100μM) + 1.5μL of anti-ILL (100μM) + 72μL H2O = 75μL total
  • Adaptor 2 (2μM final concentration of each oligo): 1.5μL of 3ILL-NR (100μM) + 1.5μL of anti-ILL (100μM) + 72μL H2O = 75μL total

After annealing, the adaptors were stored on ice.


Adaptor Ligation

All components were stored on ice. Ligation reactions were prepared on ice and performed in 0.5mL snap cap tubes.

Digested DNA 10 NA
ATP (10mM) 1 11
10x T4 Ligase Buffer 4 44
Adaptor 1 (2μM) 5 55
Adaptor 2 (2μM) 5 55
T4 DNA Ligase 1 11
NanoPure H2O 24 264
TOTAL 50 440

Combined 40μL of the master mix with 10μL of AlfI-digested DNA in a 0.5mL snap cap tube.

Incubated ligation reaction @ 16C O/N in PTC-200 thermal cycler (MJ Research) – no heated lid.

Ligations will be stored @ -20C until I can continue working with them on Tuesday.

Restriction Digest – Oly gDNA for RAD-seq w/AlfI

The previous attempt at making these RAD libraries failed during the prep-scale PCR, likely due to a discrepancy in the version of the Meyer Lab protocol I was following, so I have to start at the beginning to try to make these libraries again.

Since the input DNA is so degraded, I’ve repeated this using 9μg of input DNA (instead of the recommended 1.2μg). This should increase the number of available cleavage sites for AlfI, thus improving the number of available ligation sites for the adaptors.

Used a subset (10 samples) from the Ostrea lurida gDNA isolated 20150916 to prepare RAD libraries.

Followed the 2bRAD protocol (PDF) developed by Eli Meyer’s lab.

Prepared 9.0μg of each of the following samples in a volume of 9.5μL:

Google Sheet: 20151028_RADseq_DNA_calcs



Prepared master mix for restriction enzyme reaction:

DNA 9.5 NA
10x Buffer R 1.2μL 13.2μL
150μM SAM 0.8μL 8.8μL
AlfI 0.5μL 5.5μL


Combined 2.5μL of the master mix with 9.5μL of each DNA sample in 0.5mL snap cap tubes. Incubated @ 37C O/N in thermal cycler (PTC-200; no heated lid).