Tag Archives: 2bRAD

Read Mapping – Olympia oyster 2bRAD Data with Bowtie2 (on Mox)

Per Steven’s request, mapped our Olympia oyster 2bRAD data.

Mapped to:

This was run on our Mox computing node.

Slurm script: 20180515_oly_2bRAD_bowtie2_mapping.sh

The script is far too long to paste here, due to the shear number of input files. However, here’s a snippet to show the command and options that were used:


/gscratch/srlab/programs/bowtie2-2.3.4.1-linux-x86_64/bowtie2 \
--threads 24 \
--no-unal \
--score-min L,16,1 \
--local \
-L 16 \
-S /gscratch/srlab/sam/outputs/20180515_oly_2bRAD_bowtie2_mapping/20180515_oly_2bRAD_bowtie2_mapping.sam \
-x /gscratch/srlab/sam/data/O_lurida/oly_genome_assemblies/20180515_oly_bowtie2_pbjelly_sjw_01_genome_index/pbjelly_sjw_01_ref \
-U 

See the linked Slurm script above for the entire thing.


RESULTS:

Output folder:

SAM file (104GB)

Mapping summary:


20180515_oly_2bRAD_bowtie2_mapping$ cat slurm-180337.out 
729797535 reads; of these:
  729797535 (100.00%) were unpaired; of these:
    273989476 (37.54%) aligned 0 times
    310581308 (42.56%) aligned exactly 1 time
    145226751 (19.90%) aligned >1 times
62.46% overall alignment rate

Data Management – O.lurida 2bRAD Dec2015 Undetermined FASTQ files

An astute observation by Katherine Silliman revealed that the FASTQ files I had moved to our high-throughput sequencing archive on our server Owl, only had two FASTQ files labeled as “undetermined”. Based on the number of lanes we had sequenced, we should have had many more. Turns out that the undetermined FASTQ files that were present in different sub-folders of the Genewiz project data were not uniquely named. Thus, when I moved them over (via a bash script), the undetermined files were continually overwritten, until we were left with only two FASTQ files labeled as undetermined.

So, I re-downloaded the entire project folder from Genewiz servers and renamed the FASTQ files labeled as undetermined and then copied them over to the archive on Owl:

http://owl.fish.washington.edu/nightingales/O_lurida/2bRAD_Dec2015/

I also zipped up the raw data project from Genewiz and moved it to the same archive location and updated the checksum.md5 and readme.md files.

Details can be found in the Jupyter (iPython) notebook below.

Jupyter Notebook file: 20160308_find_rename_2bRAD_undetermined_fastqs.ipynb

Notebook Viewer: 20160308_find_rename_2bRAD_undetermined_fastqs.ipynb

 

 

Data Received – Oly 2bRAD Illumina Sequencing from Genewiz

The data was made available to use on 20151224 and took two days to download.

The full list of samples (and the individual samples/libraries/indexes) submitted to Genewiz for this project by Katherine Silliman & me can be seen here (Google Sheet): White_BS1511196_R2_barcodes

The data supplied were all of the Illumina output files (currently not entirely sure where/how we want to store all of this, but we’ll probably want to use them for attempting our own demultiplexing since there were a significant amount of reads that Genewiz was unable to demultiplex), in addition to demultiplexed FASTQ files. The FASTQ files were buried in inconvenient locations, and there are over 300 of them, so I used the power of the command line to find them and copy them to a single location: http://owl.fish.washington.edu/nightingales/O_lurida/2bRAD_Dec2015/

Find and copy all FASTQ files:

find /run/user/1000/gvfs/smb-share\:server\=owl.fish.washington.edu\,share\=home/ -name '*.fastq.*' -exec cp -n '{}' /run/user/1000/gvfs/smb-share\:server\=owl.fish.washington.edu\,share\=web/nightingales/O_lurida/ \;

Code explanation:

find
  • Command line program used for searching for files
/run/user/1000/gvfs/smb-share\:server\=owl.fish.washington.edu\,share\=home/ 
  • Location of the files I wanted to search through. The path looks a little crazy because I was working remotely and had the server share mounted.
-name '*.fastq.*'
  • The name argument tells the find command to look for filenames that have “.fastq” in them.
-exec cp -n '{}'
  • The exec option tells the find command to execute a subsequent action upon finding a match. In this case, I’m using the copy command (cp) and telling the program not to overwrite (clobber, -n) any duplicate files.
/run/user/1000/gvfs/smb-share\:server\=owl.fish.washington.edu\,share\=web/nightingales/O_lurida/2bRAD_Dec2015 \;
  • The location where I want the matched files copied.

I created a readme file in the directory directory with these files: readme.md

I wanted to add some information about the project to the readme file, like total number of sequencing reads generated and the number of reads in each FASTQ file.

Here’s how to count the total of all reads generated in this project

totalreads=0; for i in *.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$((linecount/4)); totalreads=$((readcount+totalreads)); done; echo $totalreads

Total Reads: 588,396,334

Code explanation:

totalreads=0;
  • Creates variable called “totalreads” and initializes value to 0.
for i in *.gz;
  • Initiates a for loop to process any filenames that end with “.gz”. The FASTQ files have been compressed with gzip and end with the .gz extension.
do linecount=
  • Creates variable called “linecount” that stores the results of the following command:
`gunzip -c "$i" | wc -l`;
  • Unzips the files ($i) to stdout (-c) instead of actually uncompressing them. This is piped to the word count command, with the line flag (wc -l) to count the number of lines in the files.
readcount=$((linecount/4));
  • Divides the value stored in linecount by 4. This is because an entry for a single Illumina read comprises four lines. This value is stored in the “readcount” variable.
totalreads=$((readcount+totalreads));
  • Adds the readcount for the current file and adds the value to totalreads.
done;
  • End the for loop.
echo $totalreads
  • Prints the value of totalreads to the screen.

Next, I wanted to generate list of the FASTQ files and corresponding read counts, and append this information to the readme file.

for i in *.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4)); printf "%s\t%s\n%s\t\t\n" "$i" "$readcount" >> readme.md; done

Code explanation:

for i in *.gz; do linecount=`gunzip -c "$i" | wc -l`; readcount=$(($linecount/4));
  • Same for loop as above that calculates the number of reads in each FASTQ file.
printf "%s\t%s\n\n" "$i" "$readcount" >> readme.md;
  • This formats the the printed output. The “%s\t%s\n\n” portion prints the value in $i as a string (%s), followed by a tab (\t), followed by the value in $readcount as a string (%s), followed by two consecutive newlines (\n\n) to provide an empty line between the entries. See the readme file linked above to see how the output looks.
>> readme.md; done
  • This appends the result from each loop to the readme.md file and ends the for loop (done).

qPCR – Oly RAD-Seq Library Quantification

After yesterday’s attempt at quantification revealed insufficient dilution of the libraries, I repeated the qPCRs using 1:100000 dilutions of each of the libraries. Used the KAPA Illumina Quantification Kit (KAPA Biosystems) according to the manufacturer’s protocol.

Made 1:100000 dilutions of each library were made with NanoPure H2O.

Ran all samples, including standards, in triplicate on the Roberts Lab Opticon2 (BioRad).

Plate set up and master mix can be found here: 20151117_qPCR_plate_layout_Oly_RAD.JPG

 

Results:

qPCR Data File (Opticon2): Sam_20151117_100745.tad

qPCR Data (Google Sheet): 20151117_RAD_qPCR_data

Overall, the new dilutions worked well, with all the library samples coming up between Ct 9 – 15, which is well within the range of the standard curve.

Manually adjusted the baseline threshold to be above any background fluorescence (see images below).

All samples, except Oly RAD 30, exhibit two peaks in the melt curve indicating contaminating primer dimers. Additionally, the peak heights appear to be roughly equivalent. Can we use this fact to effectively “halve” the concentration of our sample to make a rough estimate of library-only PCR products?

 

Here are the calculated library concentrations, based on the KAPA Biosystems formulas

Library Library Stock Conc. (nM) Stock Halved (nM)
Oly RAD 02 46.70 23.35
Oly RAD 03 79.35 39.67
Oly RAD 04 61.35 30.67
Oly RAD 06 30.61 15.30
Oly RAD 07 477.05 238.53
Oly RAD 08 46.32 23.16
Oly RAD 14 224.91 112.46
Oly RAD 17 24.56 12.28
Oly RAD 23 49.56 24.78
Oly RAD 30 11.19  NA

 

Amplification plots of standard curve samples:

 

 

Melt curve plots of standard curve samples. Shows expected “shoulder” to the left of the primary peak:

 

 

 

Amplification plots of RAD library samples:

 

 

Melt curve plots of RAD library samples. Peak on the right corresponds to primer dimer. Peak heights between primer dimer and desired PCR product are nearly equivalent for each respective sample, suggesting that each product is contributing equally to the fluorescence generated in the reactions:

 

 

Melt curve plot of Oly RAD library 30. Notice there’s only a single peak due to the lack of primer dimers in this sample:

qPCR – Oly RAD-Seq Library Quantification

The final step before sequencing these 2bRAD libraries is to quantify them. Used the KAPA Illumina Quantification Kit (KAPA Biosystems) according to the manufacturer’s protocol.

Made 1:4 dilutions of each library to use as template.

Ran all samples, including standards, in triplicate on the Roberts Lab Opticon2 (BioRad).

Plate set up and master mix can be found here: 20151116_qPCR_plate_layout_Oly_RAD.JPG

 

Results:

qPCR Data File (TAD): Sam_20151116_144718.tad

The take home messages from this qPCR are this:

  • The amplification plots that are pushed up against the left side of the graph (essentially at ~ cycle 1) are all of the libraries. A 1:4 dilution was insufficient to have the libraries amplify within the range of the standard curve.
  • All libraries except one (Oly RAD Library 30) have detectable levels of primer dimer. This confounds library quantification (because both the intended PCR product and the primer dimers contribute to the fluorescence accumulation), as well as potentially interfering with the subsequent Illumina sequencing (primer dimers will be sequenced and contain no insert sequence).

Will repeat the qPCR with more appropriately diluted libraries.

See the info below for more deets on this run.

 

 

Default analysis settings need to be adjusted to account for how early the standard curve comes up. Otherwise, the Opticon software sets the baseline incorrectly:

 

 

 

The KAPA Quantification Kit indicates that the baseline calculations need to be extended to cycles 1 through 3. This allows the software to set the baseline threshold correctly:

 

 

 

Melt curve analysis of the standard curve shows the expected profile – slight hump leading into the peak:

 

 

 

Melt curve analysis of the libraries. Dual peaks indicate primer dimer contamination:

 

 

Melt curve analysis of Oly RAD Library 30. Shows the desired single peak, suggesting library is free of primer dimers:

Gel Extraction – Oly RAD-Seq Prep Scale PCR

Extracted the PCR products from the gel slices from 20151113 using the QIAQuick Gel Extraction Kit (Qiagen) according to the manufacturer’s protocol. Substituted MiniElute columns so that I could elute with a smaller volume than what is used in the QIAQuick standard protocol.

Samples were eluted with 20μL of Buffer EB.

Will quantify these libraries via qPCR.

Restriction Digest – Oly gDNA for RAD-seq w/AlfI

The previous attempt at making these RAD libraries failed during the prep-scale PCR, likely due to a discrepancy in the version of the Meyer Lab protocol I was following, so I have to start at the beginning to try to make these libraries again.

Since the input DNA is so degraded, I’ve repeated this using 9μg of input DNA (instead of the recommended 1.2μg). This should increase the number of available cleavage sites for AlfI, thus improving the number of available ligation sites for the adaptors.

Used a subset (10 samples) from the Ostrea lurida gDNA isolated 20150916 to prepare RAD libraries.

Followed the 2bRAD protocol (PDF) developed by Eli Meyer’s lab.

Prepared 9.0μg of each of the following samples in a volume of 9.5μL:

Google Sheet: 20151028_RADseq_DNA_calcs

 

 

Prepared master mix for restriction enzyme reaction:

REAGENT SINGLE REACTION (μL) x11
DNA 9.5 NA
10x Buffer R 1.2μL 13.2μL
150μM SAM 0.8μL 8.8μL
AlfI 0.5μL 5.5μL

 

Combined 2.5μL of the master mix with 9.5μL of each DNA sample in 0.5mL snap cap tubes. Incubated @ 37C O/N in thermal cycler (PTC-200; no heated lid).