Bioanalyzer Data – Geoduck RNA from Histology Blocks

I received the Bioanalyzer data back for the geoduck foot RNA samples I submitted 20150422. The two samples were run on the RNA Pico chip assay.

 

Results:

Bioanalzyer 2100 Data File (XAD): SamWhite_Eukaryote Total RNA Pico_2015-04-23_13-04-16.xad

Data file requires 2100_Expert_B0208_SI648_SR2 version of the software (Windows).

Gel Representation

 

Electropherogram

 

The samples look really good! As we’ve seen previously in shellfish RNA, there is a single, prominent rRNA band/peak with very little degradation (smearing and/or no prominent peak/band).

Will proceed with RNA isolation from the remaining histology blocks.

Bioanalyzer Submission – Geoduck Gonad RNA from Histology Blocks

Submitted 3μL (~75ng) of RNA from each of the two gonad samples isolated from foot tissue embedded in paraffin histology blocks 20150408 (to assess quality of RNA) to Jesse Tsai at University of Washington Department of Environmental and Occupational Health Science Functional Genomics Laboratory:

  • Geoduck Block 34
  • Geoduck Block 42

Jesse will determine if the samples should be run on the RNA Pico or the RNA Nano chips.

Quality Trimming – C.gigas Larvae OA BS-Seq Data

Jupyter (IPython) Notebook: 20150414_C_gigas_Larvae_OA_Trimmomatic_FASTQC.ipynb

NBviewer: 20150414_C_gigas_Larvae_OA_Trimmomatic_FASTQC.ipynb

 

Trimmed FASTQC

400ppm Index – GCCAAT

20150414_trimmed_2212_lane2_GCCAAT_L002_R1_001_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_002_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_003_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_004_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_005_fastqc.html
20150414_trimmed_2212_lane2_GCCAAT_L002_R1_006_fastqc.html

1000ppm Index – CTTGTA

20150414_trimmed_2212_lane2_CTTGTA_L002_R1_001_fastqc.html
20150414_trimmed_2212_lane2_CTTGTA_L002_R1_002_fastqc.html
20150414_trimmed_2212_lane2_CTTGTA_L002_R1_003_fastqc.html
20150414_trimmed_2212_lane2_CTTGTA_L002_R1_004_fastqc.html

 

 

Quality Trimming – LSU C.virginica Oil Spill MBD BS-Seq Data

Jupyter (IPython) Notebook: 20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb

NBviewer: 20150414_C_virginica_LSU_Oil_Spill_Trimmomatic_FASTQC.ipynb

Trimmed FASTQC

NB3 No oil Index – ACAGTG

20150414_trimmed_2112_lane1_ACAGTG_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_ACAGTG_L001_R1_002_fastqc.html

NB6 No oil Index – GCCAAT

20150414_trimmed_2112_lane1_GCCAAT_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_GCCAAT_L001_R1_002_fastqc.html

NB11 No oil Index – CAGATC

20150414_trimmed_2112_lane1_CAGATC_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_CAGATC_L001_R1_002_fastqc.html
20150414_trimmed_2112_lane1_CAGATC_L001_R1_003_fastqc.html

HB2 25,000ppm oil Index – ATCACG

20150414_trimmed_2112_lane1_ATCACG_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_ATCACG_L001_R1_002_fastqc.html
20150414_trimmed_2112_lane1_ATCACG_L001_R1_003_fastqc.html

HB16 25,000ppm oil Index – TTAGGC

20150414_trimmed_2112_lane1_TTAGGC_L001_R1_001_fastqc.html
20150414_trimmed_2112_lane1_TTAGGC_L001_R1_002_fastqc.html

HB30 25,000ppm oil Index – TGACCA

20150414_trimmed_2112_lane1_TGACCA_L001_R1_001_fastqc.html

Sequence Data Analysis – LSU C.virginica Oil Spill MBD BS-Seq Data

Performed some rudimentary data analysis on the new, demultiplexed data downloaded earlier today:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

 

Compared total amount of data (in gigabytes) generated from each index. The commands below send the output of the ‘ls -l’ command to awk. Awk sums the file sizes, found in the 5th field ($5) of the ‘ls -l’ command, then prints the sum, divided by 1024^3 to convert from bytes to gigabytes.

Index: ACAGTG

$ls -l 2112_lane1_AC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
1.49652

 

Index: ATCACG

$ls -l 2112_lane1_AT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.02269

 

Index: CAGATC

$ls -l 2112_lane1_CA* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.49797

 

Index: GCCAAT

$ls -l 2112_lane1_GC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.21379

 

Index: TGACCA

$ls -l 2112_lane1_TG* | awk '{sum += $5} END {print sum/1024/1024/1024}'
0.687374

 

Index: TTAGGC

$ls -l 2112_lane1_TT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.28902

 

Ran FASTQC on the following files downloaded earlier today. The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with “2212_lane2_C” or “2212_lane2_G” and outputs the analyses to the Arabidopsis folder on Eagle:

$for file in /Volumes/nightingales/C_virginica/2112_lane1_[ATCG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done

 

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today’s date:

$for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done

 

Then, I unzipped the .zip files generated by FASTQC in order to have access to the images, to eliminate the need for screen shots for display in this notebook entry:

$for file in 20150413_2112_lane1_[ATCG]*.zip; do unzip "$file"; done

 

The unzip output retained the old naming scheme, so I renamed the unzipped folders:

$for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done

 

The FASTQC results are linked below:

20150413_2112_lane1_ACAGTG_L001_R1_001_fastqc.html
20150413_2112_lane1_ACAGTG_L001_R1_002_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_001_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_002_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_003_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_001_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_002_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_003_fastqc.html
20150413_2112_lane1_GCCAAT_L001_R1_001_fastqc.html
20150413_2112_lane1_GCCAAT_L001_R1_002_fastqc.html
20150413_2112_lane1_TGACCA_L001_R1_001_fastqc.html
20150413_2112_lane1_TTAGGC_L001_R1_001_fastqc.html
20150413_2112_lane1_TTAGGC_L001_R1_002_fastqc.html

 

Sequence Data Analysis – C.gigas Larvae OA BS-Seq Data

Compared total amount of data generated from each index. The commands below send the output of the ‘ls -l’ command to awk. Awk sums the file sizes, found in the 5th field ($5) of the ‘ls -l’ command, then prints the sum, divided by 1024^3 to convert from bytes to gigabytes.

Index: CTTGTA

$ ls -l 2212_lane2_[C]* | awk '{sum += $5} END {print sum/1024/1024/1024}'
5.33341

Index: GCCAAT
$ ls -l 2212_lane2_[G]* | awk '{sum += $5} END {print sum/1024/1024/1024}'
7.00596

There’s ~1.4x data in the GCCAAT files.

 

Ran FASTQC on the following files downloaded earlier today:

2212_lane2_CTTGTA_L002_R1_001.fastq.gz
2212_lane2_CTTGTA_L002_R1_002.fastq.gz
2212_lane2_CTTGTA_L002_R1_003.fastq.gz
2212_lane2_CTTGTA_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_001.fastq.gz
2212_lane2_GCCAAT_L002_R1_002.fastq.gz
2212_lane2_GCCAAT_L002_R1_003.fastq.gz
2212_lane2_GCCAAT_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_005.fastq.gz
2212_lane2_GCCAAT_L002_R1_006.fastq.gz

 

The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with “2212_lane2_C” or “2212_lane2_G” and outputs the analyses to the Arabidopsis folder on Eagle:

$for file in /Volumes/nightingales/C_gigas/2212_lane2_[CG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done

 

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today’s date:

$for file in 2212_lane2_[GC]*; do mv "$file" "20150413_$file"; done

 

Then, I unzipped the .zip files generated by FASTQC in order to have access to the images, to eliminate the need for screen shots for display in this notebook entry:

$for file in 20150413_2212_lane2_[CG]*.zip; do unzip "$file"; done

 

The unzip output retained the old naming scheme, so I renamed the unzipped folders:

$for file in 2212_lane2_[GC]*; do mv “$file” “20150413_$file”; done

 

The FASTQC results are linked below:

20150413_2212_lane2_CTTGTA_L002_R1_001_fastqc.html

20150413_2212_lane2_CTTGTA_L002_R1_002_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_003_fastqc.html
20150413_2212_lane2_CTTGTA_L002_R1_004_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_001_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_002_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_003_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_004_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_005_fastqc.html
20150413_2212_lane2_GCCAAT_L002_R1_006_fastqc.html

 

Sequence Data – C.gigas OA Larvae BS-Seq Demultiplexed

I had previously contacted Doug Turnbull at the Univ. of Oregon Genomics Core Facility for help demultiplexing this data, as it was initially returned to us as a single data set with “no index” (i.e. barcode) set for any of the libraries that were sequenced. As it turns out, when multiplexed libraries are sequenced using the Illumina platform, an index read step needs to be “enabled” on the machine for sequencing. Otherwise, the machine does not perform the index read step (since it wouldn’t be necessary for a single library). Surprisingly, the sample submission form for the Univ. of Oregon Genomics Core Facility  doesn’t request any information regarding whether or not a submitted sample has been multiplexed. However, by default, they enable the index read step on all sequencing runs. I provided them with the barcodes and they demultiplexed them after the fact.

I downloaded the new, demultiplexed files to Owl/nightingales/C_gigas:

lane2_CTTGTA_L002_R1_001.fastq.gz
lane2_CTTGTA_L002_R1_002.fastq.gz
lane2_CTTGTA_L002_R1_003.fastq.gz
lane2_CTTGTA_L002_R1_004.fastq.gz
lane2_GCCAAT_L002_R1_001.fastq.gz
lane2_GCCAAT_L002_R1_002.fastq.gz
lane2_GCCAAT_L002_R1_003.fastq.gz
lane2_GCCAAT_L002_R1_004.fastq.gz
lane2_GCCAAT_L002_R1_005.fastq.gz
lane2_GCCAAT_L002_R1_006.fastq.gz

Notice that the file names now contain the corresponding index!

Renamed the files, to append the order number to the beginning of the file names:

$for file in lane2*; do mv "$file" "2212_$file"; done

New file names:

2212_lane2_CTTGTA_L002_R1_001.fastq.gz
2212_lane2_CTTGTA_L002_R1_002.fastq.gz
2212_lane2_CTTGTA_L002_R1_003.fastq.gz
2212_lane2_CTTGTA_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_001.fastq.gz
2212_lane2_GCCAAT_L002_R1_002.fastq.gz
2212_lane2_GCCAAT_L002_R1_003.fastq.gz
2212_lane2_GCCAAT_L002_R1_004.fastq.gz
2212_lane2_GCCAAT_L002_R1_005.fastq.gz
2212_lane2_GCCAAT_L002_R1_006.fastq.gz

Updated the checksums.md5 file to include the new files (the command is written to exclude the previously downloaded files that are named “2212_lane2_NoIndex_”; the [^N] regex excludes any files that have a capital ‘N’ at that position in the file name):

$for file in 2212_lane2_[^N]*; do md5 "$file" >> checksums.md5; done

Updated the readme.md file to reflect the addition of these new files.

Sequence Data – LSU C.virginica Oil Spill MBD BS-Seq Demultiplexed

I had previously contacted Doug Turnbull at the Univ. of Oregon Genomics Core Facility for help demultiplexing this data, as it was initially returned to us as a single data set with “no index” (i.e. barcode) set for any of the libraries that were sequenced. As it turns out, when multiplexed libraries are sequenced using the Illumina platform, an index read step needs to be “enabled” on the machine for sequencing. Otherwise, the machine does not perform the index read step (since it wouldn’t be necessary for a single library). Surprisingly, the sample submission form for the Univ. of Oregon Genomics Core Facility  doesn’t request any information regarding whether or not a submitted sample has been multiplexed. However, by default, they enable the index read step on all sequencing runs. I provided them with the barcodes and they demultiplexed them after the fact.

I downloaded the new, demultiplexed files to Owl/nightingales/C_virginica:

lane1_ACAGTG_L001_R1_001.fastq.gz
lane1_ACAGTG_L001_R1_002.fastq.gz
lane1_ATCACG_L001_R1_001.fastq.gz
lane1_ATCACG_L001_R1_002.fastq.gz
lane1_ATCACG_L001_R1_003.fastq.gz
lane1_CAGATC_L001_R1_001.fastq.gz
lane1_CAGATC_L001_R1_002.fastq.gz
lane1_CAGATC_L001_R1_003.fastq.gz
lane1_GCCAAT_L001_R1_001.fastq.gz
lane1_GCCAAT_L001_R1_002.fastq.gz
lane1_TGACCA_L001_R1_001.fastq.gz
lane1_TTAGGC_L001_R1_001.fastq.gz
lane1_TTAGGC_L001_R1_002.fastq.gz

Notice that the file names now contain the corresponding index!

Renamed the files, to append the order number to the beginning of the file names:

$for file in lane1*; do mv "$file" "2112_$file"; done

New file names:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

Updated the checksums.md5 file to include the new files (the command is written to exclude the previously downloaded files that are named “2112_lane1_NoIndex_”; the [^N] regex excludes any files that have a capital ‘N’ at that position in the file name):

$for file in 2112_lane1_[^N]*; do md5 "$file" >> checksums.md5; done

Updated the readme.md file to reflect the addition of these new files.

 

RNA Isolation – Geoduck Gonad in Paraffin Histology Blocks

Isolated RNA from geoduck gonad previously preserved with the PAXgene Tissue Fixative and Stabilizer and then embedded in paraffin blocks. See Grace’s notebook for full details on samples and preservation.

 

RNA was isolated from only two samples using the PAXgene Tissue RNA Kit (Qiagen) from the following geoduck sample blocks to test out the kit:

  • 34
  • 42

IMPORTANT:

  • Prior to beginning, I prepared Buffer TR1 by adding 10μL of β-mercaptoethanol (β-ME) to 1000μL of Buffer TR1). This will be good for up to six weeks at RT.
  • Reconstituted DNase I with 550μL of RNase-free H2O. Aliquoted in 100μL volumes and stored @ -20C in the “-20C Kit Components” box.

Five 5μm sections were taken from each block.

Isolated RNA according to the PAXgene Tissue RNA Kit protocol with the following alterations:

  • “Max speed” spins were performed at 19,000g.
  • Tissue disruption was performed with the Disruptor Genie @ 45C for 15mins.
  • Shaking incubation step was performed with Disruptor Genie
  • Samples were eluted with 34μL of Buffer TR4, incubated @ 65C for 5mins, immediately placed on ice and quantified on the Roberts Lab NanoDrop1000.

Samples were stored at -80C in Shellfish RNA Box #5.

NOTE: The spreadsheet linked indicates other samples exist in the slots that I placed these two samples. Will need to update the spreadsheet to be accurate.

Results:

 

 

Looks like the kit worked! Yields are pretty good (~800ng) from each. The 260/280 ratios are great for both samples. Oddly, the 260/230 ratios for the two samples are pretty much polar opposites of each other; not sure why.

Will proceed with the remainder of the samples that were selected by Steven and Brent. Or, maybe I should try to make some cDNA from these RNA samples to verify the integrity of the RNA…

Sequencing Data – C.gigas Larvae OA

Our sequencing data (Illumina HiSeq2500, 100SE) for this project has completed by Univ. of Oregon Genomics Core Facility (order number 2212).

Samples sequenced/pooled for this run:

Sample Treatment Barcode
400ppm 400ppm GCCAAT
1000ppm 1000ppm CTTGTA

 

All code listed below was run on OS X 10.9.5

Ran a bash script called “download.sh” to download all the files. The script contents were:

#!/bin/bash
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_001.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_002.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_003.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_004.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_005.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_006.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_007.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_008.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_009.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_010.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_011.fastq.gz
curl -O http://gcf.uoregon.edu:8080/job/download/2212?fileName=lane2_NoIndex_L002_R1_012.fastq.gz

 

Downloaded all 12 fastq.gz files to Owl/web/nightingales/C_gigas

Renamed all files by removing the beginning of each file name (2112?fileName=) and replacing that with 2212_:

$for file in 2212*lane2_NoIndex_L002_R1_0*; do mv "$file" "${file/#2212?fileName=/2212_}"; done

 

Created a directory readme.md (markdown) file to list & describe directory contents: readme.md

$ls *.gz >> readme.md

Note: In order for the readme file to appear in the web directory listing, the file cannot be all upper-case.

 

Create MD5 checksums for each the files: checkums.md5

$md5 2212* >> checksums.md5