Tag Archives: Illumina

Data Received – Geoduck RRBS Sequencing Data

Hollie Putnam prepared some reduced representation bisulfite Illumina libraries and had them sequenced by Genewiz.

The data was downloaded and MD5 checksums were generated.

IMPORTANT: MD5 checksums have not yet been provided by Genewiz! We cannot verify the integrity of these data files at this time! Checksums have been requested. Will create new notebook entry (and add link to said entry) once the checksums have been received and we can compare them.

UPDATE 20161230 – Have received and verified checksums.
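For reference, a comparison of locally generated checksums against a provider's list could look roughly like the sketch below. This is not the code from the notebook: the function names are hypothetical, and the parser assumes the provider's file uses the common `md5sum` layout (`<hash>  <filename>` per line), which may not match what Genewiz actually sent.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_lines(text):
    """Parse md5sum-style lines ('<hash>  <filename>') into {filename: hash}.

    Assumption: md5sum layout. Other tools (e.g. macOS `md5`) print a
    different format and would need a different parser.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, filename = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' on the filename.
        expected[filename.lstrip("*")] = checksum.lower()
    return expected


def verify_checksums(expected, directory="."):
    """Return {filename: True/False} for each expected file in `directory`."""
    results = {}
    for filename, checksum in expected.items():
        results[filename] = md5_of_file(Path(directory) / filename) == checksum
    return results
```

Any file that maps to `False` would need to be downloaded again before use.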

Jupyter notebook: 20161229_docker_genewiz_geoduck_RRBS_data.ipynb

RNAseq Data Receipt – Geoduck Gonad RNA 100bp PE Illumina

Received notification that the samples sent on 20150601 for RNAseq were completed.

Downloaded the following files from the GENEWIZ servers using FileZilla FTP and stored them on our server (owl/web/nightingales/P_generosa):

Geo_Pool_F_GGCTAC_L006_R1_001.fastq.gz
Geo_Pool_F_GGCTAC_L006_R2_001.fastq.gz
Geo_Pool_M_CTTGTA_L006_R1_001.fastq.gz
Geo_Pool_M_CTTGTA_L006_R2_001.fastq.gz

Generated md5 checksums for each file:

$ for i in *; do md5 "$i" >> checksums.md5; done

Made a readme.md file for the directory.

Sample Submission – Geoduck Gonad for RNA-seq

Prepared two pools of geoduck RNA for RNA-seq (Illumina HiSeq2500, 100bp, PE) with GENEWIZ, Inc.

I pooled a set of female and a set of male RNAs that had been selected by Steven based on the Bioanalyzer results from Friday.

The female RNA pool used 210ng of each sample, with the exception of sample #08, which used 630ng. There weren’t any other female samples available from this developmental time point, whereas the two other developmental time points each had three samples contributing to the pool. So, three times the quantity of the other individual samples was used to help equalize the time point contribution to the pooled sample. Additionally, 630ng used the entirety of sample #08.

The male RNA pool used 315ng of each sample. This number differs from the 210ng used for the female RNAs so that the two pools would end up with the same total quantity of RNA. However, now that I’ve typed this out, it doesn’t matter, since the libraries will be equalized before being run on the Illumina HiSeq2500. Oh well. As long as each sample in each pool contributed to the total amount of RNA, then it’s all good.
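The pooling arithmetic can be checked with a short sketch. The per-sample quantities come from the entry above; the time point labels are placeholders, and the male sample count is not stated anywhere in the entry, so the value computed here is only what the numbers imply if the two pool totals match exactly.

```python
# Quantities (ng) taken from the entry; time point labels are hypothetical.
female_by_time_point = {
    "time point with sample #08 only": [630],
    "time point A": [210, 210, 210],
    "time point B": [210, 210, 210],
}

# Each time point should contribute the same amount of RNA to the pool.
female_ng_per_time_point = {
    name: sum(amounts) for name, amounts in female_by_time_point.items()
}
female_total_ng = sum(female_ng_per_time_point.values())

# Assumption: the male pool total equals the female pool total, as intended.
MALE_NG_PER_SAMPLE = 315
implied_male_samples = female_total_ng / MALE_NG_PER_SAMPLE

print(female_ng_per_time_point)
print(female_total_ng, implied_male_samples)
```

Each female time point contributes 630ng, for a pool total of 1890ng, which at 315ng per sample corresponds to six male samples.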

The two pools were shipped O/N on dry ice.

  • Geo_pool_M
  • Geo_pool_F

Calculations (Google Sheet): 20150601_Geoduck_GENEWIZ_calcs

Bioinformatics – Trimmomatic/FASTQC on C.gigas Larvae OA NGS Data

Previously trimmed the first 39 bases of sequence from reads from the BS-Seq data in an attempt to improve our ability to map the reads back to the C.gigas genome. However, Mac (and Steven) noticed that the last ~10 bases of all the reads showed a steady increase in the %G, suggesting some sort of bias (maybe adaptor??):

Although I didn’t mention this previously, the figure above also shows an odd “waves” pattern that repeats in all bases except for G. Not sure what to think of that…
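The per-position base composition that FASTQC plots can be approximated with a few lines of code. The sketch below is only an illustration of the idea, not how FASTQC computes it internally; the function name is hypothetical and reads are passed in as plain strings.

```python
from collections import Counter


def base_percentages_by_position(reads):
    """Return one {base: percent} dict per read position.

    Reads may differ in length; each position is normalized by the number
    of reads that are long enough to cover it.
    """
    counts = []
    for read in reads:
        for position, base in enumerate(read.upper()):
            if position == len(counts):
                counts.append(Counter())
            counts[position][base] += 1

    percentages = []
    for counter in counts:
        total = sum(counter.values())
        percentages.append(
            {base: 100.0 * n / total for base, n in counter.items()}
        )
    return percentages


# Toy example: %G climbs toward the end of the reads.
toy_reads = ["ACGG", "ACGG", "ATGG", "ACTG"]
for position, composition in enumerate(base_percentages_by_position(toy_reads)):
    print(position, composition.get("G", 0.0))
```

A steady rise in the G percentage over the final positions, as described above, would show up as increasing values in the last entries of the returned list.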

Quick summary of actions taken (specifics are available in Jupyter notebook below):

  • Trim first 39 bases from all reads in all raw sequencing files.
  • Trim last 10 bases from all reads in all raw sequencing files.
  • Concatenate the reads for each treatment (400ppm and 1000ppm) into a single FASTQ file per treatment for Steven to work with.
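The steps above can be sketched in plain Python. To be clear, the actual trimming was done with Trimmomatic (see the notebook), and this is not that invocation; it is a minimal stand-in showing the same fixed-length trimming on gzipped FASTQ files, with hypothetical function names and the assumption of standard four-line FASTQ records.

```python
import gzip
import shutil

HEAD_TRIM = 39  # bases removed from the start of each read (from the entry)
TAIL_TRIM = 10  # bases removed from the end of each read (from the entry)


def trim_record(header, sequence, plus, quality, head=HEAD_TRIM, tail=TAIL_TRIM):
    """Trim sequence and quality together; return None if nothing is left."""
    end = len(sequence) - tail
    if end <= head:
        return None
    return header, sequence[head:end], plus, quality[head:end]


def trim_fastq_gz(in_path, out_path, head=HEAD_TRIM, tail=TAIL_TRIM):
    """Write a trimmed copy of a gzipped FASTQ file (4 lines per record)."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        while True:
            lines = [src.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:
                break  # end of file
            trimmed = trim_record(*lines, head=head, tail=tail)
            if trimmed is not None:
                dst.write("\n".join(trimmed) + "\n")


def concatenate_gz(in_paths, out_path):
    """Concatenate gzip files byte-for-byte.

    The result is a multi-member gzip file, which gzip-aware tools
    (including Python's gzip module) read as one continuous stream.
    """
    with open(out_path, "wb") as dst:
        for path in in_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, dst)
```

Reads that would be empty after trimming are dropped rather than written as zero-length records, which is a design choice of this sketch and may differ from what Trimmomatic does with its configured options.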

Raw sequencing files:

Notebook Viewer: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC

Jupyter (IPython) notebook: 20150521_Cgigas_larvae_OA_Trimmomatic_FASTQC.ipynb

Output files

Trimmed, concatenated FASTQ files
20150521_trimmed_2212_lane2_400ppm_GCCAAT.fastq.gz
20150521_trimmed_2212_lane2_1000ppm_CTTGTA.fastq.gz

FASTQC files
20150521_trimmed_2212_lane2_400ppm_GCCAAT_fastqc.html
20150521_trimmed_2212_lane2_400ppm_GCCAAT_fastqc.zip

20150521_trimmed_2212_lane2_1000ppm_CTTGTA_fastqc.html
20150521_trimmed_2212_lane2_1000ppm_CTTGTA_fastqc.zip

Example of FASTQC analysis pre-trim:

 

 

Example FASTQC post-trim (from 400ppm data):

 

Trimming has removed the intended bad stuff (inconsistent sequence in the first 39 bases and rise in %G in the last 10 bases). Sequences are ready for Steven to analyze further.

However, we still see the “waves” pattern with the T, A and C. Additionally, we still don’t know what caused the weird inconsistencies, nor what sequence is contained therein that might be leading to that. Will contact the sequencing facility to see if they have any insight.

Illumina RNAseq Library Construction – 32 C.gigas Individuals

Took heat-fragmented RNA provided by Emma (see Emma’s Notebook, 7/3/2011) and proceeded to make first strand cDNA, as described in the Eli Meyer protocol for Illumina HiSeq. Master mix calcs are here. Samples were stored @ -20C after the reverse transcription and library construction will be continued tomorrow.

Oligo Reconstitution – Illumina RNAseq Library Oligos and Barcodes

Reconstituted all of the oligos and barcodes for library construction in TE (pH = 8.0) to a final concentration of 100uM. Created 10uM working stocks of all oligos and barcodes. All samples (stocks and working stocks) are stored @ -80C in their own box (Illumina Library Oligos & Barcodes) because one of the oligos is an RNA oligo and requires storage at -80C.
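The 100uM to 10uM dilution follows the usual C1V1 = C2V2 relationship. The sketch below works through it; the function name is hypothetical, and the 100uL final volume in the example is an assumption, since the entry does not record the working stock volumes.

```python
def stock_volume_for_dilution(stock_conc_um, working_conc_um, final_volume_ul):
    """Return (stock_ul, diluent_ul) needed to reach the working concentration.

    Uses C1 * V1 = C2 * V2, solved for V1 (the stock volume).
    """
    if working_conc_um > stock_conc_um:
        raise ValueError("working concentration cannot exceed stock concentration")
    stock_ul = working_conc_um * final_volume_ul / stock_conc_um
    return stock_ul, final_volume_ul - stock_ul


# Example with an assumed final volume of 100uL (not from the entry).
stock_ul, te_ul = stock_volume_for_dilution(100, 10, 100)
print(stock_ul, te_ul)
```

With those assumed numbers, a 1:10 dilution means one part 100uM stock to nine parts TE.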