Tag Archives: MBD-Seq

Sequence Data Analysis – LSU C.virginica Oil Spill MBD BS-Seq Data

Performed some rudimentary data analysis on the new, demultiplexed data downloaded earlier today:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

 

Compared the total amount of data (in gigabytes) generated from each index. The commands below send the output of the ‘ls -l’ command to awk. Awk sums the file sizes, found in the 5th field ($5) of the ‘ls -l’ output, then prints the sum, divided by 1024^3 to convert from bytes to gigabytes.

Index: ACAGTG

$ls -l 2112_lane1_AC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
1.49652

 

Index: ATCACG

$ls -l 2112_lane1_AT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.02269

 

Index: CAGATC

$ls -l 2112_lane1_CA* | awk '{sum += $5} END {print sum/1024/1024/1024}'
3.49797

 

Index: GCCAAT

$ls -l 2112_lane1_GC* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.21379

 

Index: TGACCA

$ls -l 2112_lane1_TG* | awk '{sum += $5} END {print sum/1024/1024/1024}'
0.687374

 

Index: TTAGGC

$ls -l 2112_lane1_TT* | awk '{sum += $5} END {print sum/1024/1024/1024}'
2.28902
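
The six per-index commands above can be collapsed into a single pass. The following is a minimal sketch, not the commands that were actually run: the function name `size_by_index` and its directory argument are hypothetical, and it assumes the index is always the third underscore-delimited field of the file name. It prints the index, the total bytes, and the total divided by 1024^3.

```shell
# Hypothetical helper: total the FASTQ file sizes for every index in one pass.
# Assumes names like 2112_lane1_ACAGTG_L001_R1_001.fastq.gz, where the index
# is the third underscore-delimited field.
size_by_index() {
    dir=${1:-.}   # directory holding the files (defaults to the current one)
    for f in "$dir"/2112_lane1_[ATCG]*.fastq.gz; do
        [ -e "$f" ] || continue            # skip the literal pattern if nothing matches
        name=${f##*/}                      # strip the directory part
        index=$(printf '%s\n' "$name" | cut -d_ -f3)
        bytes=$(wc -c < "$f")              # byte count of this file
        printf '%s %s\n' "$index" "$bytes"
    done |
    awk '{sum[$1] += $2}
         END {for (i in sum) printf "%s\t%.0f\t%.5f\n", i, sum[i], sum[i]/1024/1024/1024}' |
    sort
}
# Usage: size_by_index /Volumes/nightingales/C_virginica
```

Because the pattern only matches names whose index starts with A, T, C, or G, the older NoIndex files are left out of the totals.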

 

Ran FASTQC on the files listed above, which were downloaded earlier today. The FASTQC command is below. This command runs FASTQC in a for loop over any files that begin with “2112_lane1_” followed by A, T, C, or G (which skips the older “2112_lane1_NoIndex_” files) and outputs the analyses to the Arabidopsis folder on Eagle:

$for file in /Volumes/nightingales/C_virginica/2112_lane1_[ATCG]*; do fastqc "$file" --outdir=/Volumes/Eagle/Arabidopsis/; done

 

From within the Eagle/Arabidopsis folder, I renamed the FASTQC output files to prepend today’s date:

$for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done

 

Then, I unzipped the .zip files generated by FASTQC in order to have access to the images, to eliminate the need for screen shots for display in this notebook entry:

$for file in 20150413_2112_lane1_[ATCG]*.zip; do unzip "$file"; done

 

The unzip output retained the old naming scheme, so I renamed the unzipped folders:

$for file in 2112_lane1_[ATCG]*; do mv "$file" "20150413_$file"; done
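
The two rename loops above have the same shape, so they could share one helper. This is only a sketch under assumptions: the function name `prepend_prefix` is hypothetical, and the check that refuses to overwrite an existing target is an addition that the original loops do not have.

```shell
# Hypothetical helper: prepend a prefix to every file or folder in the current
# directory whose name starts with the given stem, never overwriting a target
# that already exists.
prepend_prefix() {
    prefix=$1   # e.g. 20150413_
    stem=$2     # e.g. 2112_lane1_
    for item in "$stem"*; do
        [ -e "$item" ] || continue          # nothing matched the pattern
        [ -e "$prefix$item" ] && continue   # target already exists, leave both alone
        mv "$item" "$prefix$item"
    done
}
# Usage: cd /Volumes/Eagle/Arabidopsis && prepend_prefix 20150413_ 2112_lane1_
```

Running it once after FASTQC and once after unzipping would reproduce both renaming steps.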

 

The FASTQC results are linked below:

20150413_2112_lane1_ACAGTG_L001_R1_001_fastqc.html
20150413_2112_lane1_ACAGTG_L001_R1_002_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_001_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_002_fastqc.html
20150413_2112_lane1_ATCACG_L001_R1_003_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_001_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_002_fastqc.html
20150413_2112_lane1_CAGATC_L001_R1_003_fastqc.html
20150413_2112_lane1_GCCAAT_L001_R1_001_fastqc.html
20150413_2112_lane1_GCCAAT_L001_R1_002_fastqc.html
20150413_2112_lane1_TGACCA_L001_R1_001_fastqc.html
20150413_2112_lane1_TTAGGC_L001_R1_001_fastqc.html
20150413_2112_lane1_TTAGGC_L001_R1_002_fastqc.html

 

Sequence Data – LSU C.virginica Oil Spill MBD BS-Seq Demultiplexed

I had previously contacted Doug Turnbull at the Univ. of Oregon Genomics Core Facility for help demultiplexing this data, as it was initially returned to us as a single data set with “no index” (i.e. barcode) set for any of the libraries that were sequenced. As it turns out, when multiplexed libraries are sequenced using the Illumina platform, an index read step needs to be “enabled” on the machine for sequencing. Otherwise, the machine does not perform the index read step (since it wouldn’t be necessary for a single library). Surprisingly, the sample submission form for the Univ. of Oregon Genomics Core Facility doesn’t request any information regarding whether or not a submitted sample has been multiplexed. However, by default, they enable the index read step on all sequencing runs. I provided them with the barcodes and they demultiplexed the data after the fact.

I downloaded the new, demultiplexed files to Owl/nightingales/C_virginica:

lane1_ACAGTG_L001_R1_001.fastq.gz
lane1_ACAGTG_L001_R1_002.fastq.gz
lane1_ATCACG_L001_R1_001.fastq.gz
lane1_ATCACG_L001_R1_002.fastq.gz
lane1_ATCACG_L001_R1_003.fastq.gz
lane1_CAGATC_L001_R1_001.fastq.gz
lane1_CAGATC_L001_R1_002.fastq.gz
lane1_CAGATC_L001_R1_003.fastq.gz
lane1_GCCAAT_L001_R1_001.fastq.gz
lane1_GCCAAT_L001_R1_002.fastq.gz
lane1_TGACCA_L001_R1_001.fastq.gz
lane1_TTAGGC_L001_R1_001.fastq.gz
lane1_TTAGGC_L001_R1_002.fastq.gz

Notice that the file names now contain the corresponding index!

Renamed the files to prepend the order number to the file names:

$for file in lane1*; do mv "$file" "2112_$file"; done

New file names:

2112_lane1_ACAGTG_L001_R1_001.fastq.gz
2112_lane1_ACAGTG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_001.fastq.gz
2112_lane1_ATCACG_L001_R1_002.fastq.gz
2112_lane1_ATCACG_L001_R1_003.fastq.gz
2112_lane1_CAGATC_L001_R1_001.fastq.gz
2112_lane1_CAGATC_L001_R1_002.fastq.gz
2112_lane1_CAGATC_L001_R1_003.fastq.gz
2112_lane1_GCCAAT_L001_R1_001.fastq.gz
2112_lane1_GCCAAT_L001_R1_002.fastq.gz
2112_lane1_TGACCA_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_001.fastq.gz
2112_lane1_TTAGGC_L001_R1_002.fastq.gz

Updated the checksums.md5 file to include the new files (the command is written to exclude the previously downloaded files that are named “2112_lane1_NoIndex_”; the [^N] bracket expression in the glob pattern excludes any files that have a capital ‘N’ at that position in the file name):

$for file in 2112_lane1_[^N]*; do md5 "$file" >> checksums.md5; done
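
To preview which files a pattern like this picks up before writing any checksums, something along these lines could be used. It is a sketch: the function name `list_indexed` is hypothetical, and it uses `[!N]`, the POSIX spelling of the same exclusion (bash also accepts `[^N]`).

```shell
# List the demultiplexed files while skipping the older NoIndex ones.
# [!N] matches any single character except a capital N at that position.
list_indexed() {
    dir=${1:-.}   # directory to search (defaults to the current one)
    for f in "$dir"/2112_lane1_[!N]*; do
        [ -e "$f" ] || continue        # nothing matched the pattern
        printf '%s\n' "${f##*/}"       # print the file name without the directory
    done
}
# Usage: list_indexed /Volumes/nightingales/C_virginica
```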

Updated the readme.md file to reflect the addition of these new files.

 

Bisulfite NGS Library – LSU C.virginica Oil Spill MBD Bisulfite DNA Sequencing Submission

Combined the following libraries in equal quantities (17ng each) to create a single, multiplexed sample for sequencing (LSU_Oil_01):

  • HB2 – 1 (ATCACG)
  • HB16 – 3 (TTAGGC)
  • HB30 – 4 (TGACCA)
  • NB3 – 5 (ACAGTG)
  • NB6 – 6 (GCCAAT)
  • NB11 – 7 (CAGATC)

Quantified pooled libraries using the Quant-iT dsDNA BR Kit (Invitrogen) with a FLx800 plate reader (BioTek). Used 1μL of the pooled sample, run in duplicate. Used 1μL of standards, run in duplicate.

Results:

pooled libraries = 6.575ng/μL

Will submit to University of Oregon Genomics Core Facility for 100bp, single-end Illumina HiSeq2500 sequencing. They need the sample at 10nM. For a library with average size range of 300-400bp, this requires a sample volume of 20μL with a concentration of 2.28ng/μL in a solution of 0.1% Tween20 in Buffer EB (Qiagen).

Combined 6.94μL of pooled libraries with 13.06μL of 0.1% Tween20/EB solution.
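
The arithmetic behind these numbers can be sketched as follows. The function names are hypothetical, the 660 g/mol per base pair figure is a common approximation for dsDNA, and 350bp is an assumed midpoint of the 300-400bp range (it gives about 2.31ng/μL, close to the 2.28ng/μL used here). The dilution is the usual C1 x V1 = C2 x V2.

```shell
# Mass concentration (ng/uL) of a dsDNA library at a given molarity.
# Assumes ~660 g/mol per base pair, a common approximation.
nm_to_ng_per_ul() {   # $1 = nM, $2 = average fragment length in bp
    awk -v nm="$1" -v bp="$2" 'BEGIN {printf "%.2f\n", nm * bp * 660 / 1000000}'
}

# C1*V1 = C2*V2: prints the stock volume (uL), then the diluent volume (uL).
dilution() {          # $1 = stock ng/uL, $2 = target ng/uL, $3 = final volume uL
    awk -v c1="$1" -v c2="$2" -v v2="$3" 'BEGIN {
        v1 = c2 * v2 / c1
        printf "%.2f %.2f\n", v1, v2 - v1
    }'
}

nm_to_ng_per_ul 10 350    # 350bp is an assumed midpoint of the 300-400bp range
dilution 6.575 2.28 20    # concentrations and volume taken from this entry
```

The second call reproduces the 6.94 and 13.06 volumes recorded above.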

Submitted sample LSU_Oil_01 to University of Oregon Genomics Core Facility via O/N FedEx on dry ice. Sample was assigned order # 2112.

Bisulfite NGS Library Prep – LSU C.virginica Oil Spill Bisulfite DNA and Emma’s C.gigas Larvae OA Bisulfite DNA

Constructed next generation libraries (Illumina) using the bisulfite-treated DNA from yesterday using the EpiNext Post-Bisulfite DNA Library Preparation Kit – Illumina (Epigentek). Samples were processed according to the manufacturer’s protocol up to Section 8 (Library Amplification) with the following changes:

– Skipped Section 7.1 (recommended to do so in the protocol due to low quantity of input DNA)

Samples were stored O/N @ -20C.

dA Tailing Master Mix

10x Tailing Buffer 1.5uL x 17.6 = 26.4uL

Klenow 1uL x 17.6 = 17.6uL

H2O 0.5uL x 17.6 = 8.8uL

Added 3uL of master mix to each sample

Adaptor Ligation

2x Ligation Buffer 17uL x 17.6 = 299.2uL

T4 DNA Ligase 1uL x 17.6 = 17.6uL

Adaptors 1uL x 17.6 = 17.6uL

Added 19uL of master mix to each sample

dsDNA Conversion Master Mix

5x Conversion Buffer 4uL x 17.6 = 70.4uL

C.P. 2uL x 17.6 = 35.2uL

H2O 3uL x 17.6 = 52.8uL

Added 9uL of master mix to each sample

End Repair

10x Buffer 2uL x 17.6 = 35.2uL

Enzyme 1uL x 17.6 = 17.6uL

H2O 5uL x 17.6 = 88uL

Added 8uL of master mix to each sample
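
The master mix tables above all follow the same pattern: each per-sample volume is multiplied by 17.6, and the per-sample volumes add up to the amount dispensed. A minimal sketch of that calculation, assuming a hypothetical `master_mix` function that reads tab-separated "reagent, uL per sample" lines:

```shell
# Scale per-sample reagent volumes (uL) by a multiplier and report the volume
# of master mix to dispense per sample. Input lines: "<reagent><TAB><uL>".
master_mix() {   # $1 = multiplier (17.6 in this entry)
    awk -F '\t' -v n="$1" '
        {printf "%s\t%.1f\n", $1, $2 * n; per += $2}
        END {printf "per sample\t%.1f\n", per}'
}

# Adaptor ligation volumes from this entry
printf '2x Ligation Buffer\t17\nT4 DNA Ligase\t1\nAdaptors\t1\n' | master_mix 17.6
```

Recomputing each table this way is a quick check against transcription slips in the multiplier or the totals.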

Bisulfite Conversion – LSU C.virginica Oil Spill MBD DNA and Emma’s C.gigas Larvae OA DNA

Performed bisulfite conversion on MBD DNA samples from LSU C.virginica oil spill samples (see 20141202 and 20141126) and Emma’s C.gigas larvae OA DNA samples (see 20141121) with the Methylamp DNA Modification Kit (Epigentek).

Added 4uL of H2O to each of Emma’s DNA samples to bring them up to 24uL.

Samples were processed according to the manufacturer’s protocol.

Samples were eluted with 10uL of Solution R6 and stored @ -20C.

EtOH Precipitation – LSU C.virginica Oil Spill MBD Continued (from 20141126)

Precipitation was continued according to the MethylMiner Methylated DNA Enrichment Kit (Invitrogen). Since I will need sample volumes of 24uL for the subsequent bisulfite conversion, I resuspended the samples in 29uL of water (will use 2.5uL x 2 reps for quantification).

Samples to be quantified:

NC = non-captured (i.e. non-methylated)

E = eluted (i.e. methylated)

  • HB2 NC
  • HB5 NC
  • HB16 NC
  • HB30 NC
  • NB3 NC
  • NB6 NC
  • NB11 NC
  • NB21 NC
  • HB2 E
  • HB5 E
  • HB16 E
  • HB30 E
  • NB3 E
  • NB6 E
  • NB11 E
  • NB21 E
  • Control NC
  • Control E

Samples were quantified using the Quant-IT BS Kit (Invitrogen) with a plate reader (BioTek). All samples were run in duplicate. Used 2.5uL of each sample for quantification.

Samples were stored @ -20C (FTR 209) in the bisulfite seq box created by Claire for this project.

Results:

20141202_LSU_Virginica_MBD:

https://docs.google.com/spreadsheets/d/1NrrVmYsUQcstnrt4583mYN2PeVav54luyFvVUEkcjWE/edit?usp=sharing

Methylated DNA Enrichment (MBD) – LSU C.virginica Oil Spill gDNA

Enrichment was performed using the MethylMiner Methylated DNA Enrichment Kit (Invitrogen) according to the manufacturer’s protocol with the following changes:

– Used 25uL of Dynabeads M-280 (10uL/ug of input DNA) and 15uL of MBD-Biotin Protein (7uL/ug of input DNA).

– Followed the corresponding instructions for the volumes listed above and for quantities of input DNA > 1ug – 10ug

– A single elution with 2000mM NaCl was performed

– EtOH precipitation: Samples were incubated over the long weekend at -80C.

Gel – Sheared LSU C.virginica Oil Spill gDNA (from yesterday)

Ran ~250ng of sheared C.virginica gDNA from yesterday’s shearing.

Results:

Ladder used: O’GeneRuler 100bp Ladder (ThermoFisher)

The shearing is, surprisingly, very inconsistent across the samples. The target average fragment size was ~350bp. However, most of these samples are <250bp. The MethylMiner Kit (Invitrogen) suggests that an average fragment length of 100 – 200bp is ideal for short-read high-throughput sequencing, but we’re going to perform a bisulfite conversion on these which will result in some additional fragmentation, further reducing the average fragment size. Will proceed with methylated DNA enrichment.

DNA Shearing – LSU C.virginica Oil Spill gDNA

Used the remainder of the “sheared” samples (see today’s earlier entry; ~2750ng). Brought the volumes up to 80uL and transferred to 0.5mL snap cap tubes. The volume of 80uL was selected because it’s above the minimum volume required for shearing in 0.5mL tubes (10uL according to the Bioruptor 300 manual) and the MethylMiner Kit (Invitrogen) requires the input DNA volume to be <= 80uL.

DNA was sheared with the following parameters:

Low power

30 cycles of:

30s on

30s off

Target average fragment size is ~350bp.

See tomorrow (20141126) for the gel.

Gel – Sheared gDNA

Ran ~250ng (out of 3000ng, according to Claire) of LSU C.virginica oil spill gDNA, previously sheared by Claire, on a gel to verify that shearing was successful.

Ran unsheared side-by-side with sheared gDNA for comparison.

Note: HB16 and NB3 did not have any unsheared gDNA left in their tubes, so nothing was run on a gel.

Results:

Ladder used: O’GeneRuler 100bp Ladder (ThermoFisher)

Well, it’s rather obvious that the initial shearing did NOT work. Will re-shear the samples.

UPDATE: Looking at the Bioruptor (Diagenode) manual, it turns out that shearing samples in a 1.5mL tube (in which these were sheared) requires a minimum volume of 100uL. All the samples were far below this minimum volume. Additionally, the manual recommends far more cycles to reach the target size range (30 – 40 cycles) than were applied (4 cycles). The combination of these two factors is likely the reason that shearing didn’t take place.