Tag Archives: redundans

Genome Assembly – Olympia oyster Illumina & PacBio Reads Using Redundans

I had problems with Docker and Jupyter Notebook inexplicably dying and deleting all the files in the notebook’s working directory (which also happened to be the volume mounted in the Docker container).

So, I ran this directly on my computer, which didn’t have Jupyter installed (yet).

This run used the Canu contigs file (FASTA) that I generated on 20171018.

Here’s the input command:

sudo python /home/sam/software/redundans/redundans.py \
  -t 24 \
  -l \
  m130619_081336_42134_c100525122550000001823081109281326_s1_p0.fastq.gz \
  m170211_224036_42134_c101073082550000001823236402101737_s1_X0_filtered_subreads.fastq.gz \
  m170301_100013_42134_c101174162550000001823269408211761_s1_p0_filtered_subreads.fastq.gz \
  m170301_162825_42134_c101174162550000001823269408211762_s1_p0_filtered_subreads.fastq.gz \
  m170301_225711_42134_c101174162550000001823269408211763_s1_p0_filtered_subreads.fastq.gz \
  m170308_163922_42134_c101174252550000001823269408211742_s1_p0_filtered_subreads.fastq.gz \
  m170308_230815_42134_c101174252550000001823269408211743_s1_p0_filtered_subreads.fastq.gz \
  m170315_001112_42134_c101169372550000001823273008151717_s1_p0_filtered_subreads.fastq.gz \
  m170315_063041_42134_c101169382550000001823273008151700_s1_p0_filtered_subreads.fastq.gz \
  m170315_124938_42134_c101169382550000001823273008151701_s1_p0_filtered_subreads.fastq.gz \
  m170315_190851_42134_c101169382550000001823273008151702_s1_p0_filtered_subreads.fastq.gz \
  -i \
  151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_1.fq.gz \
  151114_I191_FCH3Y35BCXX_L1_wHAIPI023992-37_2.fq.gz \
  151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_1.fq.gz \
  151114_I191_FCH3Y35BCXX_L2_wHAMPI023991-66_2.fq.gz \
  151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_1.fq.gz \
  151118_I137_FCH3KNJBBXX_L5_wHAXPI023905-96_2.fq.gz \
  160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_1.fq.gz \
  160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz \
  160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_1.fq.gz \
  160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz \
  160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_1.fq.gz \
  160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz \
  160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_1.fq.gz \
  160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz \
  160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_1.fq.gz \
  160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_2.fq.gz \
  160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_1.fq.gz \
  160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_2.fq.gz \
  -f 20171018_oly_pacbio.contigs.fasta \
  -o /home/data/20171024_docker_oly_redundans_01/
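
Typing out that many file names by hand is error-prone, so here is a minimal Python sketch of how the same argument list could be assembled from the files in a directory. It only reuses the flags from the command above (-t, -l, -i, -f, -o). The function name build_redundans_cmd, the read_dir argument, and the assumption that all reads sit in one directory (PacBio reads ending in .fastq.gz, Illumina reads ending in .fq.gz) are mine for illustration and are not part of Redundans.

```python
import glob
import os


def build_redundans_cmd(read_dir, contigs, outdir, threads=24,
                        script="/home/sam/software/redundans/redundans.py"):
    """Return the redundans.py command as an argument list (nothing is run)."""
    # Assumption: PacBio reads end in .fastq.gz and Illumina reads in .fq.gz,
    # which matches the file names in the command above.
    long_reads = sorted(glob.glob(os.path.join(read_dir, "*.fastq.gz")))
    short_reads = sorted(glob.glob(os.path.join(read_dir, "*.fq.gz")))

    # Sorting normally puts each *_1.fq.gz directly before its *_2.fq.gz mate;
    # fail early if the *_2 mate of any *_1 file is missing.
    present = set(short_reads)
    for r1 in short_reads:
        if r1.endswith("_1.fq.gz"):
            r2 = r1[:-len("_1.fq.gz")] + "_2.fq.gz"
            if r2 not in present:
                raise FileNotFoundError("missing mate for " + r1)

    return (["python", script, "-t", str(threads), "-l"] + long_reads
            + ["-i"] + short_reads
            + ["-f", contigs, "-o", outdir])
```

The returned list could be handed to subprocess.run(cmd, check=True). The sudo from the original command is deliberately left out of the sketch; add it back only if the output directory really needs it.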

This completed in just over 19 hours.

Copied output files to Owl: http://owl.fish.washington.edu/Athaliana/20171024_docker_oly_redundans_01/

Here’s the desired output file (FASTA): scaffolds.reduced.fa

Will add to our genome assemblies table.

Ran Quast on 20171103 for some assembly stats.

Quast output is here: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_11_03_22_43_06/
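
For a quick sanity check without opening the QUAST report, a few of the same basic stats can be computed directly from the FASTA. This is only a rough sketch, not how QUAST itself is implemented: the function names n50 and fasta_lengths are made up for illustration, and it assumes an uncompressed FASTA such as scaffolds.reduced.fa. N50 here is the usual definition: the length of the shortest sequence among the longest sequences that together cover at least half of the assembly.

```python
def fasta_lengths(path):
    """Return the length of every sequence in a plain-text FASTA file."""
    lengths = []
    current = None
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new record starts; store the previous one, if any.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50 of a list of sequence lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that pushes coverage to half the total.
        if 2 * running >= total:
            return length
    return 0
```

For example, lengths = fasta_lengths("scaffolds.reduced.fa") followed by len(lengths), sum(lengths), max(lengths), and n50(lengths) gives the sequence count, total length, largest sequence, and N50.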

Genome Assembly – Olympia oyster Redundans/Canu vs. Redundans/Racon

I decided to compare two Redundans runs: one using the Canu assembly as the reference and one using the Racon assembly as the reference. Both reference assemblies were built from just our PacBio data.

Jupyter notebook (GitHub): 20171005_docker_oly_redundans.ipynb

Notebook is also embedded at the end of this post.

Results:

Note that, just like the last time I used them, the paired reads from each of the BGI mate-pair Illumina data sets did not assemble:

  • 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_2.fq.gz

The Redundans assembly built on Canu is the better of the two, suggesting that the Canu assembly is the better of the two PacBio assemblies (which we had already suspected).

QUAST comparison using default settings:

Interactive link: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_10_06_22_21_06/report.html

QUAST comparison using --scaffolds setting:

Interactive link: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_10_06_22_27_26/report.html

Genome Assembly – Olympia Oyster Redundans with Illumina + PacBio

Redundans should assemble both Illumina and PacBio data, so let’s do that.

Sean had previously performed this – twice actually:

It wasn’t entirely clear how he had run Redundans the first time. The second time, he used his Platanus contig FASTA file as the reference assembly that Redundans requires.

Since he had produced a good looking assembly from PacBio data using Canu, I decided to give Redundans a rip using that assembly.

I then compared all three Redundans runs using QUAST.

Jupyter notebook (GitHub): 20171004_docker_oly_redundans.ipynb

Notebook is also embedded at the bottom of this notebook entry (but, it should be easier to view at the link provided above).

Of note, Redundans didn’t find any alignments for the paired reads from each of the BGI mate-pair Illumina data sets:

  • 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCABDLAAPEI-62_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L3_WHOSTibkDCACDTAAPEI-75_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCABDLAAPEI-62_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L4_WHOSTibkDCACDTAAPEI-75_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L5_WHOSTibkDCAADWAAPEI-74_2.fq.gz
  • 160103_I137_FCH3V5YBBXX_L6_WHOSTibkDCAADWAAPEI-74_2.fq.gz

First, I ran QUAST with the default settings:

Interactive link: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_10_05_14_21_50/report.html

Using that Canu assembly with Redundans certainly seems to result in a better assembly.

Decided to run QUAST with the --scaffolds option to see what happened:

Interactive link: http://owl.fish.washington.edu/Athaliana/quast_results/results_2017_10_05_14_28_51/report.html

The scaffolds with the “Ns” removed from them have “_broken” appended to their names, meaning the scaffolds were broken apart into contigs. Things are certainly cleaner when using the --scaffolds option; however, as far as I can tell, QUAST doesn’t actually generate a FASTA file with the “_broken” scaffolds!
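
Since no “_broken” FASTA seems to be written, a similar file can be produced by hand. Below is a minimal Python sketch that splits each scaffold at runs of Ns and writes the resulting contigs to a new FASTA. It is not QUAST’s code. My understanding of the QUAST manual is that --scaffolds splits at runs of 10 or more Ns, so the min_gap=10 default is an assumption worth checking against the QUAST version in use. The function names and the “_broken_1”, “_broken_2”, ... contig naming are made up for illustration.

```python
import re


def read_fasta(path):
    """Yield (name, sequence) pairs from a plain-text FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the name.
                header = line[1:].split()
                name = header[0] if header else "unnamed"
                chunks = []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_scaffold(seq, min_gap=10):
    """Split a scaffold at every run of at least min_gap N characters."""
    gap = re.compile("[Nn]{%d,}" % min_gap)
    # Drop empty pieces produced by leading or trailing N runs.
    return [piece for piece in gap.split(seq) if piece]


def write_broken(in_path, out_path, min_gap=10, width=60):
    """Write the contigs of every scaffold to out_path; return their count."""
    count = 0
    with open(out_path, "w") as out:
        for name, seq in read_fasta(in_path):
            pieces = split_scaffold(seq, min_gap)
            for i, contig in enumerate(pieces, start=1):
                out.write(">%s_broken_%d\n" % (name, i))
                # Wrap sequence lines at a fixed width.
                for start in range(0, len(contig), width):
                    out.write(contig[start:start + width] + "\n")
                count += 1
    return count
```

Something like write_broken("scaffolds.reduced.fa", "scaffolds.reduced.broken.fa") would then give a contig-level FASTA (the output file name is hypothetical). Runs of Ns shorter than min_gap are left inside the contigs.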