Tag Archives: QUAST

Assembly Stats – Geoduck Hi-C Final Assembly Comparison

We received the final geoduck genome assembly data from Phase Genomics; for this version, they updated the assembly by performing some manual curation:

There are additional files that provide supplementary assembly data. See the following directory:

The actual sequencing data and two previous assemblies were received on 20180421.

All assembly data (both old and new) from Phase Genomics was downloaded in full from the Google Drive link provided by them and stored here on Owl:

Ran Quast to compare all three assemblies provided (command run on Swoose):


/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180403_pga,20180421_pga,20180810_geo_manual \
/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-03\ 11\:05\:41.596285/PGA_assembly.fasta \
/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta \
/mnt/owl/Athaliana/20180822_phase_genomics_geoduck_Results/geoduck_manual/geoduck_manual_scaffolds.fasta

Results:

Quast output folder: results_2018_08_23_07_38_28/

Quast report (HTML): results_2018_08_23_07_38_28/report.html
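For a quick look at the headline numbers without opening the HTML report, the plain-text report in the same output folder can be grepped. This is just a convenience sketch; the row labels are as I recall them from Quast 4.5's report.txt (the length-threshold variants of each metric will also match, which is harmless):

grep -E "^(# contigs|Largest contig|Total length|N50|L50) " results_2018_08_23_07_38_28/report.txt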

Assembly – Geoduck Hi-C Assembly Subsetting

Steven asked me to create a couple of subsets of our Phase Genomics Hi-C geoduck genome assembly (pga_02):

  • Contigs >10kbp

  • Contigs >30kbp

I used pyfaidx on Roadrunner with the following commands:

faidx --size-range 10000,100000000 PGA_assembly.fasta > PGA_assembly_10k_plus.fasta
faidx --size-range 30000,100000000 PGA_assembly.fasta > PGA_assembly_30k_plus.fasta

Ran Quast afterwards to get stats on the new FastA files just to confirm that the upper cutoff value was correct and didn’t get rid of the largest contig(s).
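As an additional sanity check, independent of Quast, the longest sequence length can be pulled straight from each FastA with plain awk; all three files should report the same largest contig as the full assembly. This is just a sketch using the filenames above:

for f in PGA_assembly.fasta PGA_assembly_10k_plus.fasta PGA_assembly_30k_plus.fasta; do
  awk -v f="$f" '/^>/ { if (len > max) max = len; len = 0; next }
                 { len += length($0) }
                 END { if (len > max) max = len; print f, max }' "$f"
done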

Results:

faidx Output folder: 20180512_geoduck_fasta_subsets/

10kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_10k_plus.fasta

30kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_30k_plus.fasta

Quast output folder: results_2018_05_14_06_26_26/

Quast report (HTML): results_2018_05_14_06_26_26/report.html

Everything looks good. The main thing I wanted to confirm by running Quast was that the largest contig in each subset was the same as in the original PGA assembly (95,480,635 bp).

Assembly & Stats – SparseAssembler (k95) on Geoduck Sequence Data > Quast for Stats

Had a successful assembly with SparseAssembler k101, but figured I’d just tweak the kmer setting and throw it in the queue and see how it compares; minimal effort/time needed.

Initiated an assembly run using SparseAssembler on our Mox HPC node on all of our geoduck genomic sequencing data:

Kmer size set to 95.

Slurm script: 20180423_sparse_assembler_kmer95_geoduck_slurm.sh

After the run finished, I copied the files to our server (Owl) and then ran Quast on my computer to gather some assembly stats.
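The transfer itself isn't captured in this notebook entry; a minimal sketch of what that step might look like (the Mox source path and the mox SSH alias are assumptions on my part, while the Owl destination matches the output folder linked below):

rsync -av \
  mox:/gscratch/scrubbed/samwhite/20180423_sparseassembler_kmer95_geoduck/ \
  /mnt/owl/Athaliana/20180423_sparseassembler_kmer95_geoduck/

Quast command: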


/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180423_sparse_k95 \
/mnt/owl/Athaliana/20180423_sparseassembler_kmer95_geoduck/Contigs.txt

Results:

SparseAssembler output folder: 20180423_sparseassembler_kmer95_geoduck/

SparseAssembler assembly (FastA; 15GB): 20180423_sparseassembler_kmer95_geoduck/Contigs.txt

Quast output folder: quast_results/results_2018_05_10_15_04_07

Quast report (HTML): quast_results/results_2018_05_10_15_04_07/report.html

I’ve embedded the Quast HTML report below, but it may be easier to view by using the link above.

Well, it’s remarkable how different this is from the previous SparseAssembler run with the k 101 setting!

This assembly doesn’t have a single contig >50,000bp, while the previous one had four contigs over that threshold!

Definitely shows what a large impact the kmer setting in assembly software can have on the final assembly!

Assembly Stats – Geoduck Genome Assembly Comparisons w/Quast – SparseAssembler, SuperNova, Hi-C

Steven requested a comparison of geoduck genome assemblies.

Ran the following Quast command:

/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180405_sparse_kmer101,supernova_pseudohap_duck4-p,20180421_Hi-C \
/mnt/owl/Athaliana/20180405_sparseassembler_kmer101_geoduck/Contigs.txt \
/mnt/owl//halfshell/bu-mox/analyses/0305b/duck4-p.fasta.gz \
/mnt/owl/Athaliana/20180419_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta
Results:

Quast output folder: results_2018_04_30_08_00_42/

Quast report (HTML): results_2018_04_30_08_00_42/report.html

The data’s pretty interesting and cool!

SparseAssembler has over 2x the amount of data (in base pairs), yet produces the worst assembly.

SuperNova and Hi-C assemblies are very close in nearly all categories. This isn’t surprising, as the SuperNova assembly was used as a reference assembly for the Hi-C assembly.

However, the Hi-C assembly is insanely better than the SuperNova assembly! For example:

  • Largest contig is ~7x larger than the SuperNova assembly.
  • The N50 size is ~243x larger than the SuperNova assembly!!
  • L50 is only 18, 46x smaller than the SuperNova assembly!

This is pretty amazing, honestly. Even more amazing is that this data was sent over to us as some “preliminary” data for us to take a peek at!

Assembly Stats – Geoduck Hi-C Assembly Comparison

Ran the following Quast command to compare the two geoduck assemblies provided to us by Phase Genomics:

/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180403_pga,20180421_pga \
/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-03\ 11\:05\:41.596285/PGA_assembly.fasta \
/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts\ results\ 2018-04-21\ 18\:09\:04.514704/PGA_assembly.fasta
Results:

Quast Output folder: results_2018_04_30_11_16_04/

Quast report (HTML): results_2018_04_30_11_16_04/report.html

The two assemblies are nearly identical. Interesting…

Assembly Stats – Quast Stats for Geoduck SparseAssembler Job from 20180405

The geoduck genome assembly started on 20180405 completed this weekend.

This assembly utilized the BGI data and all of the Illumina project data (NMP and NovaSeq) with a kmer 101 setting.

I ran Quast to gather some assembly stats, using the following command:

python /home/sam/software/quast-4.5/quast.py -t 24 /mnt/owl/Athaliana/20180405_sparseassembler_kmer101_geoduck/Contigs.txt
Results:

Quast output folder: results_2018_04_15_13_45_03/

Quast report (HTML): results_2018_04_15_13_45_03/report.html

I’ve embedded the Quast HTML report below, but it may be easier to view by using the link above.

Assembly – Geoduck NovaSeq using SparseAssembler kmer = 101

The prior run used a kmer size of 61, and the resulting assembly was rather poor (small N50).

For this run, I arbitrarily increased the kmer size to 101, in hopes that this will improve the assembly.

The job was run on our Mox node.

Here’s the batch script to initiate the job:

20180322_SparseAssembler_novaseq_geoduck_slurm.sh


#!/bin/bash
## Job Name
#SBATCH --job-name=20180322_sparse_assembler_geo_novaseq
## Allocation Definition 
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1   
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180322_SparseAssembler_novaseq_geoduck
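
# Parameter notes (my annotation, not part of the original script; based on my
# reading of the SparseAssembler documentation, so double-check before relying on it):
#   LD 0          - build the sparse k-mer graph from scratch rather than loading a saved one
#   NodeCovTh / EdgeCovTh - coverage thresholds for pruning weak k-mer nodes and edges
#   k 101         - k-mer size (the setting varied across these runs: 61, 95, 101)
#   g 15          - sparseness factor (number of intermediate k-mers skipped)
#   GS 2200000000 - rough genome size estimate (~2.2 Gbp), used for memory allocation
#   i1 / i2       - paired-end FASTQ inputs (R1/R2)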

/gscratch/srlab/programs/SparseAssembler/SparseAssembler \
LD 0 \
NodeCovTh 1 \
EdgeCovTh 0 \
k 101 \
g 15 \
PathCovTh 100 \
GS 2200000000 \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L002_R2_001_val_2_val_2.fastq
Results

Output folder: 20180322_SparseAssembler_novaseq_geoduck/

This completed much more quickly than the previous run (kmer = 61). The previous assembly took ~10 days, while this assembly completed in ~4 days!

The primary output file of interest is this FASTA file:

To get a rough idea of how this assembly looks, I ran it through Quast (version 4.5, 15ca3b9):

python software/quast-4.5/quast.py \
-t 16 \
/mnt/owl/Athaliana/20180322_SparseAssembler_novaseq_geoduck/Contigs.txt

Quast output folder: results_2018_03_27_08_25_52/

Here’re the stats on the assembly:

Quast output (text): results_2018_03_27_08_25_52/report.txt

Quast output (HTML): results_2018_03_27_08_25_52/report.html

This is definitely a better assembly than the kmer = 61 assembly.

N50 = 1149
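
For context, N50 = 1149 means that half of the total assembled bases are in contigs of 1,149 bp or longer. A minimal sketch of that calculation (samtools and the sorting approach are my additions here; Contigs.txt is the SparseAssembler FastA above):

# Build a .fai index; its second column is each contig's length.
samtools faidx Contigs.txt
# Sort lengths largest-first and report the length at which the running sum
# first reaches half of the total assembly size.
cut -f2 Contigs.txt.fai | sort -rn | awk '
  { total += $1; len[NR] = $1 }
  END {
    half = total / 2
    for (i = 1; i <= NR; i++) { run += len[i]; if (run >= half) { print "N50:", len[i]; exit } }
  }'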

Also, there’s a single, large contig of 56,361bp, and 54 contigs > 25,000bp. This is good.

Admittedly, I’m a little surprised (and, disappointed) the N50 is as small as it is. However, we have a pretty decent assembly on our hands!

Since SparseAssembler seems to actually run (and, relatively quickly), I’m very tempted to just throw ALL of our geoduck data at it and see how it turns out…

Assembly – Geoduck NovaSeq using SparseAssembler (TL;DR – it worked!)

The prior attempt using SparseAssembler failed due to a kmer size that was deemed too large.

For this run, I arbitrarily reduced the kmer size by ~half (k 61) in hopes that this will just get through an assembly. We can potentially explore the effects of kmer size on assemblies if/when this runs, depending on how the assembly looks.

The job was run on our Mox node.

Here’s the batch script to initiate the job:

#!/bin/bash
## Job Name
#SBATCH --job-name=20180313_sparse_assembler_geo_novaseq
## Allocation Definition
#SBATCH --account=srlab
#SBATCH --partition=srlab
## Resources
## Nodes (We only get 1, so this is fixed)
#SBATCH --nodes=1
## Walltime (days-hours:minutes:seconds format)
#SBATCH --time=30-00:00:00
## Memory per node
#SBATCH --mem=500G
##turn on e-mail notification
#SBATCH --mail-type=ALL
#SBATCH --mail-user=samwhite@uw.edu
## Specify the working directory for this job
#SBATCH --workdir=/gscratch/scrubbed/samwhite/20180312_SparseAssembler_novaseq_geoduck

/gscratch/srlab/programs/SparseAssembler/SparseAssembler \
LD 0 \
NodeCovTh 1 \
EdgeCovTh 0 \
k 61 \
g 15 \
PathCovTh 100 \
GS 2200000000 \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/AD002_S9_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR005_S4_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR006_S3_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR012_S1_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR013_AD013_S2_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR014_AD014_S5_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR015_AD015_S6_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR019_S7_L002_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L001_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L001_R2_001_val_2_val_2.fastq \
i1 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L002_R1_001_val_1_val_1.fastq \
i2 /gscratch/scrubbed/samwhite/20180129_trimmed_again/NR021_S8_L002_R2_001_val_2_val_2.fastq
Results

Output folder: 20180312_SparseAssembler_novaseq_geoduck

IT WORKED!!! At last, we have an assembly of the geoduck NovaSeq data!! It took ~10 days to complete.

The primary output file of interest is this FASTA file:

To get a rough idea of how this assembly looks, I ran it through Quast (version 4.5, 15ca3b9):

python software/quast-4.5/quast.py \
-t 16 \
/mnt/owl/Athaliana/20180312_SparseAssembler_novaseq_geoduck/Contigs.txt

Quast output folder: results_2018_03_22_08_12_12

Here’re the stats on the assembly:

Quast output (text): results_2018_03_22_08_12_12/report.txt

Quast output (HTML): results_2018_03_22_08_12_12/report.html

Overall, the assembly doesn’t look great. The N50 = 645 is really, really low. One would hope for a much larger number for a quality assembly. As it stands, this assembly is made up of many small contigs.

Looks like we’ll have to fiddle with the kmer size used for SparseAssembler and see if we can improve upon this.

Despite that, it’s an accomplishment to finally get any sort of assembler to run to completion for this data set!

Assembly Comparisons – Oly Assemblies Using Quast

I ran Quast to compare all of our current Olympia oyster genome assemblies.

See Jupyter Notebook in Results section for Quast execution.

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/quast_results/results_2018_01_16_10_08_35/

Heatmapped table of results: http://owl.fish.washington.edu/Athaliana/quast_results/results_2018_01_16_10_08_35/report.html

Very enlightening!

After all the difficulties with PB Jelly, it has produced the greatest number of large contigs. However, it also has the highest count and rate of Ns of all the assemblies produced to date.

BEST OF:

# contigs (>= 50000 bp): pbjelly_sjw_01 (894)
Largest Contig: redundans_sjw_02 (322,397bp)
Total Length: pbjelly_sjw_01 (1,180,563,613bp)
Total Length (>=50,000bp): pbjelly_sjw_01 (57,741,906bp)
N50: redundans_sjw_03 (17,679bp)
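
These “best of” picks can be cross-checked straight from the combined report, since Quast writes a tab-separated report.tsv alongside the HTML (a sketch; the local mount path and exact row label are assumptions based on Quast 4.5's usual output layout):

# Print each assembly's N50 from the combined report (first row holds the assembly names).
awk -F'\t' 'NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i }
            $1 == "N50" { for (i = 2; i <= NF; i++) print name[i], $i }' \
  /mnt/owl/Athaliana/quast_results/results_2018_01_16_10_08_35/report.tsv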

Jupyter Notebook (GitHub): 20180116_swoose_oly_assembly_comparisons_quast.ipynb

Genome Assembly – Olympia Oyster Illumina & PacBio Using PB Jelly w/BGI Scaffold Assembly

After another attempt to fix PB Jelly, I ran it again.

We’ll see how it goes this time…

Re-ran this using the BGI Illumina scaffolds FASTA.

Here’s a brief rundown of how this was run: see the Jupyter Notebook for full details of the run (linked in the Results section below).

Results:

Output folder: http://owl.fish.washington.edu/Athaliana/20171130_oly_pbjelly/

Output FASTA file: http://owl.fish.washington.edu/Athaliana/20171130_oly_pbjelly/jelly.out.fasta

Quast assessment of output FASTA:

Assembly jelly.out
# contigs (>= 0 bp) 696946
# contigs (>= 1000 bp) 159429
# contigs (>= 5000 bp) 68750
# contigs (>= 10000 bp) 35320
# contigs (>= 25000 bp) 7048
# contigs (>= 50000 bp) 894
Total length (>= 0 bp) 1253001795
Total length (>= 1000 bp) 1140787867
Total length (>= 5000 bp) 932263178
Total length (>= 10000 bp) 691523275
Total length (>= 25000 bp) 261425921
Total length (>= 50000 bp) 57741906
# contigs 213264
Largest contig 194507
Total length 1180563613
GC (%) 36.57
N50 12433
N75 5983
L50 26241
L75 60202
# N’s per 100 kbp 6580.58
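
The “# N’s per 100 kbp” figure can be roughly cross-checked straight from the output FASTA with a one-liner like the following (my sketch, not part of the original analysis; run it wherever jelly.out.fasta lives):

# Count each sequence line's length before stripping Ns, then scale the N count per 100 kbp.
awk '!/^>/ { len += length($0); n += gsub(/[Nn]/, "") }
     END { printf "Ns per 100 kbp: %.2f\n", (n / len) * 100000 }' jelly.out.fasta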

Have added this assembly to our Olympia oyster genome assemblies table.

This took an insanely long time to complete (nearly six weeks)!!! After some internet searching, I’ve found a potential solution to this and have initiated another PB Jelly run to see if it will run faster. Regardless, it’ll be interesting to see how the results compare between two independent runs of PB Jelly.

Jupyter Notebook (GitHub): 20171130_emu_pbjelly.ipynb