Sam's Notebook » phase genomics
http://onsnetwork.org/kubu4
University of Washington - Fishery Sciences - Roberts Lab
Thu, 08 Nov 2018 21:47:12 +0000

Assembly Stats – Geoduck Hi-C Final Assembly Comparison
http://onsnetwork.org/kubu4/2018/08/23/assembly-stats-geoduck-hi-c-final-assembly-comparison/
Thu, 23 Aug 2018 15:47:04 +0000

We received the final geoduck genome assembly data from Phase Genomics, in which they updated the assembly by performing some manual curation:

Additional files provide supplementary assembly data. See the following directory:

The sequencing data and two earlier assemblies were received on 20180421.

All assembly data (both old and new) from Phase Genomics was downloaded in full from the Google Drive link provided by them and stored here on Owl:

Ran Quast to compare all three assemblies provided (command run on Swoose):


/home/sam/software/quast-4.5/quast.py \
-t 24 \
--labels 20180403_pga,20180421_pga,20180810_geo_manual \
"/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts results 2018-04-03 11:05:41.596285/PGA_assembly.fasta" \
"/mnt/owl/Athaliana/20180421_geoduck_hi-c/Results/geoduck_roberts results 2018-04-21 18:09:04.514704/PGA_assembly.fasta" \
"/mnt/owl/Athaliana/20180822_phase_genomics_geoduck_Results/geoduck_manual/geoduck_manual_scaffolds.fasta"

Results:

Quast output folder: results_2018_08_23_07_38_28/

Quast report (HTML): results_2018_08_23_07_38_28/report.html

Assembly – Geoduck Hi-C Assembly Subsetting
http://onsnetwork.org/kubu4/2018/05/12/assembly-geoduck-hi-c-assembly-subsetting/
Sat, 12 May 2018 22:16:56 +0000

Steven asked me to create a couple of subsets of our Phase Genomics Hi-C geoduck genome assembly (pga_02):

  • Contigs >10kbp

  • Contigs >30kbp

I used pyfaidx on Roadrunner and the following commands:

faidx --size-range 10000,100000000 PGA_assembly.fasta > PGA_assembly_10k_plus.fasta
faidx --size-range 30000,100000000 PGA_assembly.fasta > PGA_assembly_30k_plus.fasta

Ran Quast afterwards to get stats on the new FastA files just to confirm that the upper cutoff value was correct and didn’t get rid of the largest contig(s).
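A lighter-weight way to sanity-check the cutoffs is to count contigs per size range from a FASTA index. This is only a sketch: it assumes a samtools-style `.fai` index (name in column 1, length in column 2, tab-separated) and assumes the size-range bounds are inclusive. The toy index contents and the `count_in_range` helper are hypothetical and were not part of the original workflow.

```shell
# Toy .fai index (hypothetical values): name, length, offset, line bases, line width.
workdir=$(mktemp -d)
printf 'c1\t95480635\t4\t60\t61\nc2\t30000\t100\t60\t61\nc3\t12000\t200\t60\t61\nc4\t500\t300\t60\t61\n' \
    > "$workdir/toy.fasta.fai"

# Count index entries whose length (column 2) lies within [lo, hi].
count_in_range() {
    awk -F '\t' -v lo="$2" -v hi="$3" \
        '$2 >= lo && $2 <= hi { n++ } END { print n + 0 }' "$1"
}

n10=$(count_in_range "$workdir/toy.fasta.fai" 10000 100000000)
n30=$(count_in_range "$workdir/toy.fasta.fai" 30000 100000000)

echo "contigs in 10k range: $n10"
echo "contigs in 30k range: $n30"
```

If the count for the upper bound alone (for example, lengths above 100000000) were non-zero, the cutoff would be dropping the largest contigs.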

Results:

faidx Output folder: 20180512_geoduck_fasta_subsets/

10kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_10k_plus.fasta

30kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_30k_plus.fasta

Quast output folder: results_2018_05_14_06_26_26/

Quast report (HTML): results_2018_05_14_06_26_26/report.html

Everything looks good. The main thing I wanted to confirm by running Quast was that the largest contig in each subset was the same as in the original PGA assembly (95,480,635bp).
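The same largest-contig check can be approximated without Quast by computing the longest sequence in each FASTA file directly. The following is a rough sketch using awk on toy files; the file names and the `max_contig_len` helper are hypothetical stand-ins, not the commands actually run here.

```shell
# Toy stand-ins for the original assembly and a size-filtered subset.
workdir=$(mktemp -d)
printf '>contig_1\nACGTACGTAC\nGTAC\n>contig_2\nACG\n' > "$workdir/full.fasta"
printf '>contig_1\nACGTACGTAC\nGTAC\n' > "$workdir/subset.fasta"

# Print the length of the longest sequence in a FASTA file.
# Sequence lines may be wrapped, so lengths are summed per record.
max_contig_len() {
    awk '
        /^>/ { if (len > max) max = len; len = 0; next }
             { len += length($0) }
        END  { if (len > max) max = len; print max + 0 }
    ' "$1"
}

full_max=$(max_contig_len "$workdir/full.fasta")
subset_max=$(max_contig_len "$workdir/subset.fasta")

# The subset should retain the largest contig of the full assembly.
if [ "$full_max" -eq "$subset_max" ]; then
    echo "largest contig preserved: ${full_max} bp"
else
    echo "largest contig differs: ${full_max} bp vs ${subset_max} bp" >&2
fi
```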

Data Management – Geoduck Phase Genomics Hi-C Data
http://onsnetwork.org/kubu4/2018/04/21/data-management-geoduck-phase-genomics-hi-c-data/
Sat, 21 Apr 2018 22:48:23 +0000

We received sequencing/assembly data from Phase Genomics.

The data contains two assemblies, produced on two different dates.

All data is here: 20180421_geoduck_hi-c

All FASTQ files (four files; Geoduck_HiC*.gz) were copied to Nightingales:

MD5 checksums were verified and appended to the Nightingales checksum file:
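The verify-then-append step might look roughly like the sketch below. It assumes GNU `md5sum` is available; the file names (`toy_R1.fastq`, `provided_checksums.md5`, `checksums.md5`) are hypothetical placeholders, and the "provided" checksum list is generated locally here only so the example is self-contained.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a downloaded FASTQ file (hypothetical name and content).
printf '@read1\nACGT\n+\nIIII\n' > toy_R1.fastq

# Stand-in for the checksum list supplied with the data.
md5sum toy_R1.fastq > provided_checksums.md5

# Verify the file against the supplied checksums; append to the
# running checksum file only if every file passes.
if md5sum -c provided_checksums.md5; then
    cat provided_checksums.md5 >> checksums.md5
    verify_status="ok"
else
    verify_status="mismatch"
fi

echo "verification: $verify_status"
```

Appending only after a successful check keeps the running checksum file free of entries for corrupted transfers.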

Nightingales sequencing inventory was updated (Google Sheet):

The two assemblies (and assembly stats) they provided are here:

I’ve updated the project-geoduck-genome GitHub wiki with this info.
