Assembly – Geoduck Hi-C Assembly Subsetting
Sam's Notebook, University of Washington - Fishery Sciences - Roberts Lab
Sat, 12 May 2018
http://onsnetwork.org/kubu4/2018/05/12/assembly-geoduck-hi-c-assembly-subsetting/

Steven asked me to create a couple of subsets of our Phase Genomics Hi-C geoduck genome assembly (pga_02):

  • Contigs >10kbp

  • Contigs >30kbp

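I used pyfaidx on Roadrunner with the following commands. The --size-range option keeps only the sequences whose lengths fall within the given low,high range: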

faidx --size-range 10000,100000000 PGA_assembly.fasta > PGA_assembly_10k_plus.fasta
faidx --size-range 30000,100000000 PGA_assembly.fasta > PGA_assembly_30k_plus.fasta
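For readers without pyfaidx, here is a rough sketch of what the --size-range filter in the commands above does. It is a stdlib-only Python approximation, not pyfaidx's implementation. It keeps records whose length lies in the inclusive range [low, high]. The function names (read_fasta, filter_by_size, write_fasta), the 60-character line wrapping, and the toy contigs are hypothetical. Header handling and line width may differ from pyfaidx's actual output.

```python
from __future__ import annotations

import io
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_size(records: Iterable[tuple[str, str]], low: int, high: int) -> Iterator[tuple[str, str]]:
    """Keep records whose length lies in the inclusive range [low, high]."""
    for header, seq in records:
        if low <= len(seq) <= high:
            yield header, seq


def write_fasta(records: Iterable[tuple[str, str]], handle: TextIO, width: int = 60) -> None:
    """Write records as FASTA, wrapping sequence lines at `width` characters (hypothetical default)."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")


if __name__ == "__main__":
    # Toy contigs (hypothetical) with lengths 50, 5 and 20.
    toy = io.StringIO(">ctg1\n" + "A" * 50 + "\n>ctg2\nCCCCC\n>ctg3\n" + "G" * 20 + "\n")
    kept = list(filter_by_size(read_fasta(toy), 10, 100))
    print([header for header, _ in kept])  # ['ctg1', 'ctg3']
```

The reader is a generator, so it holds only one sequence in memory at a time. pyfaidx instead reads sequence lengths from its .fai index.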

I then ran Quast on the new FastA files to get their stats. I mainly wanted to confirm that the upper cutoff value (100,000,000bp) was high enough and didn't remove the largest contig(s).
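The largest-contig check can also be approximated without Quast. The sketch below is a swapped-in lightweight check, not what was run here. Quast reports many more metrics and applies its own defaults, such as a minimum contig length, so its numbers can differ. The lengths can come from the second column of the FASTA index (.fai) that samtools faidx and pyfaidx write next to the FASTA. All function names are hypothetical, and every contig length in the usage example except 95,480,635 is made up.

```python
from __future__ import annotations

from typing import Iterable


def lengths_from_fai(lines: Iterable[str]) -> list[int]:
    """Read sequence lengths from the second column of a FASTA index (.fai)."""
    lengths = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[1]:
            lengths.append(int(fields[1]))
    return lengths


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: list[int]) -> dict[str, int]:
    """A few Quast-style summary numbers for a list of contig lengths."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest_contig": max(lengths, default=0),
        "n50": n50(lengths),
    }


def largest_contig_retained(original: list[int], subset: list[int]) -> bool:
    """True if the subset still contains a contig as long as the original's longest."""
    return max(original, default=0) == max(subset, default=0)


if __name__ == "__main__":
    # Hypothetical lengths; only 95,480,635 comes from the post.
    original = [95_480_635, 40_000, 12_000, 5_000]
    subset = [n for n in original if 10_000 <= n <= 100_000_000]
    print(summarize(subset))
    print(largest_contig_retained(original, subset))  # True
```

With an upper bound below 95,480,635, the same check would return False. That is the failure the Quast run was meant to rule out.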

Results:

faidx output folder: 20180512_geoduck_fasta_subsets/

10kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_10k_plus.fasta

30kbp contigs (FastA): 20180512_geoduck_fasta_subsets/PGA_assembly_30k_plus.fasta

Quast output folder: results_2018_05_14_06_26_26/

Quast report (HTML): results_2018_05_14_06_26_26/report.html

Everything looks good. The main thing I wanted to confirm by running Quast was that the largest contig in each subset was the same as the largest contig in the original PGA assembly (95,480,635bp).
