Data Wrangling - Panopea generosa Genome Feature Sequence Lengths

In preparation for a paper we’re writing, we needed some summary stats on the various genome assembly feature sequences. I determined the max/min, mean, and median sequence lengths for all the GFF feature files we currently have for Pgenerosa_v070, Pgenerosa_v070_top18_scaffolds, and Pgenerosa_v074. This info will be compiled in to a table for the manuscript. See our Genomic Resources wiki for more info on GFFs:

Calculations were performed using Python in a Jupyter Notebook.

Jupyter Notebook (GitHub):


RESULTS

I’ve copied/pasted (and, reordered) the summary data for each of the GFFs that were analyzed, for quick reference. Will get this compiled in to a table of some sort for people to use for the manuscript.

Pgenerosa_v070_top18_scaffolds.CDS.gff
-------------------------
mean       177.728604
min          1.000000
median     123.000000
max       9851.000000



Pgenerosa_v070_top18_scaffolds.exon.gff
-------------------------
mean        219.024085
min           3.000000
median      131.000000
max       16912.000000


Pgenerosa_v070_top18_scaffolds.five_prime_UTR.gff
-------------------------
mean        148.558072
min           1.000000
median       89.000000
max       16912.000000




Pgenerosa_v070_top18_scaffolds.gene.gff
-------------------------
mean        8696.297953
min          180.000000
median      4621.000000
max       139826.000000



Pgenerosa_v070_top18_scaffolds.mRNA.gff
-------------------------
mean        8696.297953
min          180.000000
median      4621.000000
max       139826.000000


Pgenerosa_v070_top18_scaffolds.three_prime_UTR.gff
-------------------------
mean       481.02338
min          1.00000
median     221.00000
max       8355.00000




Pgenerosa_v074.CDS.gff
-------------------------
mean        193.248049
min           1.000000
median      126.000000
max       11772.000000



Pgenerosa_v074.exon.gff
-------------------------
mean        210.612431
min           3.000000
median      130.000000
max       11772.000000




Pgenerosa_v074.five_prime_UTR.gff
-------------------------
mean        62.226496
min          1.000000
median      41.000000
max       1259.000000




Pgenerosa_v074.gene.gff
-------------------------
mean       12170.897196
min          192.000000
median      5552.000000
max       175905.000000



Pgenerosa_v074.mRNA.gff
-------------------------
mean       12170.897196
min          192.000000
median      5552.000000
max       175905.000000




Pgenerosa_v074.three_prime_UTR.gff
-------------------------
mean       516.604585
min          1.000000
median     251.000000
max       4080.000000





Pgenerosa_v070.CDS.gff
-------------------------
mean        194.539909
min           1.000000
median      132.000000
max       11048.000000


Pgenerosa_v070.exon.gff
-------------------------
mean        217.971132
min           3.000000
median      139.000000
max       16912.000000




Pgenerosa_v070.gene.gff
-------------------------
mean        3768.232394
min          177.000000
median      1474.000000
max       139826.000000




Pgenerosa_v070.mRNA.gff
-------------------------
mean        3768.232394
min          177.000000
median      1474.000000
max       139826.000000