Genome Feature Counts - Panopea-generosa-vv0.74.a4

UPDATE 20191203

Due to a bug with GenSAS, the final GFF generated by their system to did not properly merge all of the repeats data properly. The repeats data below should be ignored! It appears that the other features data (e.g. CDS, exons, etc) was unaffected by the bug.

Here’s the updated analysis: https://robertslab.github.io/sams-notebook/2019/12/03/Data-Wrangling-Renaming,-Splitting,-and-Feature-Counts-of-Updated-Pgenerosa_v074-GenSAS-Merged-GFF.html

I’ll leave original notebook entry below for posterity.


In preparation for a paper we’re writing, we needed some summary stats for Panopea-generosa-vv0.74.a4. This info will be compiled in to a table for the manuscript. See our Genomic Resources wiki for more info on GFFs:

Calculations were performed using Python in Jupyter Notebooks.

Genome Features Jupyter Notebook (GitHub):

Repeat Features Jupyter Notbooke (GitHub):


RESULTS

I’ve copied/pasted the summary data for each of the GFFs that were analyzed, for quick reference. Will get this compiled in to a table of some sort for people to use for the manuscript.

Also, the repeats were split into individual GFFs by repeat type. Those GFFs can be found here:

Genome Features

Panopea-generosa-vv0.74.a4.mRNA.gff3
-------------------------
mean       12903.649559
min          166.000000
median      5453.000000
max       283066.000000
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.gene.gff3
-------------------------
mean       10811.04461
min          166.00000
median      4464.00000
max       283066.00000
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.CDS.gff3
-------------------------
mean        201.476988
min           3.000000
median      133.000000
max       13221.000000
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.tRNA.gff3
-------------------------
mean       74.805659
min        53.000000
median     75.000000
max       314.000000
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.rRNA.gff3
-------------------------
mean      118.428571
min       113.000000
median    115.000000
max       138.000000
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.exon.gff3
-------------------------
mean        201.476988
min           3.000000
median      133.000000
max       13221.000000
Name: seqlength, dtype: float64

Repeats Features

Panopea-generosa-vv0.74.a4.repeats.LINE.gff3
-------------------------
percent 2.91
sum       27388849.00
mean           394.85
min             11.00
median         226.00
max           6604.00
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.repeats.Simple_repeat.gff3
-------------------------
percent 0.5
sum       4733271.0
mean          261.2
min             6.0
median        125.0
max          5981.0
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.repeats.Unknown.gff3
-------------------------
percent 29.09
sum       2.740281e+08
mean      1.991900e+02
min       1.100000e+01
median    1.440000e+02
max       6.574000e+03
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.repeats.LTR.gff3
-------------------------
percent 0.22
sum       2060084.00
mean          712.83
min            11.00
median        316.00
max          6541.00
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.repeats.RC.gff3
-------------------------
percent 0.02
sum       232303.00
mean         425.46
min           13.00
median       464.00
max          674.00
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.repeats.SINE.gff3
-------------------------
percent 0.65
sum       6133778.00
mean          155.69
min            11.00
median        164.00
max           934.00
Name: seqlength, dtype: float64



Panopea-generosa-vv0.74.a4.repeats.DNA.gff3
-------------------------
percent 0.91
sum       8602532.00
mean          407.82
min            11.00
median        247.00
max          7012.00
Name: seqlength, dtype: float64



-------------------------
Repeats composition of genome (percent): 34.3