Data Wrangling - Pgenerosa_v074.a3 Annotation Genome Repeats Compostion

Needed to pull some numbers on repeats in our Pgenerosa_v074.a3 GenSAS annotation (from 20190710) for the manuscript we’re putting together.

Used the following file (repeats identified via RepeatMasker; original file name t_5d25021b0d20b-5d250896def4c-repeatmasker.gensas.gff3):

Panopea-generosa-vv0.74.a3.TE.gff3

Repeats were analyzed by type of repeats:

  • DNA transposons (DNA)

  • Simple repeats

  • LINE (Long Interspersed Nuclear Elements)

  • LTR (Long Terminal Repeats)

  • SINE (Short Interspersed Nuclear Elements)

  • Unknown repeats

Analysis was done in a Jupyter notebook (link below).

Jupyter Notebook (GitHub):


RESULTS

Will add info to manuscript and create a table.

Panopea-generosa-vv0.74.a3.repeats.DNA.gff3
-------------------------
percent 1.2
sum       11316890.00
mean           293.09
min              1.00
median         154.00
max           7012.00




Panopea-generosa-vv0.74.a3.repeats.Simple_repeat.gff3
-------------------------
percent 2.03
sum       19132744.00
mean            63.78
min              1.00
median          41.00
max          12422.00




Panopea-generosa-vv0.74.a3.repeats.LINE.gff3
-------------------------
percent 3.11
sum       29258694.00
mean           396.48
min             11.00
median         227.00
max           6604.00




Panopea-generosa-vv0.74.a3.repeats.LTR.gff3
-------------------------
percent 0.46
sum       4355629.00
mean          370.63
min             1.00
median        276.00
max          6541.00




Panopea-generosa-vv0.74.a3.repeats.SINE.gff3
-------------------------
percent 2.3
sum       21645991.00
mean           147.84
min              1.00
median         142.00
max            934.00




Panopea-generosa-vv0.74.a3.repeats.Unknown.gff3
-------------------------
percent 31.14
sum       2.933161e+08
mean      2.001500e+02
min       1.100000e+01
median    1.450000e+02
max       1.098100e+04




-------------------------
Repeats composition of genome (percent): 40.24