RepeatMasker - Pgenerosa_v074 for Transposable Element ID on Roadrunner

Yesterday (20190625) I generated a subset of the first 18 FastA sequences from the Pgenerosa_v070.fa file. This subset has been designated as Pgenerosa_v074 by Steven. It’s available on our Genomic Resources wiki:

Steven has asked me to generate a transposable elements (TE) GFF to accompany this assembly. Additionally, he’s asked to annotate this assembly using MAKER - this will be detailed in another notebook entry at some point.

Anyway, I ran RepeatMasker 4.07 on Roadrunner (Apple Xserve running Ubuntu 16.04LTS) using the “all” species setting.

All of it is detailed in the Jupyter Notebook (GitHub):


RESULTS

Run time was 2024 minutes (~33.7hrs).

Output folder:

Summary table (text):

Output GFF:


SUMMARY TABLE

==================================================
file name: Pgenerosa_v074.fa        
sequences:            18
total length:  942353201 bp  (784808881 bp excl N/X-runs)
GC level:         33.78 %
bases masked:   65221692 bp ( 8.31 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       204336     32863590 bp    4.19 %
   SINEs:           127691     15752737 bp    2.01 %
   Penelope           2382       279223 bp    0.04 %
   LINEs:            49426     11965761 bp    1.52 %
    CRE/SLACS          453        37114 bp    0.00 %
     L2/CR1/Rex      13913      2779414 bp    0.35 %
     R1/LOA/Jockey    3341      1189171 bp    0.15 %
     R2/R4/NeSL       1211       165338 bp    0.02 %
     RTE/Bov-B        9983      2559753 bp    0.33 %
     L1/CIN4          6194      1146568 bp    0.15 %
   LTR elements:     27219      5145092 bp    0.66 %
     BEL/Pao          1918       317492 bp    0.04 %
     Ty1/Copia        4335       355225 bp    0.05 %
     Gypsy/DIRS1     16012      3831098 bp    0.49 %
       Retroviral     2945       204333 bp    0.03 %

DNA transposons      89437     12061369 bp    1.54 %
   hobo-Activator    10103      1142451 bp    0.15 %
   Tc1-IS630-Pogo    24664      3657788 bp    0.47 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac            472        38428 bp    0.00 %
   Tourist/Harbinger  2582       369771 bp    0.05 %
   Other (Mirage,      628        39925 bp    0.01 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:        38482      5369675 bp    0.68 %

Total interspersed repeats:    50294634 bp    6.41 %


Small RNA:           16303       859653 bp    0.11 %

Satellites:          10312      1878369 bp    0.24 %
Simple repeats:     239752     12742842 bp    1.62 %
Low complexity:      31725      1550615 bp    0.20 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be root          
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127

run with rmblastn version 2.6.0+