RepeatMasker - Pgenerosa_v070 for Transposable Element ID on Roadrunner

Continuing our various attempts at annotating our geoduck genome assemblies, I will be re-annotating our Pgenerosa_v070 (see Genome Resources GitHub wiki for deets) and realized I hadn’t run RepeatMasker on this assembly previously. Running RepeatMasker will generate a GFF that I can supply to MAKER to aid in repeats identification.

I ran RepeatMasker 4.07 on Roadrunner (Apple Xserve running Ubuntu 16.04LTS) using the “all” species setting.

All of it is detailed in this Jupyter Notebook (GitHub):


RESULTS

This took ~4 days to run - longer than I expected.

Will use the GFF in subsequent MAKER annotation.

Output folder:

Summary table (text):

Output GFF:


SUMMARY TABLE

==================================================
file name: Pgenerosa_v070.fa        
sequences:        313649
total length: 2205688688 bp  (2005531528 bp excl N/X-runs)
GC level:         33.92 %
bases masked:  175175579 bp ( 8.73 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       565711     87788537 bp    4.38 %
   SINEs:           332333     39506023 bp    1.97 %
   Penelope           6883       788411 bp    0.04 %
   LINEs:           142649     32744907 bp    1.63 %
    CRE/SLACS         1237       100944 bp    0.01 %
     L2/CR1/Rex      40317      7764197 bp    0.39 %
     R1/LOA/Jockey   10137      2942539 bp    0.15 %
     R2/R4/NeSL       3825       551996 bp    0.03 %
     RTE/Bov-B       26939      6768723 bp    0.34 %
     L1/CIN4         21435      4046589 bp    0.20 %
   LTR elements:     90729     15537607 bp    0.77 %
     BEL/Pao          6594       918331 bp    0.05 %
     Ty1/Copia       16409      1268565 bp    0.06 %
     Gypsy/DIRS1     50972     11376086 bp    0.57 %
       Retroviral     9680       690936 bp    0.03 %

DNA transposons     259955     34987123 bp    1.74 %
   hobo-Activator    29756      3192075 bp    0.16 %
   Tc1-IS630-Pogo    67456      9717356 bp    0.48 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           1553       121136 bp    0.01 %
   Tourist/Harbinger  7596      1054167 bp    0.05 %
   Other (Mirage,     1803       123196 bp    0.01 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:        99928     13654973 bp    0.68 %

Total interspersed repeats:   136430633 bp    6.80 %


Small RNA:           42601      2192413 bp    0.11 %

Satellites:          33350      6282246 bp    0.31 %
Simple repeats:     596607     32793030 bp    1.64 %
Low complexity:      75831      3754962 bp    0.19 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be root          
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127

run with rmblastn version 2.6.0+