Transposable Element Mapping - Crassostrea gigas Genome v9 Using RepeatMasker 4.0.7 on Roadrunner

Per this GitHub issue, I’m identifying transposable elements (TEs) in the Crassostrea gigas genome. Even though the C. gigas genome should already be fully annotated, Steven wants a set of analyses comparable to the Crassostrea virginica TE mapping we previously performed.

I used the Crassostrea gigas genome we have linked on our GitHub Genomic Resources wiki:

Analysis was performed in the following Jupyter Notebook (GitHub):


RESULTS

This took ~24 hrs to complete.

Output folder:

Genome used (from our Genomic Resources wiki):

GFF file:

Summary table (text):



==================================================
file name: Crassostrea_gigas.oyster_v9.dna_sm.toplevel.fa
sequences:          7658
total length:  557717710 bp  (491860439 bp excl N/X-runs)
GC level:         33.42 %
bases masked:  160369613 bp ( 32.60 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements        48481     19773596 bp    4.02 %
   SINEs:             2498       317084 bp    0.06 %
   Penelope           5749      1808270 bp    0.37 %
   LINEs:            26463     10472676 bp    2.13 %
    CRE/SLACS           15         1289 bp    0.00 %
     L2/CR1/Rex       1712       307207 bp    0.06 %
     R1/LOA/Jockey     299        21470 bp    0.00 %
     R2/R4/NeSL        218        69735 bp    0.01 %
     RTE/Bov-B        8417      3631379 bp    0.74 %
     L1/CIN4           983        64189 bp    0.01 %
   LTR elements:     19520      8983836 bp    1.83 %
     BEL/Pao          2050      1349545 bp    0.27 %
     Ty1/Copia        2139       189535 bp    0.04 %
     Gypsy/DIRS1     11971      6501545 bp    1.32 %
       Retroviral     1263        69288 bp    0.01 %

DNA transposons     299050     85782505 bp   17.44 %
   hobo-Activator     9348      2278556 bp    0.46 %
   Tc1-IS630-Pogo    32515      8695261 bp    1.77 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           4136       747000 bp    0.15 %
   Tourist/Harbinger 11590      2828277 bp    0.58 %
   Other (Mirage,      232        14514 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:       109149     49075277 bp    9.98 %

Total interspersed repeats:   154631378 bp   31.44 %


Small RNA:             830        93282 bp    0.02 %

Satellites:           2087       401812 bp    0.08 %
Simple repeats:     110847      4687373 bp    0.95 %
Low complexity:      16716       787611 bp    0.16 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be root          
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127

run with rmblastn version 2.6.0+
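
As a quick sanity check on the summary above, the percentages can be reproduced from the base-pair counts. The sketch below is only illustrative: the numbers are copied by hand from the table, and the helper name `pct` is my own, not part of RepeatMasker. It shows that the reported percentages use the length excluding N/X-runs as the denominator, as the table footnote states.

```python
# Values copied by hand from the RepeatMasker summary table above.
total_bp = 557_717_710         # total length, including N/X-runs
non_n_bp = 491_860_439         # total length excluding N/X-runs
masked_bp = 160_369_613        # bases masked
interspersed_bp = 154_631_378  # total interspersed repeats

# Top-level interspersed repeat classes from the same table.
retroelements_bp = 19_773_596
dna_transposons_bp = 85_782_505
unclassified_bp = 49_075_277


def pct(part_bp, whole_bp):
    """Percentage of whole_bp covered by part_bp, rounded to two decimals."""
    return round(100 * part_bp / whole_bp, 2)


# Percentages relative to the non-N length (what the table reports).
masked_pct = pct(masked_bp, non_n_bp)
interspersed_pct = pct(interspersed_bp, non_n_bp)

# For comparison only: masked fraction relative to the full assembly length.
masked_pct_of_total = pct(masked_bp, total_bp)

# The three top-level classes should add up to the interspersed total.
class_sum_bp = retroelements_bp + dna_transposons_bp + unclassified_bp

print(f"masked (excl N/X-runs):       {masked_pct:.2f} %")
print(f"interspersed (excl N/X-runs): {interspersed_pct:.2f} %")
print(f"masked (of total length):     {masked_pct_of_total:.2f} %")
print(f"class sum matches total:      {class_sum_bp == interspersed_bp}")
```

Note that the masked percentage comes out lower when calculated against the full assembly length, so it matters which denominator is used when comparing against other genomes.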

I’ve put together the TE comparison requested in the GitHub Issue mentioned above in a Google Sheet:
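
For anyone who would rather build that kind of comparison programmatically, here is a minimal sketch of pulling rows out of a RepeatMasker summary table. It is not how the Google Sheet was made; the regular expression and the `parse_rows` helper are my own assumptions about the table layout shown above, and the excerpt is just a handful of lines copied from this post.

```python
import re

# Excerpt copied from the summary table in this post. A real run would read
# the full RepeatMasker summary file instead.
TABLE_EXCERPT = """\
Retroelements        48481     19773596 bp    4.02 %
   SINEs:             2498       317084 bp    0.06 %
   LINEs:            26463     10472676 bp    2.13 %
   LTR elements:     19520      8983836 bp    1.83 %
DNA transposons     299050     85782505 bp   17.44 %
Unclassified:       109149     49075277 bp    9.98 %
Total interspersed repeats:   154631378 bp   31.44 %
"""

# Assumed row layout: name, element count, "<length> bp", "<percent> %".
# Lines without an element count (e.g. the "Total" line) do not match.
ROW = re.compile(
    r"^\s*(?P<name>.+?):?\s+(?P<count>\d+)\s+(?P<bp>\d+) bp\s+"
    r"(?P<pct>\d+\.\d+) %\s*$"
)


def parse_rows(text):
    """Return one dict per table row that has a count, a length and a percent."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                {
                    "name": match.group("name"),
                    "count": int(match.group("count")),
                    "bp": int(match.group("bp")),
                    "pct": float(match.group("pct")),
                }
            )
    return rows


rows = parse_rows(TABLE_EXCERPT)
for row in rows:
    print(f"{row['name']:<16} {row['count']:>7} {row['bp']:>10} {row['pct']:>6.2f}")
```

Running the same parser on the C. virginica summary table would give a second list of rows that could be matched up by `name` for a side-by-side comparison.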