Transposable Element Mapping – Crassostrea virginica Genome, Cvirginica_v300, using RepeatMasker 4.0.7

Per this GitHub issue, I’m identifying transposable elements (TEs) in the Crassostrea virginica genome.

Genome used:

I ran RepeatMasker (v4.0.7) with RepBase-20170127 and RMBlast 2.6.0 four times:

  1. Species = all

  2. Species = Crassostrea gigas (Pacific oyster)

  3. Species = Crassostrea virginica (Eastern oyster)

  4. Default settings (i.e. no species selected; RepeatMasker then assumes the query is Homo sapiens).

I used four different settings to get a sense of how the results differ depending on the species specified.
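
The exact commands are in the Jupyter Notebook, not in this post, so the following is only a minimal sketch of how the four runs could be assembled. It assumes the standard RepeatMasker options `-species`, `-gff`, `-dir` and `-par`; the run labels, output directory names and the `build_command` helper are hypothetical.

```python
import shlex

GENOME = "Cvirginica_v300.fa"  # assembly filename as shown in the summary tables

# Species setting per run; None means no -species flag (RepeatMasker default).
# The dictionary keys are hypothetical labels, used only to name output directories.
RUNS = {
    "all": "all",
    "Cgigas": "crassostrea gigas",
    "Cvirginica": "crassostrea virginica",
    "default": None,
}


def build_command(genome, species=None, out_dir=None, threads=None):
    """Return a RepeatMasker command as an argument list (nothing is executed)."""
    cmd = ["RepeatMasker", "-gff"]          # -gff also writes a GFF annotation file
    if species is not None:
        cmd += ["-species", species]        # restrict the repeat library to a clade
    if out_dir is not None:
        cmd += ["-dir", out_dir]            # write output here, not next to the genome
    if threads is not None:
        cmd += ["-par", str(threads)]       # number of parallel jobs
    cmd.append(genome)                      # query FASTA goes last
    return cmd


if __name__ == "__main__":
    for label, species in RUNS.items():
        cmd = build_command(GENOME, species, out_dir=f"repeatmasker_{label}")
        print(shlex.join(cmd))
```

Printing the commands rather than running them makes it easy to check the species strings before starting a long job; each list could be passed to `subprocess.run` unchanged.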

All runs were performed on roadrunner.

All commands were documented in a Jupyter Notebook (GitHub):

NOTE: RepeatMasker writes the desired output files (*.out, *.cat.gz, and *.gff) to the same directory that the genome is located in! If you conduct multiple runs with the same genome in the same directory, it will overwrite those files, as they are named using the genome assembly filename. Be sure to move files out of the genome directory after each run!
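
As a rough illustration of the clean-up step described in the note, here is a small sketch that moves RepeatMasker output out of the genome directory after a run. The function name, the destination directory and the suffix list are assumptions: `.out`, `.cat.gz` and `.gff` come from the note above, while `.tbl` and `.masked` are added because RepeatMasker normally writes those as well.

```python
import shutil
from pathlib import Path

# Suffixes treated as RepeatMasker output (assumed list; adjust as needed).
OUTPUT_SUFFIXES = (".out", ".cat.gz", ".gff", ".tbl", ".masked")


def collect_outputs(genome_path, dest_dir, suffixes=OUTPUT_SUFFIXES):
    """Move files named <genome filename>.<suffix> into dest_dir.

    Returns the names of the files that were moved. The genome file itself
    and unrelated files (e.g. a .fai index) are left in place.
    """
    genome = Path(genome_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(genome.parent.glob(genome.name + ".*")):
        if path.is_file() and path.name.endswith(suffixes):
            shutil.move(str(path), str(dest / path.name))
            moved.append(path.name)
    return moved
```

Calling something like `collect_outputs("genomes/Cvirginica_v300.fa", "results/run_all")` after each run (both paths are hypothetical) would keep a later run from overwriting the earlier files.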


RESULTS:
RUN 1 (species – all)

Output folder:

Summary table (text):

Output table (GFF):

SUMMARY TABLE

==================================================
file name: Cvirginica_v300.fa       
sequences:            11
total length:  684741128 bp  (684675328 bp excl N/X-runs)
GC level:         34.83 %
bases masked:  113771462 bp ( 16.62 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements        97003     27946871 bp    4.08 %
   SINEs:            48145      9242559 bp    1.35 %
   Penelope           1429       256929 bp    0.04 %
   LINEs:            27022     10570154 bp    1.54 %
    CRE/SLACS           28         2219 bp    0.00 %
     L2/CR1/Rex       2160       316660 bp    0.05 %
     R1/LOA/Jockey    3058       386611 bp    0.06 %
     R2/R4/NeSL        511       226938 bp    0.03 %
     RTE/Bov-B        7377      3276312 bp    0.48 %
     L1/CIN4          1331        95476 bp    0.01 %
   LTR elements:     21836      8134158 bp    1.19 %
     BEL/Pao          1807       936488 bp    0.14 %
     Ty1/Copia        3046       296183 bp    0.04 %
     Gypsy/DIRS1     12789      6060883 bp    0.89 %
       Retroviral     2369       152228 bp    0.02 %

DNA transposons     180693     29492426 bp    4.31 %
   hobo-Activator    12869      1114188 bp    0.16 %
   Tc1-IS630-Pogo    17233      2485049 bp    0.36 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           2388       405926 bp    0.06 %
   Tourist/Harbinger  9302       992476 bp    0.14 %
   Other (Mirage,      238        15946 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:       137707     45460608 bp    6.64 %

Total interspersed repeats:   102899905 bp   15.03 %


Small RNA:           45243      9057873 bp    1.32 %

Satellites:           3852       760316 bp    0.11 %
Simple repeats:     203542      8946510 bp    1.31 %
Low complexity:      26205      1281043 bp    0.19 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be root          
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
        
run with rmblastn version 2.6.0+


RUN 2 (species – Crassostrea gigas)

Output folder:

Summary table (text):

Output table (GFF):

SUMMARY TABLE

==================================================
file name: Cvirginica_v300.fa       
sequences:            11
total length:  684741128 bp  (684675328 bp excl N/X-runs)
GC level:         34.83 %
bases masked:   93923386 bp ( 13.72 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements        26397     15008601 bp    2.19 %
   SINEs:                4          722 bp    0.00 %
   Penelope            675       190160 bp    0.03 %
   LINEs:            17645      8922188 bp    1.30 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex         70        39188 bp    0.01 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          4         5110 bp    0.00 %
     RTE/Bov-B        6194      2718955 bp    0.40 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:      8748      6085691 bp    0.89 %
     BEL/Pao           933       788887 bp    0.12 %
     Ty1/Copia          47        82743 bp    0.01 %
     Gypsy/DIRS1      6819      4822734 bp    0.70 %
       Retroviral        0            0 bp    0.00 %

DNA transposons     163945     26422122 bp    3.86 %
   hobo-Activator     7742       720623 bp    0.11 %
   Tc1-IS630-Pogo    15615      2328538 bp    0.34 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac           2246       393498 bp    0.06 %
   Tourist/Harbinger  8431       876020 bp    0.13 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:       160681     41266796 bp    6.03 %

Total interspersed repeats:    82697519 bp   12.08 %


Small RNA:             214        40811 bp    0.01 %

Satellites:           1396       217317 bp    0.03 %
Simple repeats:     216869      9637447 bp    1.41 %
Low complexity:      27520      1418990 bp    0.21 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be crassostrea gigas
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
        

RUN 3 (species – Crassostrea virginica)

Output folder:

Summary table (text):

Output table (GFF):

SUMMARY TABLE

==================================================
file name: Cvirginica_v300.fa       
sequences:            11
total length:  684741128 bp  (684675328 bp excl N/X-runs)
GC level:         34.83 %
bases masked:   46637065 bp ( 6.81 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements        43139      8952068 bp    1.31 %
   SINEs:            43139      8952068 bp    1.31 %
   Penelope              0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:         0            0 bp    0.00 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia           0            0 bp    0.00 %
     Gypsy/DIRS1         0            0 bp    0.00 %
       Retroviral        0            0 bp    0.00 %

DNA transposons       3538      1564942 bp    0.23 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:        65151     23982146 bp    3.50 %

Total interspersed repeats:    34499156 bp    5.04 %


Small RNA:           43353      8992879 bp    1.31 %

Satellites:              1          222 bp    0.00 %
Simple repeats:     232627     10544162 bp    1.54 %
Low complexity:      29762      1561018 bp    0.23 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be crassostrea virginica
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
        
run with rmblastn version 2.6.0+

RUN 4 (default settings – Homo sapiens)

Output folder:

Summary table (text):

Output table (GFF):

SUMMARY TABLE

==================================================
file name: Cvirginica_v300.fa       
sequences:            11
total length:  684741128 bp  (684675328 bp excl N/X-runs)
GC level:         34.83 %
bases masked:   13461422 bp ( 1.97 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
SINEs:             2056       120820 bp    0.02 %
      ALUs            0            0 bp    0.00 %
      MIRs          240        14635 bp    0.00 %

LINEs:             3408       331585 bp    0.05 %
      LINE1         240        16835 bp    0.00 %
      LINE2         728        69177 bp    0.01 %
      L3/CR1       1369       135234 bp    0.02 %

LTR elements:       704       236625 bp    0.03 %
      ERVL           14          944 bp    0.00 %
      ERVL-MaLRs     12          892 bp    0.00 %
      ERV_classI    272        36695 bp    0.01 %
      ERV_classII     4          206 bp    0.00 %

DNA elements:      1088       100026 bp    0.01 %
     hAT-Charlie     27         1543 bp    0.00 %
     TcMar-Tigger   142         9891 bp    0.00 %

Unclassified:        57         6096 bp    0.00 %

Total interspersed repeats:   795152 bp    0.12 %


Small RNA:         3698       279669 bp    0.04 %

Satellites:          73         5524 bp    0.00 %
Simple repeats:  247957     10848509 bp    1.58 %
Low complexity:   30084      1536314 bp    0.22 %
==================================================

* most repeats fragmented by insertions or deletions
  have been counted as one element
  Runs of >=20 X/Ns in query were excluded in % calcs


The query species was assumed to be homo sapiens  
RepeatMasker Combined Database: Dfam_Consensus-20170127, RepBase-20170127
        
run with rmblastn version 2.6.0+
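
To put the four runs side by side, the short sketch below recomputes the masked percentage for each run from the "bases masked" values in the summary tables above. It assumes the denominator is the genome length excluding N/X-runs, as the table footnote suggests; the run labels and the `percent_masked` helper are mine.

```python
# Values copied from the four summary tables above.
GENOME_BP_EXCL_N = 684_675_328  # total length excluding N/X-runs

BASES_MASKED = {
    "all": 113_771_462,
    "Crassostrea gigas": 93_923_386,
    "Crassostrea virginica": 46_637_065,
    "default (Homo sapiens)": 13_461_422,
}


def percent_masked(masked_bp, genome_bp=GENOME_BP_EXCL_N):
    """Masked bases as a percentage of the genome, rounded to two decimals."""
    return round(100 * masked_bp / genome_bp, 2)


if __name__ == "__main__":
    for run, masked in BASES_MASKED.items():
        print(f"{run:<24}{masked:>14,} bp {percent_masked(masked):7.2f} %")
```

The recomputed values match the percentages reported in the tables (16.62, 13.72, 6.81 and 1.97 %).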
