IMPORTANT: Please note you can download correlation data tables,
supported by Ensembl, via the highly customisable BioMart and
EnsMart data mining tools. See http://${division}.ensembl.org/biomart/martview
or http://www.ebi.ac.uk/biomart/ for more information.

------------------
GFF FLATFILE DUMPS
------------------

This directory contains GFF flatfile dumps. All files are compressed
using GNU Zip.

Ensembl Genomes provides an automatic reannotation of genomic data as
well as imports of existing genomic data. These data are dumped in a
number of forms, one of them being GFF flat files. As the annotation in
these files comes from Ensembl Genomes, and not from the original
sequence entry, the two annotations are likely to differ.

The GFF flat file dumps provide all the sequence features known to
Ensembl Genomes, including protein coding genes, ncRNA, repeat features
etc. Considerably more information is stored in Ensembl Genomes: the
flat file just gives a representation which is compatible with existing
tools.

We are considering other information that should be made dumpable. In
general, we recommend database access over flat file access for any
substantial work with the data.

Note the following features of the GFF3 format provided on this site:

1) Types are described using SO terms that are as specific as possible,
   e.g. protein_coding_gene is used where a gene is known to be protein
   coding.
2) Phase is currently set to 0 - the phase used by the Ensembl system is
   stored as an attribute.
3) Some validators may warn about duplicated identifiers for CDS
   features. The identifiers are repeated deliberately so that split
   features can be grouped.

We are actively working to improve our GFF3, so some of these issues may
be addressed in future releases of Ensembl Genomes.

----------
FILE NAMES
----------

The files are consistently named following this pattern:

   <species>.<assembly>.<version>.gff3.gz

<species>:  The systematic name of the species.
<assembly>: The assembly build name.
<version>:  The version of Ensembl Genomes from which the data was exported.
gff3:       All files in these directories are in GFF3 format.
gz:         All files are compressed with GNU Zip for storage efficiency.

e.g.
   Drosophila_melanogaster.BDGP5.21.gff3.gz

Where the genome has a chromosome-level assembly, individual files are
provided for each chromosome, named following this pattern:

   <species>.<assembly>.<version>.chromosome.<chromosome>.gff3.gz

Where the assembly also contains additional non-chromosomal sequences
that are not present in the chromosomes, these are all available in a
file with the pattern:

   <species>.<assembly>.<version>.non_chromosomal.gff3.gz

e.g.
   Drosophila_melanogaster.BDGP5.21.74.chromosome.2L.gff3.gz
   Drosophila_melanogaster.BDGP5.21.non_chromosomal.gff3.gz
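
As an illustration of the naming pattern above, the following Python
sketch splits a file name into its parts. It is not part of any Ensembl
Genomes tooling: the names GffFileName and parse_gff_file_name are
hypothetical. It assumes that species and chromosome names contain no
dots, and that the version is the last dot-free token before any
chromosome or non_chromosomal suffix, so an assembly name that itself
contains dots is kept whole.

```python
import re
from typing import NamedTuple, Optional


class GffFileName(NamedTuple):
    """Parts of a GFF3 dump file name (hypothetical helper type)."""
    species: str
    assembly: str
    version: str
    chromosome: Optional[str]  # set only for per-chromosome files
    non_chromosomal: bool      # True only for the non_chromosomal file


# Assumption: species, version and chromosome contain no dots; the
# assembly may. The lazy ".+?" keeps the assembly as short as possible
# while still letting the rest of the pattern match.
_NAME_RE = re.compile(
    r"^(?P<species>[^.]+)\."
    r"(?P<assembly>.+?)\."
    r"(?P<version>[^.]+)"
    r"(?:\.chromosome\.(?P<chromosome>[^.]+)"
    r"|\.(?P<non_chrom>non_chromosomal))?"
    r"\.gff3\.gz$"
)


def parse_gff_file_name(name: str) -> GffFileName:
    """Split a dump file name into its parts, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised GFF3 dump file name: {name!r}")
    return GffFileName(
        species=match.group("species"),
        assembly=match.group("assembly"),
        version=match.group("version"),
        chromosome=match.group("chromosome"),
        non_chromosomal=match.group("non_chrom") is not None,
    )


if __name__ == "__main__":
    for file_name in (
        "Drosophila_melanogaster.BDGP5.21.gff3.gz",
        "Drosophila_melanogaster.BDGP5.21.74.chromosome.2L.gff3.gz",
        "Drosophila_melanogaster.BDGP5.21.non_chromosomal.gff3.gz",
    ):
        print(parse_gff_file_name(file_name))
```

Because assembly names may contain dots, a purely positional split on
"." is ambiguous. The sketch resolves this by anchoring on the fixed
parts of the name (the .gff3.gz suffix and the optional chromosome or
non_chromosomal marker) and treating whatever remains between the
species and the version as the assembly.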