IMPORTANT: Please note you can download correlation data tables,
supported by Ensembl, via the highly customisable BioMart data mining tool.
See http://${division}.ensembl.org/biomart/martview or
http://www.ebi.ac.uk/biomart/ for more information. Not available for
Ensembl Bacteria.

--------
GTF DUMP
--------

This directory includes a summary of the gene annotation information in
GTF format.

The GTF dump provides all the annotated protein-coding genes in this
release's gene set. Considerably more information is stored in Ensembl
Genomes: the GTF file just gives a representation which is compatible
with existing tools.

--------------------------------
Definition and supported options
--------------------------------

The GFF (General Feature Format) format consists of one line per feature,
each containing 9 columns of data, plus optional track definition lines.
The following documentation is based on the Version 2 specifications.

The GTF (General Transfer Format) is identical to GFF version 2.

Fields

Fields must be tab-separated. Also, all but the final field in each
feature line must contain a value; "empty" columns should be denoted
with a '.'

    seqname   - name of the chromosome or scaffold; chromosome names can
                be given with or without the 'chr' prefix (the convention
                in Ensembl is to omit the 'chr' prefix).
    source    - name of the program that generated this feature, or the
                data source (database or project name)
    feature   - feature type name, e.g. Gene, Variation, Similarity
    start     - start position of the feature, with sequence numbering
                starting at 1.
    end       - end position of the feature, with sequence numbering
                starting at 1.
    score     - a floating point value.
    strand    - defined as + (forward) or - (reverse).
    frame     - one of '0', '1' or '2'. '0' indicates that the first base
                of the feature is the first base of a codon, '1' that the
                second base is the first base of a codon, and so on.
    attribute - a semicolon-separated list of tag-value pairs, providing
                additional information about each feature.

Track lines

Although not part of the formal GFF specification, Ensembl will use track
lines to further configure sets of features. Track lines should be placed
at the beginning of the list of features they are to affect.

The track line consists of the word 'track' followed by space-separated
key=value pairs. Valid parameters used by Ensembl are:

    name        - unique name to identify this track when parsing the file
    description - label to be displayed under the track in Region in Detail
    priority    - integer defining the order in which to display tracks,
                  if multiple tracks are defined.

--------------
Example output
--------------

YHet	protein_coding	exon	311	424	.	+	.	gene_id "FBgn0001315"; transcript_id "FBtr0113891"; exon_number "1"; gene_name "kl-5"; transcript_name "kl-5-RA"; seqedit "false";
YHet	protein_coding	CDS	311	424	.	+	0	gene_id "FBgn0001315"; transcript_id "FBtr0113891"; exon_number "1"; gene_name "kl-5"; transcript_name "kl-5-RA"; protein_id "FBpp0112614";
YHet	protein_coding	exon	540	799	.	+	.	gene_id "FBgn0001315"; transcript_id "FBtr0113891"; exon_number "2"; gene_name "kl-5"; transcript_name "kl-5-RA"; seqedit "false";
YHet	protein_coding	CDS	540	799	.	+	0	gene_id "FBgn0001315"; transcript_id "FBtr0113891"; exon_number "2"; gene_name "kl-5"; transcript_name "kl-5-RA"; protein_id "FBpp0112614";
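
The nine-column layout and the track line syntax described above can be
read with a few lines of Python. The following is a minimal sketch, not an
Ensembl-provided tool: the names GtfFeature, parse_attributes,
parse_gtf_line and parse_track_line are hypothetical, and the attribute
parser assumes every value is double-quoted, as in the example output.
Files that carry unquoted values would need a looser pattern.

```python
import re
import shlex
from typing import Dict, NamedTuple, Optional

GTF_COLUMNS = 9  # seqname, source, feature, start, end, score, strand, frame, attribute


class GtfFeature(NamedTuple):
    """One feature line (hypothetical container, not an Ensembl type)."""
    seqname: str
    source: str
    feature: str
    start: int               # 1-based start position
    end: int                 # 1-based end position
    score: Optional[float]   # None when the column holds '.'
    strand: str
    frame: Optional[int]     # None when the column holds '.'
    attributes: Dict[str, str]


# Matches one `tag "value";` pair. Assumption: values are double-quoted.
_ATTRIBUTE_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(text: str) -> Dict[str, str]:
    """Turn the semicolon-separated attribute column into a dict."""
    return {tag: value for tag, value in _ATTRIBUTE_RE.findall(text)}


def parse_gtf_line(line: str) -> GtfFeature:
    """Split one tab-separated feature line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != GTF_COLUMNS:
        raise ValueError(f"expected {GTF_COLUMNS} tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return GtfFeature(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        frame=None if frame == "." else int(frame),
        attributes=parse_attributes(attrs),
    )


def parse_track_line(line: str) -> Dict[str, str]:
    """Parse 'track key=value ...'; shlex keeps quoted values together."""
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "track":
        raise ValueError("not a track line")
    return dict(token.split("=", 1) for token in tokens[1:])


# The first CDS line of the example output, rebuilt with explicit tabs.
EXAMPLE_LINE = "\t".join([
    "YHet", "protein_coding", "CDS", "311", "424", ".", "+", "0",
    'gene_id "FBgn0001315"; transcript_id "FBtr0113891"; exon_number "1"; '
    'gene_name "kl-5"; transcript_name "kl-5-RA"; protein_id "FBpp0112614";',
])

if __name__ == "__main__":
    cds = parse_gtf_line(EXAMPLE_LINE)
    print(cds.seqname, cds.feature, cds.start, cds.end, cds.attributes["gene_name"])
```

Splitting on tabs only, rather than on any whitespace, keeps the
attribute column intact, since that column contains spaces of its own.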