UniProtKB-GOA README FOR GP_INFORMATION.GOA_UNIPROT
---------------------------------------------------

1. Contents
------------

1. Contents
2. Introduction


2. Introduction
----------------

UniProtKB-GOA (GO Annotation@EBI) is a project run by the European
Bioinformatics Institute that aims to provide assignments of gene
products to the Gene Ontology (GO) resource. The goal of the Gene
Ontology Consortium is to produce a dynamic controlled vocabulary that
can be applied to all eukaryotes, even while knowledge of the roles of
gene products in cells is still accumulating and changing. In the
UniProtKB-GOA project, this vocabulary is applied to all proteins
described in the UniProt (Swiss-Prot and TrEMBL) Knowledgebase.

For full information on the UniProtKB-GOA project, please go to:
http://www.ebi.ac.uk/GOA

This readme describes the format of the gp_information.goa_uniprot
file. The file supplies additional information on each DB_OBJECT_ID
that has been provided with annotations.

This file currently consists of 11 tab-separated fields, whose contents
are described below. However, the format of this file is still being
discussed by the GO Consortium and is therefore subject to change. For
further information please see:
http://wiki.geneontology.org/index.php/Annotation_File_Format_Proposal#Proposed_file_format

1. DB

   Database from which the annotated entry has been taken.
   In this file, the value is either UniProtKB or IPI.

   This field equates to column 1 in the GAF2.0 format.

2. DB_Subset

   The database subset from which the entity being described has been
   taken. This information is only supplied for UniProtKB, where the
   value is either Swiss-Prot or TrEMBL.

   This field is only supplied by the gp_information.goa_uniprot file.

3. DB_Object_ID

   A unique identifier in the database for the entity being described.
   Here, this is an accession number or identifier of the annotated
   protein: a UniProtKB accession number, a UniProtKB isoform
   identifier or an IPI identifier.

   Examples:
      O00165
      O00165-1

   This field equates to column 2 (or column 17 in the case of isoform
   identifiers) in the GAF2.0 format.

4. DB_Object_Symbol

   A (unique and valid) symbol (gene name) that corresponds to the
   DB_Object_ID. An officially approved gene symbol is used in this
   field when available. Otherwise, other gene symbols or locus names
   are applied. If no symbols are available, the identifier given in
   DB_Object_ID (column 2 in the GAF2.0 format) is used.

   Examples:
      G6PC
      CYB561
      MGCQ309F3
      C10H14ORF1

   This field equates to column 3 in the GAF2.0 format.

5. DB_Object_Name

   The name of the entity being described. The full UniProt protein
   name is given here, if available from UniProtKB. If a name cannot be
   added, this field is left empty.

   Examples:
      Glucose-6-phosphatase
      Cellular tumor antigen p53
      Coatomer subunit beta

   This field equates to column 10 in the GAF2.0 format.

6. DB_Object_Synonym

   Gene_symbol [or other text]
   Alternative gene symbol(s), IPI identifier(s) and UniProtKB/Swiss-Prot
   identifiers are provided, pipe-separated, if available from
   UniProtKB. If none of these identifiers has been supplied, the field
   is left empty.

   Examples:
      RNF20|BRE1A|IPI00690596|BRE1A_BOVIN
      IPI00706050
      MMP-16|IPI00689864

   This field equates to column 11 in the GAF2.0 format.

7. DB_Object_Type

   The kind of entity that DB_Object_ID identifies.

   Examples: protein, isoform, protein_structure

   This field equates to column 12 in the GAF2.0 format.

8. Taxon

   Identifier for the species to which the described entity relates.

   Example: taxon:9606

   This field equates to column 13 in the GAF2.0 format.

9. Annotation_Target_Set

   A description of the list in which the protein has been included for
   prioritized annotation.

   Examples:
      BHF-UCL
      KRUK
      Reference_Genome

   This field is only supplied by the gp_information.goa_uniprot file.

10. Annotation_Completed

   The date on which a curator indicated that the protein's GO
   annotation record had been comprehensively curated.

   Example: 20080101

   This field is only supplied by the gp_information.goa_uniprot file.

11. Parent_Object_ID

   This field supplies the relationship between the DB_Object_ID and a
   top-level UniProtKB accession number, where the DB_Object_ID is an
   isoform identifier.

   Example: UniProtKB:P21678

   This field is only supplied by the gp_information.goa_uniprot file.
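
The 11-field layout described above could be read with a short script
along the following lines. This is only a minimal sketch in Python, not
an official UniProtKB-GOA tool. The names GpInfoRecord and
parse_gp_information are hypothetical, as is the sample row, which is
stitched together from the example values in this readme and is not a
real record. The sketch also assumes that lines starting with "!" are
comment or header lines (as in GAF files), which this readme does not
state.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

# Number of tab-separated fields described in this readme.
FIELD_COUNT = 11


class GpInfoRecord(NamedTuple):
    """One row of gp_information.goa_uniprot (hypothetical helper type)."""
    db: str
    db_subset: str
    db_object_id: str
    db_object_symbol: str
    db_object_name: str
    db_object_synonym: List[str]   # pipe-separated in the file
    db_object_type: str
    taxon: str
    annotation_target_set: str
    annotation_completed: str      # kept as text, e.g. "20080101"
    parent_object_id: str


def parse_gp_information(handle: TextIO) -> Iterator[GpInfoRecord]:
    """Yield one GpInfoRecord per data line of an open text handle."""
    for line_number, line in enumerate(handle, start=1):
        # Strip only the line ending so trailing empty fields survive.
        line = line.rstrip("\r\n")
        # ASSUMPTION: blank lines and "!" lines carry no data.
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        if len(fields) != FIELD_COUNT:
            raise ValueError(
                f"line {line_number}: expected {FIELD_COUNT} fields, "
                f"found {len(fields)}"
            )
        # An empty synonym field becomes an empty list.
        synonyms = [s for s in fields[5].split("|") if s]
        yield GpInfoRecord(
            fields[0], fields[1], fields[2], fields[3], fields[4],
            synonyms,
            fields[6], fields[7], fields[8], fields[9], fields[10],
        )


# Synthetic row built from this readme's example values (not real data).
SAMPLE = (
    "!hypothetical comment line\n"
    "UniProtKB\tSwiss-Prot\tO00165-1\tG6PC\tGlucose-6-phosphatase\t"
    "RNF20|BRE1A\tisoform\ttaxon:9606\tBHF-UCL\t20080101\t"
    "UniProtKB:P21678\n"
)

if __name__ == "__main__":
    for record in parse_gp_information(io.StringIO(SAMPLE)):
        print(record.db_object_id, record.db_object_synonym)
```

Because the format is subject to change, the sketch raises an error on
any line that does not have exactly 11 fields instead of guessing which
fields are missing.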