###############
5.6 closestBed
###############

Similar to **intersectBed**, **closestBed** searches for overlapping features in A and B. In the
event that no feature in B overlaps the current feature in A, **closestBed** will report the
*closest* (that is, least genomic distance from the start or end of A) feature in B. For example,
one might want to find which is the closest gene to a significant GWAS polymorphism. Note that
**closestBed** will report an overlapping feature as the closest; that is, it does not restrict
the search to the closest *non-overlapping* feature.

==========================================================================
5.6.1 Usage and option summary
==========================================================================
**Usage:**
::

  closestBed [OPTIONS] -a -b

==========  ==========================================================================================
Option      Description
==========  ==========================================================================================
**-s**      Force strandedness. That is, find the closest feature in B that is on the same strand
            as the feature in A. *By default, this is disabled*.
**-d**      In addition to the closest feature in B, report its distance to A as an extra column.
            The reported distance for overlapping features will be 0.
**-t**      How ties for closest feature should be handled. This occurs when two features in B
            have exactly the same overlap with a feature in A. *By default, all such features in
            B are reported*. The choices controlling how ties are handled are:

            - *all*: Report all ties (default).
            - *first*: Report the first tie that occurred in the B file.
            - *last*: Report the last tie that occurred in the B file.
==========  ==========================================================================================

==========================================================================
5.6.2 Default behavior
==========================================================================
**closestBed** first searches for features in B that overlap a feature in A. If overlaps are
found, the feature in B that overlaps the highest fraction of A is reported. If no overlaps are
found, **closestBed** looks for the feature in B that is *closest* (that is, least genomic
distance to the start or end of A) to A. For example, in the figure below, the second feature in
B lies nearest to the feature in A, so it is the one reported (see the Result row).

::

  Chromosome  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  BED FILE A                             *************

  BED File B         ^^^^^^^^                            ^^^^^^

  Result                                                 ======

For example:

::

  cat A.bed
  chr1  100  200

  cat B.bed
  chr1  500   1000
  chr1  1300  2000

  closestBed -a A.bed -b B.bed
  chr1  100  200  chr1  500  1000

==========================================================================
5.6.3 (-s) Enforcing "strandedness"
==========================================================================
This option behaves the same as the -s option for **intersectBed** while scanning for the closest
(overlapping or not) feature in B. See the discussion in the **intersectBed** section for details.

==========================================================================
5.6.4 (-t) Controlling how ties for "closest" are broken
==========================================================================
When there are two or more features in B that overlap the *same fraction* of A, **closestBed**
will, by default, report all of them. Imagine feature A is a SNP and file B contains genes. It can
often occur that two gene annotations (e.g. on opposite strands) in B will overlap the SNP. As
mentioned, the default behavior is to report both such genes in B. However, the -t option allows
one to choose just the first or last tied feature, where "first" and "last" refer to the order in
which the features occurred in the B file, not to chromosome position.

For example:

::

  cat A.bed
  chr1  100  101  rs1234

  cat B.bed
  chr1  0  1000  geneA  100  +
  chr1  0  1000  geneB  100  -

  closestBed -a A.bed -b B.bed
  chr1  100  101  rs1234  chr1  0  1000  geneA  100  +
  chr1  100  101  rs1234  chr1  0  1000  geneB  100  -

  closestBed -a A.bed -b B.bed -t all
  chr1  100  101  rs1234  chr1  0  1000  geneA  100  +
  chr1  100  101  rs1234  chr1  0  1000  geneB  100  -

  closestBed -a A.bed -b B.bed -t first
  chr1  100  101  rs1234  chr1  0  1000  geneA  100  +

  closestBed -a A.bed -b B.bed -t last
  chr1  100  101  rs1234  chr1  0  1000  geneB  100  -

==========================================================================
5.6.5 (-d) Reporting the distance to the closest feature in base pairs
==========================================================================
**closestBed** will optionally report the distance to the closest feature in the B file using the
**-d** option. When a feature in B overlaps a feature in A, a distance of 0 is reported.

::

  cat A.bed
  chr1  100  200
  chr1  500  600

  cat B.bed
  chr1  500   1000
  chr1  1300  2000

  closestBed -a A.bed -b B.bed -d
  chr1  100  200  chr1  500   1000  300
  chr1  500  600  chr1  500   1000  0
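
The distances and tie handling described above can be illustrated with a small Python sketch. This is not how **closestBed** is implemented; it is a minimal model written for this section. The names ``distance`` and ``closest`` are hypothetical, and the sketch makes two assumptions: the gap is measured as the start of the downstream feature minus the end of the upstream feature (which matches the 300 in the example above, though other versions of the tool may count the gap differently), and all overlapping features are treated as equally close (distance 0) rather than being ranked by the fraction of A they overlap.

```python
from typing import List, Tuple

# (chrom, start, end), using the same coordinates as the BED examples above
Interval = Tuple[str, int, int]


def distance(a: Interval, b: Interval) -> int:
    """Gap in base pairs between two features; 0 if they overlap.

    Assumption: gap = start of the downstream feature - end of the upstream
    feature. This reproduces the values in the -d example of this section.
    """
    if a[0] != b[0]:
        raise ValueError("features are on different chromosomes")
    if a[1] < b[2] and b[1] < a[2]:  # the two intervals overlap
        return 0
    if b[1] >= a[2]:                 # b lies downstream of a
        return b[1] - a[2]
    return a[1] - b[2]               # b lies upstream of a


def closest(a: Interval, b_features: List[Interval],
            ties: str = "all") -> List[Tuple[Interval, int]]:
    """Return the closest feature(s) in B to `a`, each paired with its distance.

    `ties` mimics the -t choices: "all", "first" or "last", where first and
    last refer to the order of the features in `b_features`.
    Simplification: overlapping features all count as distance 0.
    """
    candidates = [b for b in b_features if b[0] == a[0]]
    if not candidates:
        return []
    best = min(distance(a, b) for b in candidates)
    hits = [(b, best) for b in candidates if distance(a, b) == best]
    if ties == "first":
        return hits[:1]
    if ties == "last":
        return hits[-1:]
    return hits


if __name__ == "__main__":
    a_features = [("chr1", 100, 200), ("chr1", 500, 600)]
    b_features = [("chr1", 500, 1000), ("chr1", 1300, 2000)]
    for a in a_features:
        for b, d in closest(a, b_features):
            print(*a, *b, d, sep="\t")
```

Run on the A.bed and B.bed records from the -d example, the sketch pairs the first A feature with ``chr1 500 1000`` at distance 300 and the second with the same B feature at distance 0. Passing ``ties="first"`` or ``ties="last"`` keeps a single record when several B features are equally close, as in the -t example.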