Let's say there was this person, we will call her Emma for now, who needed to download lots of data and wanted to make the process more robust and reliable. Here is a way to use the NCBI ESearch and EFetch tools to do so. Complete documentation is at http://www.ncbi.nlm.nih.gov/books/NBK25498/. The specific example used is below.
Example one: Download all Ilumatobacter protein sequences in FASTA format.
We will use ESearch to get GI numbers, post them to the History server, and then make multiple EFetch calls to retrieve the data.
Input: $query – ilumatobacter[orgn]
Output: A file named “ilumatobacter.fa” containing FASTA data.
Perl script
use LWP::Simple;
$query = 'ilumatobacter[orgn]';
#assemble the esearch URL
$base = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
$url = $base . "esearch.fcgi?db=protein&term=$query&usehistory=y";
#post the esearch URL
$output = get($url);
#parse WebEnv, QueryKey and Count (# records retrieved)
$web = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
$key = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);
$count = $1 if ($output =~ /<Count>(\d+)<\/Count>/);
#open output file for writing
open(OUT, ">ilumatobacter.fa") || die "Can't open file!\n";
#retrieve data in batches of 500
$retmax = 500;
for ($retstart = 0; $retstart < $count; $retstart += $retmax) {
    $efetch_url = $base . "efetch.fcgi?db=protein&WebEnv=$web";
    $efetch_url .= "&query_key=$key&retstart=$retstart";
    $efetch_url .= "&retmax=$retmax&rettype=fasta&retmode=text";
    $efetch_out = get($efetch_url);
    print OUT "$efetch_out";
}
close OUT;
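For reference, the three regular expressions above parse their values out of the ESearch XML response. With usehistory=y, the XML returned by the esearch.fcgi URL looks roughly like this (values abbreviated and purely illustrative):

<eSearchResult>
  <Count>1234</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_...</WebEnv>
  ...
</eSearchResult>

The Count value drives the batch loop, while WebEnv and QueryKey point each EFetch call at the result set stored on the History server.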
So if you want to use this, simply paste the above code into a text file (TextWrangler is suggested) and save it as a .pl file (i.e., /Users/sr320/Desktop/ill-prot.pl). Then in Terminal, type perl /Users/sr320/Desktop/ill-prot.pl. The data will download to whatever directory you are currently in within Terminal.
In actuality, this still seems to fail randomly; this is common to see on the internets. The best guess is too many requests during a busy time of day, so it might take a couple of tries. See http://www.ncbi.nlm.nih.gov/books/NBK25497/#chapter2.Usage_Guidelines_and_Requiremen for usage recommendations.
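Since the failures appear transient, one workaround is to retry each EFetch call a few times with a pause between attempts. Below is a minimal sketch; the fetch_with_retry helper and the 3-attempt / 5-second values are illustrative, not part of the original script:

use LWP::Simple;

# Try a URL several times before giving up; returns undef on total failure.
sub fetch_with_retry {
    my ($url, $tries, $delay) = @_;
    for my $attempt (1 .. $tries) {
        my $result = get($url);
        return $result if defined $result;   # get() returns undef on failure
        warn "Attempt $attempt failed, retrying in $delay seconds...\n";
        sleep $delay;
    }
    return undef;
}

# In the batch loop above, replace the plain get() call with:
#   $efetch_out = fetch_with_retry($efetch_url, 3, 5);
#   die "EFetch failed after repeated retries\n" unless defined $efetch_out;

A short sleep between batches also helps keep the script within NCBI's request-rate guidelines.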