Tag Archives: scp

Computing – Retrieve data from Amazon EC2 Instance

I had an existing instance that still had data on it from my PyRad analysis on 20160727 that I needed to retrieve.

Logged into Amazon AWS via the web interface and started my existing instance (via the Actions > Instance State > Start menu). After the instance started and generated a new public IP address, I SSH’d into the instance:

ssh -i "/full/path/to/bioinformatics.pem" ubuntu@instance.public.ip.address

NOTE: I needed the full path to the PEM file! Tried multiple times using a relative path (e.g. ~/Documents/bionformatics.pem) and received error messages that the file did not exist and “Permission denied (public key)”.

Changed to the directory with the PyRAD analysis and created a tarball to speed up eventual download from the EC2 instance to my local computer:

tar -cvzf 20160715_pyrad_analysis.tar.gz /home/ubuntu/data/analysis/

After compression, I used secure copy to copy the file from the EC2 instance to my local computer:

scp -i "/full/path/to/bioinformatics.pem" ubuntu@instance.public.ip.address:/home/ubuntu/data/20160715_pyrad_analysis.tar.gz /Volumes/toaster/sam/

This didn’t work initially because I attempted to transfer the file using Hummingbird (instead of my computer). The SSH connection kept timing out. The reason for this was that I hadn’t previously used Hummingbird to connect to the EC2 instance and Hummingbird’s IP address wasn’t listed in the Security Groups table as being allowed to connect. I made that change using the Amazon AWS web interface:

Once transfer was complete, I terminated the EC2 instance and the corresponding data volume.

Computing – A Very Quick “Guide” to Amazon EC2 Continued

Yesterday’s post ended with me trying to mount a S3 bucket to my EC2 instance using s3fs-fuse.

Waited for the 36GB of data to copy over to new bucket with proper naming (i.e. no capital letters in name). Copying took hours; left lab before copying completed.

 

Mount S3 bucket: kubu4

s3fs kubu4 /mnt/s3bucket/ -o passwd_file=/home/ubuntu/creds_s3fs

 

So, that didn’t work. The reason that it doesn’t work is that I uploaded the files to the S3 bucket via the Amazon AWS command line (awscli). Apparently, s3fs-fuse can’t mount S3 buckets that contain data uploaded via awscli [see this GitHub Issue for s3fs-fuse]! However, I had to upload them via awscli because the web interface kept failing!

 

That means I need to upload the data directly to my EC2 instance, but my EC2 instance is set with the default storage capacity of 8GB so I need to increase the capacity to accommodate my two large files, as well as the anticipated intermediate files that will be generated by the types of analysis I plan on running. I’m guessing I’ll need at least 100GB to be safe. To do this, I have to expand the Elastic Block Storage (EBS) volume of my instance. The rest of stuff below is fully explained and covered very well in the EBS expansion link I have in the previous sentence.

Don’t be fooled into thinking I figured any of this out on my own!

 

Expanding the EC2 Instance

The initial part of the process is creating a Snapshot of my instance. This took a long time (2.5hrs). However, I did finally decide to refresh the page when I noticed that the “Status” progress bar hadn’t moved beyond 46% for well over an hour. After refreshing, the “Status” showed “Complete.” Maybe this actually was ready to go much faster, but the page didn’t automatically refresh? Regardless, in retrospect, since this EC2 instance is pretty much brand new and doesn’t have too many changes from when it was initialized, I probably should’ve just created a brand new EC2 instance with the desired amount of EBS…

Created volume from that Snapshot with 150GB of magnetic storage.

Attached volume to the EC2 instance at /dev/sda1 (the default setting /dev/sdf resulted in an error message about the instance not having a root volume) and SSH’d into the instance. Odd, it seems to show that I still only have 8GB of storage (see the “Usage of…” in the screenshot below):

 

 

Check to see if I actually have the expanded storage volume or not. It turns out, I do! (notice that the only drive listed is “xvda” and its partition, “xvda/xvda1″ AND they are equal in size; 150G):

 

 

Time to upload (via the secure copy command) the files to my EC2 instance! The following commands upload the files to a folder called “data” in my /home directory. I also ran the “time” command at the beginning to get an idea of how long it takes to upload each of these files.

time scp -i ~/Dropbox/Lab/Sam/bioinformatics.pem /Volumes/web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz ubuntu@ec2.ip.address:~/data
time scp -i ~/Dropbox/Lab/Sam/bioinformatics.pem /Volumes/web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz ubuntu@ec2.ip.address:~/data

 

Details on upload times and file sizes:

 

Confim the files now reside in my EC2 instance:

 

Alas, I should’ve captured all of this in a Jupyter Notebook. However, I didn’t because I thought I would need to enter passwords (which you can’t do with a Jupyter Notebook). It turns out, I didn’t need a password for anything; even when using “sudo” on the EC2 instance. Oh well, it’s set up and running with my data finally accessible. That’s all that really matters here.

Alrighty, time to get rolling on some data analysis with a fancy new Amazon EC2 instance!!!