s3fs-fuse | Sam's Notebook

Yesterday’s post ended with me trying to mount an S3 bucket to my EC2 instance using s3fs-fuse.

Waited for the 36GB of data to copy over to the new bucket with proper naming (i.e. no capital letters in the name). Copying took hours; left lab before copying completed.

Mount S3 bucket: kubu4

s3fs kubu4 /mnt/s3bucket/ -o passwd_file=/home/ubuntu/creds_s3fs

So, that didn’t work. The reason it didn’t work is that I uploaded the files to the S3 bucket via the Amazon AWS command line (awscli). Apparently, s3fs-fuse can’t mount S3 buckets that contain data uploaded via awscli [see this GitHub Issue for s3fs-fuse]! However, I had to upload them via awscli because the web interface kept failing!

That means I need to upload the data directly to my EC2 instance. However, the instance is set with the default storage capacity of 8GB, so I need to increase the capacity to accommodate my two large files, as well as the intermediate files that will be generated by the types of analysis I plan on running. I’m guessing I’ll need at least 100GB to be safe. To do this, I have to expand the Elastic Block Store (EBS) volume of my instance. The rest of the stuff below is fully explained and covered very well in the EBS expansion link I have in the previous sentence.

Don’t be fooled into thinking I figured any of this out on my own!

Expanding the EC2 Instance

The initial part of the process is creating a Snapshot of my instance. This took a long time (2.5hrs). However, I did finally decide to refresh the page when I noticed that the “Status” progress bar hadn’t moved beyond 46% for well over an hour. After refreshing, the “Status” showed “Complete.” Maybe this actually was ready to go much faster, but the page didn’t automatically refresh? Regardless, in retrospect, since this EC2 instance is pretty much brand new and doesn’t have too many changes from when it was initialized, I probably should’ve just created a brand new EC2 instance with the desired amount of EBS…

Created volume from that Snapshot with 150GB of magnetic storage.
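I did the snapshot and volume steps in the web console. For reference, a rough AWS CLI equivalent might look like the sketch below. This is not what I actually ran. The volume ID, snapshot ID, and availability zone are placeholders (assumptions, not values from this notebook), and the `run` helper is a hypothetical wrapper that only prints each command unless DRY_RUN is set to 0.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Snapshot an existing EBS volume (the instance's root volume in this case).
# $1 = volume ID (placeholder below).
snapshot_root_volume() {
    run aws ec2 create-snapshot --volume-id "$1" --description pre-expansion
}

# Create a 150 GiB magnetic volume from a snapshot.
# "standard" is the API name for magnetic storage; --size is in GiB.
# $1 = snapshot ID, $2 = availability zone (both placeholders below).
create_volume_from_snapshot() {
    run aws ec2 create-volume --snapshot-id "$1" --availability-zone "$2" \
        --size 150 --volume-type standard
}

# With the default dry-run setting these only print the commands.
snapshot_root_volume vol-PLACEHOLDER
create_volume_from_snapshot snap-PLACEHOLDER us-west-2a
```

The new volume has to be in the same availability zone as the instance, otherwise it can't be attached.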

Attached volume to the EC2 instance at /dev/sda1 (the default setting /dev/sdf resulted in an error message about the instance not having a root volume) and SSH’d into the instance. Odd, it seems to show that I still only have 8GB of storage (see the “Usage of…” in the screenshot below):

Check to see if I actually have the expanded storage volume or not. It turns out, I do! (notice that the only drive listed is “xvda” and its partition, “xvda/xvda1” AND they are equal in size; 150G):
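The login banner and the block device listing measure different things: `lsblk` lists block devices and partitions, while `df` reports the size of the mounted filesystem. A minimal sketch for checking the filesystem side from the command line is below. The function name is hypothetical, and it assumes the root filesystem is the first entry `df` prints for `/`.

```shell
# Size of the root filesystem in whole GiB, as reported by df.
# -P forces one line per filesystem, -k reports sizes in 1024-byte blocks.
root_fs_size_gib() {
    df -P -k / | awk 'NR == 2 { printf "%d\n", $2 / (1024 * 1024) }'
}

echo "Root filesystem size (GiB): $(root_fs_size_gib)"

# For the block-device view, run `lsblk` on the instance and compare
# the partition size against the number printed above.
```

If the partition is larger than the filesystem, the filesystem still needs to be grown to fill it. In my case the partition and the drive were already the same size.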

Time to upload (via the secure copy command) the files to my EC2 instance! The following commands upload the files to a folder called “data” in my /home directory. I also ran the “time” command at the beginning to get an idea of how long it takes to upload each of these files.

time scp -i ~/Dropbox/Lab/Sam/bioinformatics.pem /Volumes/web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz ubuntu@ec2.ip.address:~/data

time scp -i ~/Dropbox/Lab/Sam/bioinformatics.pem /Volumes/web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz ubuntu@ec2.ip.address:~/data

Details on upload times and file sizes:

Confirm the files now reside in my EC2 instance:
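Listing the files only shows that they exist. For files this large, comparing a checksum on both ends is a cheap way to check that the copies are intact. The sketch below is not something I ran at the time; `file_fingerprint` is a hypothetical helper built on the POSIX `cksum` utility, and the file name in the usage comments is a placeholder.

```shell
# Print the CRC checksum and byte count of a file, without the file name.
# Reading from stdin keeps cksum from echoing the path, so local and
# remote results can be compared directly.
file_fingerprint() {
    cksum < "$1" | awk '{ print $1, $2 }'
}

# Usage (placeholder file name and address):
#   local_fp=$(file_fingerprint /path/to/reads_1.fq.gz)
#   remote_fp=$(ssh -i bioinformatics.pem ubuntu@ec2.ip.address \
#       "cksum < ~/data/reads_1.fq.gz" | awk '{ print $1, $2 }')
#   [ "$local_fp" = "$remote_fp" ] && echo "match"
```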

Alas, I should’ve captured all of this in a Jupyter Notebook. However, I didn’t because I thought I would need to enter passwords (which you can’t do with a Jupyter Notebook). It turns out, I didn’t need a password for anything; even when using “sudo” on the EC2 instance. Oh well, it’s set up and running with my data finally accessible. That’s all that really matters here.

Alrighty, time to get rolling on some data analysis with a fancy new Amazon EC2 instance!!!

0000-0002-2747-368X

This all takes a surprisingly long time to set up.

Set up AWS Identity and Access Management (IAM): http://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html?icmpid=docs_iam_console

Install AWS command line interface: https://aws.amazon.com/cli/

Copy files to S3 bucket:

aws s3 cp /Volumes/web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_1.fq.gz s3://Samb

aws s3 cp /Volumes/web/nightingales/O_lurida/20160223_gbs/160123_I132_FCH3YHMBBXX_L4_OYSzenG1AAD96FAAPEI-109_2.fq.gz s3://Samb

Launch EC2 instance c4.2xlarge (Ubuntu 14.04 LTS, 8 vCPUs, 15 GiB RAM). Configured to have SSH open (TCP, port 22) and also to be able to access Jupyter Notebook via tunnel (TCP, port 8888). Set with “My IP” to limit access to these ports.

Create new key pair. Have to change permissions:

chmod 400 bioinformatics.pem

Connect to instance

For Amazon Linux AMI:

ssh -i "bioinformatics.pem" ec2-user@ip.address.of.instance

For Amazon Ubuntu Server:

ssh -i "bioinformatics.pem" ubuntu@ip.address.of.instance
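The same key can also carry the Jupyter Notebook tunnel mentioned above. A minimal sketch, assuming the notebook server listens on port 8888 on the instance: `-L` forwards a local port to the instance and `-N` skips opening a remote shell. The `tunnel_cmd` helper is hypothetical and only prints the command so it can be inspected before running.

```shell
# Hypothetical helper: print an ssh command that forwards a local port to
# the same port on the instance.
# $1 = key file, $2 = user@host, $3 = port (defaults to 8888).
tunnel_cmd() {
    key=$1
    host=$2
    port=${3:-8888}
    printf 'ssh -i "%s" -N -L %s:localhost:%s %s\n' "$key" "$port" "$port" "$host"
}

tunnel_cmd bioinformatics.pem ubuntu@ip.address.of.instance
```

With the tunnel running, the notebook should be reachable in a local browser at localhost:8888.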

Update/Upgrade default Ubuntu packages after initial launch:

sudo apt-get update

sudo apt-get upgrade

Set up Docker

Install Docker for Ubuntu 14.04 and copy our bioinformatics Dockerfile to the /home directory of the EC2 instance:

scp -i "bioinformatics.pem" /Users/Sam/GitRepos/LabDocs/code/dockerfiles/Dockerfile.bio ubuntu@ip.address.of.instance:

Access data stored in Amazon S3 bucket(s)

Mounting S3 storage as volume in EC2 instance requires https://github.com/s3fs-fuse/s3fs-fuse
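The mount command reads its credentials from a password file. s3fs expects that file to hold a single line of the form ACCESS_KEY_ID:SECRET_ACCESS_KEY and to be readable only by its owner. A minimal sketch for creating one is below. The helper name is hypothetical and the key values in the usage comment are obviously fake placeholders.

```shell
# Hypothetical helper: write an s3fs password file.
# $1 = path to the file, $2 = access key ID, $3 = secret access key.
write_s3fs_creds() {
    creds_file=$1
    access_key=$2
    secret_key=$3
    # Create the file inside a subshell with a restrictive umask so it is
    # never briefly readable by other users.
    ( umask 077; printf '%s:%s\n' "$access_key" "$secret_key" > "$creds_file" )
    # Owner read/write only, in case the file already existed.
    chmod 600 "$creds_file"
}

# Usage with placeholder values (not real keys):
#   write_s3fs_creds /home/ubuntu/s3fs_creds AKIAEXAMPLEKEYID exampleSecretAccessKey
```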

Mount bucket:

sudo s3fs Samb /mnt/s3bucket/ -o passwd_file=/home/ubuntu/s3fs_creds

Error:

s3fs: BUCKET Samb, name not compatible with virtual-hosted style.

Turns out, the error is due to the bucket name having an uppercase letter.
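A quick pre-flight check on the bucket name would have caught this before any data was copied. The sketch below is a simplified approximation of the naming rules, not the full S3 specification: it only checks for 3 to 63 characters, limited to lowercase letters, digits, dots, and hyphens, starting and ending with a letter or digit. The function name is hypothetical.

```shell
# Return 0 if the bucket name passes a simplified DNS-style check,
# 1 otherwise. The character list is spelled out so the result does not
# depend on the locale's idea of what "a-z" covers.
is_virtual_hosted_compatible() {
    name=$1
    # Length must be between 3 and 63 characters.
    [ "${#name}" -ge 3 ] && [ "${#name}" -le 63 ] || return 1
    case $name in
        # Any character outside lowercase letters, digits, dot, hyphen.
        *[!abcdefghijklmnopqrstuvwxyz0123456789.-]*) return 1 ;;
        # Must start and end with a letter or digit.
        [.-]*|*[.-]) return 1 ;;
    esac
    return 0
}

for name in Samb kubu4; do
    if is_virtual_hosted_compatible "$name"; then
        echo "$name: ok"
    else
        echo "$name: not compatible"
    fi
done
```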

Made new bucket in S3 (via web interface) and copied data files to the new bucket. Will try mounting again once the files are copied over (this will take a while; the two files total 36GB).


University of Washington – Fishery Sciences – Roberts Lab


Computing – A Very Quick “Guide” to Amazon EC2 Continued

Computing – The Very Quick “Guide” to Amazon Web Services Cloud Computing Instances (EC2)