How to access object storage from ESA HPC using s3cmd
In this article, you will learn how to access object storage from ESA HPC on Linux using s3cmd, without mounting it as a file system. This can be done on a virtual machine on ESA HPC cloud or on a local Linux computer.
What We Are Going To Cover
Object storage vs. standard file system
Terminology: container and bucket
Configuring s3cmd
S3 paths in s3cmd
Listing containers
Creating a container
Uploading a file to a container
Listing files and folders of the root directory of a container
Listing files and folders not in the root directory of a container
Removing a file from a container
Downloading a file from a container
Checking how much storage is being used on a container
Removing the entire container
Prerequisites
No. 1 Account
You need an ESA HPC hosting account with access to the Horizon interface: https://horizon.eohpc.net/auth/login/?next=/.
No. 2 Generated EC2 credentials
You need to generate EC2 credentials. Learn more here: How to generate ec2 credentials on ESA HPC
No. 3 A Linux computer or virtual machine
You need a Linux virtual machine or local computer. This article was written for Ubuntu 22.04. Other operating systems might work, but they are outside the scope of this article and might require adjusting the commands.
If you want to use a virtual machine hosted on ESA HPC cloud and you don’t have it yet, one of these articles can help:
No. 4 s3cmd installed
You need to have s3cmd installed on your virtual machine or computer. Learn more here:
No. 5 Understanding how s3cmd handles its configuration files
s3cmd stores its configuration in configuration files, one connection per file. You need to decide where you want to keep connection data for object storage. Learn more here:
Object storage vs. standard file system
Object storage uses objects, not files and folders. To make things easier, the Horizon dashboard (section Object Store -> Containers) presents these objects as files and folders. This representation usually makes sense, but there are cases in which object storage clearly differs from a conventional file system.
For example, in conventional file systems, you usually can’t have a file and folder under the same name in the same location. If you have a file named something in a directory, you can’t create a directory under the name something in the same location.
In object storage, however, it is possible. One path can contain both an object named something/ and an object named something. The former will be represented by the Horizon dashboard as a folder named something, and the latter as a downloadable file named something.
Terminology: container and bucket
The terms container (used by the Horizon dashboard) and bucket (used by s3cmd) are used in this article interchangeably to describe the same thing.
Configuring s3cmd
Throughout this article, we are going to use the file storage-access in the directory /home/eouser as an example configuration file; feel free to adjust the commands according to your needs.
Here is the command that starts the process of configuring s3cmd:
s3cmd -c /home/eouser/storage-access --configure
A series of questions should appear. Answer them as shown below, but replace access_key and secret_key with the access and secret keys you obtained while following Prerequisite No. 2, respectively. Also, replace your_password with the password to your account. After answering each question, press Enter.
New settings:
Access Key: access_key
Secret Key: secret_key
Default Region: default
S3 Endpoint:
DNS-style bucket+hostname:port template for accessing a bucket:
Encryption password: your_password
Path to GPG program: /usr/bin/gpg
Use HTTPS protocol: Yes
HTTP Proxy server name:
HTTP Proxy server port: 0
Once you have provided the configuration, you should be asked the following question:
Test access with supplied credentials? [Y/n]
Answer with Y and press Enter.
Wait until testing is completed. It should take no more than a couple of seconds. If the process was successful, you should get the output below:
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)
Now verifying that encryption works...
Success. Encryption and decryption worked fine :-)
Save settings? [y/N]
Once again, answer with Y and press Enter.
You should get a confirmation similar to this one:
Configuration saved to 'storage-access'
You should now be returned to the command prompt.
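Because every command in this article repeats the same -c option, a small shell wrapper can save some typing. The sketch below is only an illustration: the function name s3 and the variables S3CFG and S3CMD are hypothetical names chosen for this example and are not part of s3cmd.

```shell
# Hypothetical helper: run s3cmd with the configuration file used in this article.
# S3CFG and S3CMD are illustrative variable names, not s3cmd settings.
S3CFG=${S3CFG:-/home/eouser/storage-access}

s3() {
    # Refuse to run if the configuration file is missing or unreadable.
    if [ ! -r "$S3CFG" ]; then
        echo "Configuration file $S3CFG not found or not readable" >&2
        return 1
    fi
    # Pass all remaining arguments straight through to s3cmd.
    "${S3CMD:-s3cmd}" -c "$S3CFG" "$@"
}
```

With this function defined in your shell, s3 ls should be equivalent to s3cmd -c /home/eouser/storage-access ls.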
S3 paths in s3cmd
When accessing object storage from your project, s3cmd uses a path structure which starts with s3://, followed by the name of the container and a slash. After that comes a path whose elements are separated by / characters. For instance, what the Horizon dashboard shows as a file called my-file.txt in a directory called my-folder in a container called my-container will be referred to by s3cmd as s3://my-container/my-folder/my-file.txt.
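The path rule described above can be sketched as a small shell function. The function name s3_path is hypothetical and exists only for this illustration; it simply joins a container name and path elements with slashes.

```shell
# Hypothetical helper: build an S3 path from a container name
# followed by any number of path elements.
s3_path() {
    local container=$1
    shift
    # With IFS set to "/", "$*" joins the remaining arguments with slashes.
    local IFS=/
    printf 's3://%s/%s\n' "$container" "$*"
}
```

For example, s3_path my-container my-folder my-file.txt prints s3://my-container/my-folder/my-file.txt, while s3_path my-container prints the root path s3://my-container/.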
Listing containers
To list your containers, execute the following command:
s3cmd -c /home/eouser/storage-access ls
If you do not have any containers, the output of the command should be empty.
If you do have containers, you should get their list, for example:
2024-01-23 14:21 s3://my-files
2024-01-23 14:21 s3://my-other-files
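If you want to use the list of containers in a script, the names can be extracted from this output. The sketch below assumes the output format shown above (date, time, then the s3:// path); the function name bucket_names is hypothetical.

```shell
# Hypothetical helper: read "s3cmd ls" output on standard input
# and print only the container names, one per line.
bucket_names() {
    # The third whitespace-separated field is the s3:// path of the container;
    # sed then strips the s3:// prefix.
    awk '{ print $3 }' | sed 's|^s3://||'
}

# Example use (commented out so that sourcing this file does not contact the storage):
# s3cmd -c /home/eouser/storage-access ls | bucket_names
```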
Creating a container
You can use s3cmd to create a container. To do that, execute the command below (replace my-bucket with the name you wish to choose for your bucket):
s3cmd -c /home/eouser/storage-access mb s3://my-bucket
If the process was successful, you should get output like this:
Bucket 's3://my-bucket/' created
Your container should now be visible after using the s3cmd ls command on your connection:
2024-01-04 10:49 s3://my-bucket
Troubleshooting
Container already exists
If you receive output like this:
ERROR: Bucket 'my-bucket' already exists
ERROR: S3 error: 409 (BucketAlreadyExists)
choose a different name for the container and try again.
Using unsupported characters in container name
If you attempt to create a container which has a name containing characters which are not allowed, you should get output like this:
ERROR: Parameter problem: Bucket name 'my_bucket' contains disallowed character '_'. The only supported ones are: lowercase us-ascii letters (a-z), digits (0-9), dot (.) and hyphen (-).
In this case, make sure that you only use supported characters.
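A name can be checked against this character set before calling s3cmd mb. The sketch below covers only the characters listed in the error message above; other naming rules (for example, length limits) may also apply and are not checked here. The function name is_valid_bucket_name is hypothetical.

```shell
# Hypothetical helper: succeed only if the name is non-empty and consists of
# lowercase a-z, digits, dots and hyphens (the set named in the error message).
is_valid_bucket_name() {
    # Use the C locale so that the range a-z cannot match uppercase letters.
    local LC_ALL=C
    case $1 in
        ''|*[!a-z0-9.-]*) return 1 ;;  # empty, or contains a disallowed character
        *) return 0 ;;
    esac
}

# Example use (commented out so that no container is created when sourcing this file):
# is_valid_bucket_name my-bucket && s3cmd -c /home/eouser/storage-access mb s3://my-bucket
```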
Uploading a file to a container
To upload a file into a container, enter the command below. Replace /home/eouser/my-file.txt with the name and location of the file you wish to upload. Replace my-bucket/my-directory with the path under which you wish to upload your file.
s3cmd -c /home/eouser/storage-access put /home/eouser/my-file.txt s3://my-bucket/my-directory/
Make sure that you finish the path with a slash. If you skip the slash, Horizon will not show the file as uploaded to the folder you specified; instead, the file will be uploaded as an object under that name. If you want Horizon to show the file directly in the root “folder” of the container, enter only s3:// followed by the name of your container as the path.
The output should show the upload process. Once it is successful, you should be returned to the command prompt.
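The effect of the trailing slash can be illustrated with a small function that predicts the resulting object path. This is only a model of the behaviour described in this section, not something s3cmd provides; the function name upload_target is hypothetical.

```shell
# Hypothetical helper: print the object path expected after
# "s3cmd put FILE DEST", following the rules described above.
upload_target() {
    local file=$1 dest=$2
    local name rest
    name=$(basename "$file")
    rest=${dest#s3://}   # destination without the s3:// prefix
    case $rest in
        */)  printf '%s%s\n' "$dest" "$name" ;;   # ends with a slash: file goes into that folder
        */*) printf '%s\n' "$dest" ;;             # no trailing slash: the path itself becomes the object name
        *)   printf '%s/%s\n' "$dest" "$name" ;;  # container name only: file goes to the root
    esac
}
```

For example, upload_target /home/eouser/my-file.txt s3://my-bucket/my-directory/ prints s3://my-bucket/my-directory/my-file.txt, whereas the same call without the final slash prints s3://my-bucket/my-directory.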
Listing files and folders of the root directory of a container
To list files and folders inside the root directory of a container, execute the s3cmd ls command on the name of the container preceded by s3://.
For example, to list content of the container called my-files, execute this command:
s3cmd -c /home/eouser/storage-access ls s3://my-files
The following command (ending with a slash) should also work for this purpose:
s3cmd -c /home/eouser/storage-access ls s3://my-files/
Let’s assume that the location you used contains only a directory called some-directory (which may or may not contain other files) and a file named text-file.txt. In this case, the output will look like this:
DIR s3://my-files/some-directory/
2024-01-23 14:39 10 s3://my-files/text-file.txt
Listing files and folders not in the root directory of a container
To list files inside what Horizon shows as folders, use the full path separated by slashes and end it with a slash as well. For example, to list the content of the folder called some-other-directory inside the folder some-directory in a container called my-files, execute:
s3cmd -c /home/eouser/storage-access ls s3://my-files/some-directory/some-other-directory/
The output should contain the list of objects located there, for example:
2024-01-23 14:30 0 s3://my-files/some-directory/some-other-directory/
2024-01-23 14:32 19 s3://my-files/some-directory/some-other-directory/some-other-file.txt
2024-01-23 14:31 19 s3://my-files/some-directory/some-other-directory/yet-another-file.txt
Ending paths with slash while using s3cmd ls
When listing the contents of directories using the s3cmd ls command, make sure to finish the paths with a slash. For example, listing files found in the folder called some-directory in the container called my-files can be done this way:
s3cmd -c /home/eouser/storage-access ls s3://my-files/some-directory/
In this case, the output should be correct (note that in this example the directory some-other-directory may or may not contain some files):
DIR s3://my-files/some-directory/some-other-directory/
2024-01-23 14:25 0 s3://my-files/some-directory/
2024-01-23 14:27 15 s3://my-files/some-directory/another-file.txt
2024-01-23 14:28 23 s3://my-files/some-directory/first-file.txt
If, however, you omit the slash at the end:
s3cmd -c /home/eouser/storage-access ls s3://my-files/some-directory
your output will only contain your request with a slash added at the end:
DIR s3://my-files/some-directory/
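To avoid forgetting the slash, the listing command can be wrapped in a function that appends it when it is missing. This is a minimal sketch; the function name ls_folder and the variables S3CFG and S3CMD are hypothetical names used only in this example.

```shell
# Hypothetical helper: list the contents of a folder, adding the
# trailing slash to the path if it was left out.
ls_folder() {
    local path=$1
    case $path in
        */) ;;                 # already ends with a slash
        *)  path="$path/" ;;   # append the missing slash
    esac
    "${S3CMD:-s3cmd}" -c "${S3CFG:-/home/eouser/storage-access}" ls "$path"
}
```

With this function, ls_folder s3://my-files/some-directory should run the same command as the first example in this section.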
Removing a file from a container
To delete a file from a container, execute the s3cmd rm command with the path of a file you wish to remove. For example:
s3cmd -c /home/eouser/storage-access rm s3://my-bucket/my-folder/my-file.txt
You should get a confirmation like this:
delete: 's3://my-bucket/my-folder/my-file.txt'
Downloading a file from a container
To download a file from the bucket to your local hard drive, execute the s3cmd get command with the full path to the file you wish to download. Example:
s3cmd -c /home/eouser/storage-access get s3://my-bucket/my-folder/my-file.txt
The download process to the current working directory should begin.
Troubleshooting
If the file already exists in the current working directory, you should get output like this:
ERROR: Parameter problem: File ./my-file.txt already exists. Use either of --force / --continue / --skip-existing or give it a new name.
If you use the --force parameter, the file which was detected by s3cmd will be overwritten. The --continue parameter can be used to resume a download that was stopped. If you want to give the file another name, put it after the S3 path, separated by a space. For example:
s3cmd -c /home/eouser/storage-access get s3://my-bucket/my-folder/my-file.txt another-name.txt
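If you prefer never to overwrite an existing file, the new name can be chosen automatically. The sketch below is one possible approach, not an s3cmd feature; the function name free_local_name and the .1, .2, ... naming scheme are assumptions made for this example.

```shell
# Hypothetical helper: print a local file name that does not exist yet,
# appending .1, .2, ... to the requested name when necessary.
free_local_name() {
    local name=$1
    local candidate=$name
    local n=1
    while [ -e "$candidate" ]; do
        candidate="$name.$n"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}

# Example use (commented out so that nothing is downloaded when sourcing this file):
# s3cmd -c /home/eouser/storage-access get s3://my-bucket/my-folder/my-file.txt "$(free_local_name my-file.txt)"
```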
Checking how much storage is being used on a container
To check how much space is used by a particular path, use the s3cmd du command. If you want human-readable values, add the -H parameter. For example:
s3cmd -c /home/eouser/storage-access du -H s3://my-bucket
The output should look like this:
318M 2 objects s3://my-bucket/
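In a script, it can be easier to work with a plain number than with a human-readable value. The sketch below assumes that s3cmd du without -H prints the size in bytes as the first field of its output; verify this with your s3cmd version before relying on it. The function name usage_bytes is hypothetical.

```shell
# Hypothetical helper: read "s3cmd du" output (run without -H) on standard
# input and print the first field of the first line, assumed to be bytes.
usage_bytes() {
    awk 'NR == 1 { print $1 }'
}

# Example use (commented out so that sourcing this file does not contact the storage):
# used=$(s3cmd -c /home/eouser/storage-access du s3://my-bucket | usage_bytes)
# [ "$used" -gt 1000000000 ] && echo "my-bucket uses more than 1 GB"
```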
Removing the entire container
You can use s3cmd to delete the whole container. In order to do that, it must be empty.
Execute the s3cmd rb command followed by space, s3:// and the name of the bucket. For example, to remove the container called my-bucket, execute this command:
s3cmd -c /home/eouser/storage-access rb s3://my-bucket
If the process was successful, you should get a confirmation like this:
Bucket 's3://my-bucket/' removed
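Since the container must be empty, a script can check that first by listing it. The sketch below uses only the ls and rb commands shown in this article and treats any listing output as a sign that the container is not empty. The function name remove_bucket_if_empty and the variables S3CFG and S3CMD are hypothetical.

```shell
# Hypothetical helper: remove a container only if listing it returns nothing.
remove_bucket_if_empty() {
    local bucket=$1
    local s3cmd=${S3CMD:-s3cmd}
    local config=${S3CFG:-/home/eouser/storage-access}
    local listing

    # Stop if the listing itself fails (for example, the container does not exist).
    listing=$("$s3cmd" -c "$config" ls "s3://$bucket/") || return 1

    if [ -n "$listing" ]; then
        echo "Container $bucket is not empty; remove its objects first." >&2
        return 2
    fi

    "$s3cmd" -c "$config" rb "s3://$bucket"
}
```

Calling remove_bucket_if_empty my-bucket would then run the rb command from the example above only when the listing is empty.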
What To Do Next
You can use s3cmd to share your container outside of your project by applying a sharing policy. This article contains more information:
Bucket sharing using s3 bucket policy on ESA HPC
s3cmd can also be used to access the eodata repository. Learn more here: