How to access object storage from ESA HPC using boto3
In this article, you will learn how to access object storage from ESA HPC using the Python library boto3.
What We Are Going To Cover
Terminology: container and bucket
Preparing the environment
How to use the examples provided?
Creating a container
Listing buckets
Checking when a bucket was created
Listing files in a bucket
Listing files from a particular path in a bucket
Uploading file to a bucket
Downloading file from a bucket
Removing file from a bucket
Removing a bucket
Prerequisites
No. 1 Account
You need an ESA HPC hosting account with access to the Horizon interface: https://horizon.eohpc.net/auth/login/?next=/.
No. 2 Generated EC2 credentials
You need to generate EC2 credentials. Learn more here: How to generate ec2 credentials on ESA HPC
No. 3 A Linux or Windows computer or virtual machine
This article was written for Ubuntu 22.04 and Windows Server 2022. Other operating systems might work, but are out of scope of this article and might require you to adjust the commands supplied here.
You can use this article both on Linux or Windows virtual machines or on a local computer running one of those operating systems.
To build a new virtual machine hosted on ESA HPC cloud, one of the following articles can help:
Terminology: container and bucket
The terms container (used by the Horizon dashboard) and bucket (used by boto3) are used here interchangeably, to describe the same thing.
Preparing the environment
To perform the examples in this article, you will need to have a working installation of Python 3 and the library boto3. The exact method of installation will depend on your goals and habits of working with Python.
Ubuntu 22.04
Method 1: Using virtualenvwrapper and pip
If you are a seasoned Python user and are working on multiple projects, you are bound to use Python virtual environments. It is advisable to create a new virtual environment for boto3 and here is one way to do it: How to install Python virtualenv or virtualenvwrapper on ESA HPC.
As mentioned in that article, use the command workon to enter the virtual environment. For example, if your environment is called managing-files, execute:
workon managing-files
Once the environment is activated, install boto3:
pip3 install boto3
Method 2: Using apt
If you, however, don’t need such an environment and want to make boto3 available globally under Ubuntu, use apt. The following command will both install Python3 and the boto3 library:
sudo apt install python3 python3-boto3
Windows Server 2022
If you are using Windows, follow this article to learn how to install boto3: /s3/How-To-Install-boto3-In-Windows-on-ESA-HPC
How to use the examples provided?
Each of the examples provided can serve as standalone code. You should be able to
enter the code in your text editor,
define appropriate content for the variables provided,
save it as a file and
run.
You can also use these examples as a starting point for your own code.
Each example has three parts:
- import boto3 – blue rectangle
First you import the library boto3, which is standard Python practice.
- Define input parameters for boto3 calls – red rectangle
A typical call in boto3 library will look like this:
s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
For each call, several variables must be present:
aws_access_key_id – from Prerequisite No. 2
aws_secret_access_key – also from Prerequisite No. 2
endpoint_url – predefined in example source code with a specific value for each cloud
region_name – the same as above
Other variables may be needed on a case by case basis:
name,
path
and possibly others. Be sure to enter the values for all those additional variables before running the examples.
- boto3 call – green rectangle
Once you have provided all the values to the appropriate variables, you can save the file with a .py extension. One of the ways of running Python scripts is using the command line. To run such a script, first navigate to the folder in which the script is located. After that, execute the command below, but replace script.py with the name of the script you wish to run.
On Ubuntu:
python3 script.py
On Windows:
python script.py
Make sure that the name and/or location of the file is passed to the shell correctly – watch out for any spaces and other special characters.
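Because the four connection variables described in this section appear in every example, they can be gathered in one place. The following is only a sketch: the helper name connection_params is hypothetical and not part of boto3, and the endpoint and region values are the ones used in the examples of this article.

```python
def connection_params(access_key, secret_key):
    """Collect the keyword arguments that every boto3 call in this article uses.

    Hypothetical helper: endpoint_url and region_name are copied from the
    examples in this article; adjust them if your cloud uses other values.
    """
    return {
        'aws_access_key_id': access_key,
        'aws_secret_access_key': secret_key,
        'endpoint_url': 'https://s3.eohpc.net',
        'region_name': 'waw3-1',
    }


# Example with placeholder credentials (replace with your own EC2 credentials)
params = connection_params('my-access-key', 'my-secret-key')
print(sorted(params))
```

With boto3 installed, the resulting dictionary could then be unpacked into a call such as boto3.resource('s3', **params).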
IMPORTANT
In these examples, even in case of potentially destructive operations, you will not be asked for confirmation. If, for example, a script saves a file and a file already exists under the designated name and location, it will likely be overwritten. Please be sure that that is what you want before running the script, or enhance the code with checks whether the file already exists.
In all the examples that follow, we assume that the corresponding files and buckets already exist. Adding code for checks of that kind is out of scope of this article.
Creating a container
To create a new container, first select a name for your bucket and enter it in the name variable. On ESA HPC cloud, use only letters, numbers and hyphens.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Name of the bucket to create
name = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    s3.Bucket(name).create()
except Exception as issue:
    print("The following error occurred:")
    print(issue)
Successful execution of this code should produce no output.
To test whether the bucket was created, you can, among other things, list buckets as described in section Listing buckets below.
Troubleshooting creating a container
Bucket already exists
If you receive the following output:
The following error occurred:
An error occurred (BucketAlreadyExists) when calling the CreateBucket operation: Unknown
it means that you cannot choose this name for a bucket because a bucket under this name already exists. Choose a different name and try again.
Invalid characters used
If you used invalid characters in the container name, you should receive an error similar to this:
Invalid bucket name "this container should not exist": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
To resolve, choose a different name - on ESA HPC cloud, use only letters, numbers and hyphens.
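The naming rule above can be checked before the bucket is created. The sketch below is an assumption about how to express "letters, numbers and hyphens" as a regular expression; the function name is_valid_bucket_name is hypothetical, and the service may apply further limits (for example on length) that are not checked here.

```python
import re

# Assumption: "letters, numbers and hyphens" interpreted as ASCII letters,
# digits and the hyphen character.
ALLOWED_NAME = re.compile(r'[A-Za-z0-9-]+')


def is_valid_bucket_name(name):
    """Return True if name is non-empty and uses only letters, digits and hyphens."""
    return ALLOWED_NAME.fullmatch(name) is not None


print(is_valid_bucket_name('my-files'))                         # True
print(is_valid_bucket_name('this container should not exist'))  # False
```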
Listing buckets
This code allows you to list buckets.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
try:
    # list_buckets() is a method of the boto3 client, not of the resource
    s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    print(s3.list_buckets()['Buckets'])
except Exception as issue:
    print("The following error occurred:")
    print(issue)
The output should be a list of dictionaries, each providing information regarding a particular bucket, starting with its name. If two buckets, my-files and my-other-files, already exist, the output might be similar to this:
[{'Name': 'my-files', 'CreationDate': datetime.datetime(2024, 1, 23, 14, 21, 3, 70000, tzinfo=tzlocal())}, {'Name': 'my-other-files', 'CreationDate': datetime.datetime(2024, 1, 23, 14, 21, 7, 993000, tzinfo=tzlocal())}]
To simplify the output, use a for loop with a print statement to display only the names of your buckets, one bucket per line:
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
try:
    # list_buckets() is a method of the boto3 client, not of the resource
    s3 = boto3.client('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    for i in s3.list_buckets()['Buckets']:
        print(i['Name'])
except Exception as issue:
    print("The following error occurred:")
    print(issue)
An example output for this code can look like this:
my-files
my-other-files
If you have no buckets, the output should be empty.
Checking when a bucket was created
Use this code to check the date on which a bucket was created. Enter the name of that bucket in the variable name.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Name of the bucket to inspect
name = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    print(s3.Bucket(name).creation_date)
except Exception as issue:
    print("The following error occurred:")
    print(issue)
The output should contain the date the bucket was created, printed in the format of a Python datetime object:
2024-01-23 14:21:03.070000+00:00
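Since the value is an ordinary Python datetime object, it can be formatted with the standard library. The sketch below uses a hand-made stand-in for the value from the example output above, so it runs without a connection to the object storage; the chosen format strings are only illustrations.

```python
from datetime import datetime, timezone

# Stand-in for the value returned by s3.Bucket(name).creation_date,
# built from the example output shown above.
created = datetime(2024, 1, 23, 14, 21, 3, 70000, tzinfo=timezone.utc)

# Date and time down to the minute
print(created.strftime('%Y-%m-%d %H:%M'))  # 2024-01-23 14:21

# ISO 8601 representation
print(created.isoformat())                 # 2024-01-23T14:21:03.070000+00:00
```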
Listing files in a bucket
To list the files you have in a bucket, provide the bucket name in the name variable.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Name of the bucket whose files should be listed
name = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    bucket = s3.Bucket(name)
    for obj in bucket.objects.filter():
        print(obj)
except Exception as issue:
    print("The following error occurred:")
    print(issue)
Your output should contain the list of your files, like so:
s3.ObjectSummary(bucket_name='my-files', key='some-directory/')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/another-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/first-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/some-other-directory/')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/some-other-directory/some-other-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/some-other-directory/yet-another-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='text-file.txt')
If there are no files in your bucket, the output should be empty.
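If only the paths of the files are of interest, the key attribute of each listed object can be collected instead of printing the whole ObjectSummary. The following is a minimal sketch; the function name list_keys is hypothetical, and the bucket argument is assumed to be a boto3 Bucket resource such as s3.Bucket(name) from the example above.

```python
def list_keys(bucket, prefix=''):
    """Return the keys of the objects in a boto3 Bucket resource.

    Hypothetical helper: if prefix is given, only keys starting with it
    are returned; otherwise all objects in the bucket are listed.
    """
    if prefix:
        objects = bucket.objects.filter(Prefix=prefix)
    else:
        objects = bucket.objects.all()
    return [obj.key for obj in objects]
```

It could be used in place of the for loop above, for example: for key in list_keys(bucket): print(key)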
Troubleshooting listing files in a bucket
No access to bucket
If your key pair does not have access to the chosen bucket, you should get an error like this:
botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Unknown
In this case, choose a different bucket, or a different key pair if you have one which can access it.
Bucket does not exist
If a bucket you chose does not exist, the error might be:
botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjects operation: Unknown
If you need a bucket which uses that name, try to create it as explained in the section Creating a container above.
Listing files from a particular path in a bucket
This example will list only objects from a certain path. There are two rules to follow for the path variable:
End it with a slash.
Do not start it with a slash.
As always, assign the name of the bucket to the name variable.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Name of the bucket and the path inside it (no leading slash, trailing slash required)
name = ''
path = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    bucket = s3.Bucket(name)
    for obj in bucket.objects.filter(Prefix=path):
        print(obj)
except Exception as issue:
    print("The following error occurred:")
    print(issue)
The output has the same format as in the previous section. If there are no files under the chosen path, the output will be empty.
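The two rules for the path variable can also be enforced in code. The sketch below is one possible way to do it; the function name normalize_prefix is hypothetical and not part of boto3.

```python
def normalize_prefix(path):
    """Return path without leading slashes and with exactly one trailing slash.

    An empty path (or one made only of slashes) yields an empty string.
    """
    cleaned = path.strip('/')
    return cleaned + '/' if cleaned else ''


print(normalize_prefix('/some-directory'))                        # some-directory/
print(normalize_prefix('some-directory/some-other-directory/'))   # some-directory/some-other-directory/
```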
Uploading file to a bucket
To upload a file to the container, add the following content to variables:
Variable name | What should it contain
name | The name of the bucket to which you want to upload your file.
source | The location of the file you wish to upload in your local file system.
destination | The path in your container under which you want to upload the file. Should only contain letters, digits, hyphens and slashes.
Two caveats for variable destination:
Finish it with the name of the file you are uploading and
Do not start or finish it with a slash.
This is the code:
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Bucket name, local file to upload and its target path in the bucket
name = ''
source = ''
destination = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    bucket = s3.Bucket(name)
    bucket.upload_file(source, destination)
except Exception as issue:
    print("The following error occurred:")
    print(issue)
If the operation was successful, the output should be empty.
Example variables
Suppose you want to upload a file
named file.txt,
located in folder here,
which, in turn, is located in folder from which you are running the script and in which the script is located.
The destination is path some-directory in a bucket called my-files.
This is how the variables should be set up in this case:
name = 'my-files'
source = 'here/file.txt'
destination = 'some-directory/file.txt'
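The value of destination in the example above can also be derived from the local file name and the target directory, which keeps both caveats satisfied. This is only a sketch; the function name build_destination is hypothetical and not part of boto3.

```python
def build_destination(directory, source):
    """Build the path in the bucket from a target directory and a local file path.

    The result ends with the name of the uploaded file and neither starts
    nor ends with a slash. Backslashes in source are treated as separators
    so that Windows-style local paths are handled as well.
    """
    filename = source.replace('\\', '/').rsplit('/', 1)[-1]
    directory = directory.strip('/')
    return f'{directory}/{filename}' if directory else filename


print(build_destination('some-directory', 'here/file.txt'))  # some-directory/file.txt
```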
Troubleshooting uploading file to a bucket
File you want to upload does not exist
If you specified a non-existent file in the variable source, you should get an error similar to this:
The following error occurred:
[Errno 2] No such file or directory: 'here/wrong-file.txt'
To resolve, specify the correct file and try again.
Downloading file from a bucket
To save a file from a bucket to your local hard drive, fill in the values of the following variables and run the code below:
Variable name | What should it contain
name | The name of the bucket from which you wish to download your file.
source | The path in the container from which you wish to download your file.
destination | The path in your local file system under which you wish to save your file.
Do not start or finish the variable source with a slash.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Bucket name, path of the file in the bucket and local path to save it under
name = ''
source = ''
destination = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    bucket = s3.Bucket(name)
    bucket.download_file(source, destination)
except Exception as issue:
    print("The following error occurred:")
    print(issue)
Successful execution of this code should produce no output.
Example variables
Let’s suppose you are running this script from the same folder in which that script is located. You are downloading the file called
first-file.txt, located
under the directory named some-directory,
in the container named my-files.
The goal is to save it to a folder called here, which is located in the same folder as the script.
Set up the variables like this:
name = 'my-files'
source = 'some-directory/first-file.txt'
destination = 'here/first-file.txt'
Note
On Ubuntu, file paths are written with forward slashes but on Windows, file paths are usually written using backslashes. The above code in Python is written with forward slashes but will still be successfully executed on both Windows and Linux.
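As the IMPORTANT note earlier in this article points out, a download overwrites an existing local file without asking. A simple guard could look like the sketch below; the function name download_if_absent is hypothetical, and the bucket argument is assumed to be a boto3 Bucket resource such as s3.Bucket(name).

```python
import os


def download_if_absent(bucket, source, destination):
    """Download source from bucket unless destination already exists locally.

    Returns True if the download was started, False if it was skipped.
    Hypothetical helper; bucket.download_file is the same boto3 call
    that the example above uses.
    """
    if os.path.exists(destination):
        print(f"Skipping download: {destination} already exists")
        return False
    bucket.download_file(source, destination)
    return True
```

In the script above, the call bucket.download_file(source, destination) could then be replaced with download_if_absent(bucket, source, destination).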
Troubleshooting downloading file from a bucket
File does not exist in bucket
If a file you chose does not exist in the bucket, the following error should appear:
The following error occurred:
An error occurred (404) when calling the HeadObject operation: Not Found
To resolve, make sure that the correct file was specified in the first place.
Removing file from a bucket
To remove a file from your bucket, supply the name of the bucket to the variable name and the full path of the file to the variable path.
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Bucket name and full path of the file to remove
name = ''
path = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    s3.Object(name, path).delete()
except Exception as issue:
    print("The following error occurred:")
    print(issue)
Successful execution of this code should produce no output.
Removing a bucket
To remove a bucket, first remove all objects from it. Once it is empty, define variable name for the bucket you want to remove and execute the code below:
import boto3
# EC2 credentials from Prerequisite No. 2
access_key = ''
secret_key = ''
# Name of the (empty) bucket to remove
name = ''
try:
    s3 = boto3.resource('s3', aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')
    s3.Bucket(name).delete()
except Exception as issue:
    print("The following error occurred:")
    print(issue)
Successful execution of this code should produce no output.
Troubleshooting removing a bucket
Bucket does not exist or is unavailable to your key pair
If the bucket does not exist or is unavailable for your key pair, you should get the following output:
The following error occurred:
An error occurred (NoSuchBucket) when calling the DeleteBucket operation: Unknown
Bucket is not empty
If the bucket is not empty, it cannot be deleted. The message will be:
The following error occurred:
An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: Unknown
To resolve, remove all objects from the bucket and try again.
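Removing all objects one by one can be tedious, so the step above could be scripted. The sketch below relies on the delete() action that boto3 provides on object collections; the function name empty_bucket is hypothetical, the bucket argument is assumed to be a boto3 Bucket resource such as s3.Bucket(name), and buckets with versioning enabled are not handled. Like the other examples, it does not ask for confirmation.

```python
def empty_bucket(bucket):
    """Delete every object in the given boto3 Bucket resource.

    Hypothetical helper. This is destructive and asks for no confirmation;
    versioned buckets are outside the scope of this sketch.
    """
    bucket.objects.all().delete()
```

Once the bucket is empty, s3.Bucket(name).delete() from the example above should succeed.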
General troubleshooting
No connection to the endpoint
If you do not have connection to the endpoint (for example because you lost Internet connection), you should get output similar to this:
The following error occurred:
Could not connect to the endpoint URL: "https://s3.waw3-2.cloudferro.com/"
If that is the case, make sure that you are connected to the Internet. If you are sure that you are connected to the Internet and no downtime for ESA HPC object storage was announced, contact ESA HPC customer support: /{{ gettingstarted }}/Help-Desk-And-Support
Wrong credentials
If the access and/or secret keys are wrong, a message like this will appear:
The following error occurred:
An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: Unknown
Refer to Prerequisite No. 2 for how to obtain valid credentials.
Bucket does not exist or is unavailable to your key pair
If the bucket you chose for this code does not exist or is unavailable for your key pair, you can get different outputs depending on the command you are executing. Below are two such examples:
None
The following error occurred:
An error occurred (NoSuchBucket) when calling the DeleteObject operation: Unknown
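The error codes in parentheses in the messages above can also be read programmatically, because the exceptions raised by botocore for failed requests carry a response dictionary. The sketch below maps the codes mentioned in this article to the advice given in the troubleshooting sections; the names HINTS, error_code and hint_for are hypothetical, and the hint texts are paraphrases of this article.

```python
# Hints paraphrased from the troubleshooting sections of this article
HINTS = {
    'BucketAlreadyExists': 'Choose a different bucket name.',
    'NoSuchBucket': 'Check the bucket name and that your key pair can access it.',
    'BucketNotEmpty': 'Remove all objects from the bucket, then try again.',
    'AccessDenied': 'Use a different bucket or a key pair that can access it.',
    'InvalidAccessKeyId': 'Check the access and secret keys.',
}


def error_code(issue):
    """Return the error code of a botocore ClientError-like exception, or None."""
    response = getattr(issue, 'response', None) or {}
    return response.get('Error', {}).get('Code')


def hint_for(issue):
    """Return a short hint for the exception, if its code is known."""
    return HINTS.get(error_code(issue), 'No hint available; see the error message.')
```

Inside the except block of any example, print(hint_for(issue)) could be added after print(issue).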
What To Do Next
boto3 can also be used to access the EODATA repository on virtual machines hosted on ESA HPC cloud.
Learn more here: How to access EODATA using boto3 on ESA HPC