How to access object storage from ESA HPC using boto3

In this article, you will learn how to access object storage from ESA HPC using the Python library boto3.

What We Are Going To Cover

  • Terminology: container and bucket

  • Preparing the environment

  • How to use the examples provided?

  • Creating a container

  • Listing buckets

  • Checking when a bucket was created

  • Listing files in a bucket

  • Listing files from a path in a bucket

  • Uploading file to a bucket

  • Downloading file from a bucket

  • Removing file from a bucket

  • Removing a bucket

Prerequisites

No. 1 Account

You need an ESA HPC hosting account with access to the Horizon interface: https://horizon.eohpc.net/auth/login/?next=/.

No. 2 Generated EC2 credentials

You need to generate EC2 credentials. Learn more here: How to generate ec2 credentials on ESA HPC

No. 3 A Linux or Windows computer or virtual machine

This article was written for Ubuntu 22.04 and Windows Server 2022. Other operating systems might work, but are out of scope of this article and might require you to adjust the commands supplied here.

You can follow this article on a Linux or Windows virtual machine, or on a local computer running one of those operating systems.

To build a new virtual machine hosted on ESA HPC cloud, one of the following articles can help:

Terminology: container and bucket

The terms container (used by the Horizon dashboard) and bucket (used by boto3) are used here interchangeably, to describe the same thing.

Preparing the environment

To perform the examples in this article, you will need to have a working installation of Python 3 and the library boto3. The exact method of installation will depend on your goals and habits of working with Python.

Ubuntu 22.04

Method 1: Using virtualenvwrapper and pip

If you are a seasoned Python user and are working on multiple projects, you are bound to use Python virtual environments. It is advisable to create a new virtual environment for boto3 and here is one way to do it: How to install Python virtualenv or virtualenvwrapper on ESA HPC.

As mentioned in that article, use the workon command to enter the virtual environment. For example, if your environment is called managing-files, execute:

workon managing-files

Once the environment is activated, install boto3:

pip3 install boto3

Method 2: Using apt

If, however, you don’t need such an environment and want to make boto3 available globally under Ubuntu, use apt. The following command will install both Python 3 and the boto3 library:

sudo apt install python3 python3-boto3

Windows Server 2022

If you are using Windows, follow this article to learn how to install boto3: /s3/How-To-Install-boto3-In-Windows-on-ESA-HPC
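Whichever installation method you chose, you may want to confirm that Python can find boto3 before running the examples. The following is a minimal sketch that uses only the standard library. The helper name is_installed is made up for this illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Prints True or False depending on your environment.
print('boto3 available:', is_installed('boto3'))
```

If the result is False inside a virtual environment, check that the environment is activated before installing boto3 with pip3.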

How to use the examples provided?

Each of the examples provided can serve as standalone code. You should be able to

  • enter the code in your text editor,

  • define appropriate content for the variables provided,

  • save it as a file and

  • run.

You can also use these examples as a starting point for your own code.

[Image: code_structure.png, showing the three parts of an example script marked with blue, red and green rectangles]

Each example has three parts:

import boto3 – blue rectangle

First you import the library boto3, which is standard Python practice.

Define input parameters for boto3 calls – red rectangle

A typical call to the boto3 library looks like this:

s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

For each call, several parameters must be present:

  • aws_access_key_id – from Prerequisite No. 2

  • aws_secret_access_key – also from Prerequisite No. 2

  • endpoint_url – predefined in example source code with a specific value for each cloud

  • region_name – the same as above

Other variables may be needed on a case by case basis:

  • name,

  • path

and possibly others. Be sure to enter the values for all those additional variables before running the examples.
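If you prefer not to store the access and secret keys in the script itself, one option is to read them from environment variables. The sketch below shows one way this could look. The function name connection_settings and the variable names S3_ACCESS_KEY and S3_SECRET_KEY are assumptions made for this illustration and are not defined by boto3 or by ESA HPC.

```python
import os


def connection_settings(env=os.environ):
    """Build the keyword arguments shared by the boto3 calls in this article."""
    # S3_ACCESS_KEY and S3_SECRET_KEY are hypothetical variable names;
    # a KeyError is raised if either of them is not set.
    return {
        'aws_access_key_id': env['S3_ACCESS_KEY'],
        'aws_secret_access_key': env['S3_SECRET_KEY'],
        'endpoint_url': 'https://s3.eohpc.net',
        'region_name': 'waw3-1',
    }


# Usage (requires boto3 and both environment variables to be set):
#   import boto3
#   s3 = boto3.resource('s3', **connection_settings())
```

The examples in the rest of this article keep the keys in plain variables for simplicity.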

boto3 call – green rectangle

Once you have provided all the values to the appropriate variables, you can save the file with a .py extension. One of the ways of running Python scripts is using the command line. To run such a script, first navigate to the folder in which the script is located. After that, execute the command below, but replace script.py with the name of the script you wish to run.

python3 script.py

Make sure that the name and/or location of the file is passed to the shell correctly – watch out for any spaces and other special characters.

IMPORTANT

In these examples, even in case of potentially destructive operations, you will not be asked for confirmation. If, for example, a script saves a file and a file already exists under the designated name and location, it will likely be overwritten. Please be sure that that is what you want before running the script, or enhance the code with checks whether the file already exists.

In all the examples that follow, we assume that the corresponding files and buckets already exist. Adding code for checks of that kind is out of scope of this article.

Creating a container

To create a new container, first select a name for your bucket and enter it in the name variable. On ESA HPC cloud, use only letters, numbers and hyphens.

import boto3

access_key = ''
secret_key = ''

name = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    s3.Bucket(name).create()

except Exception as issue:
    print("The following error occurred:")
    print(issue)

Successful execution of this code should produce no output.

To test whether the bucket was created, you can, among other things, list buckets as described in section Listing buckets below.

Troubleshooting creating a container

Bucket already exists

If you receive the following output:

The following error occurred:
An error occurred (BucketAlreadyExists) when calling the CreateBucket operation: Unknown

it means that you cannot choose this name for a bucket because a bucket under this name already exists. Choose a different name and try again.

Invalid characters used

If you used invalid characters in the container name, you should receive an error similar to this:

Invalid bucket name "this container should not exist": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:(s3|s3-object-lambda):[a-z\-0-9]*:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-.]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"

To resolve, choose a different name - on ESA HPC cloud, use only letters, numbers and hyphens.
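A name can also be checked locally before the create call is made. The following sketch tests only the rule stated in this article (letters, numbers and hyphens). It does not check length or any other rule the service may enforce, and the function name is hypothetical.

```python
import re

# Letters, digits and hyphens only, as recommended in this article.
# Assumption: no length limit is checked, because the article does not state one.
ALLOWED_NAME = re.compile(r'[A-Za-z0-9-]+')


def is_recommended_bucket_name(name):
    """Return True if name uses only letters, digits and hyphens."""
    return ALLOWED_NAME.fullmatch(name) is not None


print(is_recommended_bucket_name('my-files'))
print(is_recommended_bucket_name('this container should not exist'))
```

The first call prints True and the second prints False, because the second name contains spaces.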

Listing buckets

This code allows you to list buckets.

import boto3

access_key = ''
secret_key = ''

name = ''

try:
    # list_buckets() is a client method, so create a client instead of a resource
    s3=boto3.client('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    print(s3.list_buckets()['Buckets'])

except Exception as issue:
    print("The following error occurred:")
    print(issue)

The output should be a list of dictionaries, each providing information regarding a particular bucket, starting with its name. If two buckets, my-files and my-other-files, already exist, the output might be similar to this:

[{'Name': 'my-files', 'CreationDate': datetime.datetime(2024, 1, 23, 14, 21, 3, 70000, tzinfo=tzlocal())}, {'Name': 'my-other-files', 'CreationDate': datetime.datetime(2024, 1, 23, 14, 21, 7, 993000, tzinfo=tzlocal())}]

To simplify the output, use a for loop with print statement to display only names of your buckets, one bucket per line:

import boto3

access_key = ''
secret_key = ''

name = ''

try:
    # list_buckets() is a client method, so create a client instead of a resource
    s3=boto3.client('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    for i in s3.list_buckets()['Buckets']:
        print(i['Name'])

except Exception as issue:
    print("The following error occurred:")
    print(issue)

An example output for this code can look like this:

my-files
my-other-files

If you have no buckets, the output should be empty.
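If you want both the name and the creation date in a compact form, the dictionary returned by list_buckets() can be formatted with a few lines of standard Python. The sketch below is one possible approach. The function name bucket_lines is hypothetical, and the sample data is made up to resemble the example output shown earlier.

```python
from datetime import datetime, timezone


def bucket_lines(response):
    """Return one 'name<TAB>creation date' line per bucket, sorted by name."""
    buckets = sorted(response['Buckets'], key=lambda b: b['Name'])
    return [f"{b['Name']}\t{b['CreationDate']:%Y-%m-%d %H:%M:%S}" for b in buckets]


# Sample data only; a real response comes from list_buckets().
sample = {'Buckets': [
    {'Name': 'my-other-files',
     'CreationDate': datetime(2024, 1, 23, 14, 21, 7, tzinfo=timezone.utc)},
    {'Name': 'my-files',
     'CreationDate': datetime(2024, 1, 23, 14, 21, 3, tzinfo=timezone.utc)},
]}

print('\n'.join(bucket_lines(sample)))
```

With a live connection, you would pass the result of list_buckets() to bucket_lines instead of the sample dictionary.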

Checking when a bucket was created

Use this code to check the date on which a bucket was created. Enter the name of that bucket in the variable name.

import boto3

access_key = ''
secret_key = ''

name = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    print(s3.Bucket(name).creation_date)

except Exception as issue:
    print("The following error occurred:")
    print(issue)

The output should contain the date and time at which the bucket was created, printed from a Python datetime object:

2024-01-23 14:21:03.070000+00:00
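Because the value is a regular datetime object, it can be formatted with standard Python tools. The sketch below also handles the case where no date is available, on the assumption that creation_date can be None when the bucket is not found (the None output shown under General troubleshooting suggests this). The function name is hypothetical.

```python
from datetime import datetime, timezone


def describe_creation_date(created):
    """Format a bucket creation date in UTC, or report that it is unavailable."""
    if created is None:
        return 'creation date unavailable'
    return created.astimezone(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')


print(describe_creation_date(
    datetime(2024, 1, 23, 14, 21, 3, 70000, tzinfo=timezone.utc)))
print(describe_creation_date(None))
```

In a real script, you would pass s3.Bucket(name).creation_date to this function.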

Listing files in a bucket

To list the files you have in a bucket, provide the bucket name in the name variable.

import boto3

access_key = ''
secret_key = ''

name = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    bucket=s3.Bucket(name)
    for obj in bucket.objects.filter():
        print(obj)

except Exception as issue:
    print("The following error occurred:")
    print(issue)

Your output should contain the list of your files, like so:

s3.ObjectSummary(bucket_name='my-files', key='some-directory/')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/another-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/first-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/some-other-directory/')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/some-other-directory/some-other-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='some-directory/some-other-directory/yet-another-file.txt')
s3.ObjectSummary(bucket_name='my-files', key='text-file.txt')

If there are no files in your bucket, the output should be empty.
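The objects returned by the listing carry more than a printable summary. As a sketch, the helper below collects the key and the size in bytes of each object, using the key and size attributes of the object summaries. The function name object_listing is hypothetical, and bucket is assumed to be a Bucket created as in the example above.

```python
def object_listing(bucket):
    """Return (key, size in bytes) for every object in a boto3 Bucket resource."""
    return [(obj.key, obj.size) for obj in bucket.objects.all()]


# Usage, assuming s3 and name are defined as in the example above:
#   for key, size in object_listing(s3.Bucket(name)):
#       print(key, size)
```

Keeping the listing in a function makes it easier to reuse, for example to sum up the sizes of all objects in a bucket.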

Troubleshooting listing files in a bucket

No access to bucket

If your key pair does not have access to the chosen bucket, you should get an error like this:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the ListObjects operation: Unknown

In this case, choose a different bucket, or a different key pair if you have one which can access it.

Bucket does not exist

If a bucket you chose does not exist, the error might be:

botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjects operation: Unknown

If you need a bucket which uses that name, try to create it as explained in the section Creating a container above.

Listing files from a particular path in a bucket

This example lists only the objects under a certain path. There are two rules to follow for the path variable:

  • End it with a slash

  • Do not start it with a slash.

As always, add the name of the bucket to name variable.

import boto3

access_key = ''
secret_key = ''

name = ''
path = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    bucket=s3.Bucket(name)
    for obj in bucket.objects.filter(Prefix=path):
        print(obj)

except Exception as issue:
    print("The following error occurred:")
    print(issue)

The output has the same format as in the previous section, limited to objects under the chosen path. If there are no files under that path, the output will be empty.
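The two rules for the path variable (end it with a slash, do not start it with one) can also be applied automatically. The following is a small sketch; the function name normalize_prefix is made up for this illustration.

```python
def normalize_prefix(path):
    """Remove leading slashes and make sure the path ends with exactly one slash."""
    stripped = path.strip('/')
    # An empty result means "no prefix", which lists the whole bucket.
    return stripped + '/' if stripped else ''


print(normalize_prefix('/some-directory'))
print(normalize_prefix('some-directory/some-other-directory/'))
```

The first call prints some-directory/ and the second prints some-directory/some-other-directory/. The result can be passed as Prefix to bucket.objects.filter().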

Uploading file to a bucket

To upload a file to the container, add the following content to variables:

  • name – the name of the bucket to which you want to upload your file.

  • source – the location of the file you wish to upload in your local file system.

  • destination – the path in your container under which you want to upload the file. It should only contain letters, digits, hyphens and slashes.

Two caveats for variable destination:

  1. Finish it with the name of the file you are uploading and

  2. Do not start or finish it with a slash.

This is the code:

import boto3

access_key = ''
secret_key = ''

name = ''
source = ''
destination = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    bucket=s3.Bucket(name)
    bucket.upload_file(source, destination)


except Exception as issue:
    print("The following error occurred:")
    print(issue)

If the operation was successful, the output should be empty.

Example variables

Suppose you want to upload a file

  • named file.txt,

  • located in folder here,

  • which, in turn, is located in folder from which you are running the script and in which the script is located.

  • The destination is path some-directory in a bucket called my-files.

This is how the variables should be set up in this case:

name = 'my-files'
source = 'here/file.txt'
destination = 'some-directory/file.txt'
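The two caveats for the destination variable can be expressed as a small helper that takes the file name from the local path and places it under a directory in the bucket. This is only a sketch; the function name destination_key is hypothetical.

```python
from pathlib import PurePath


def destination_key(source, directory=''):
    """Join a bucket directory and the file name taken from a local path."""
    file_name = PurePath(source).name
    directory = directory.strip('/')
    # No leading or trailing slash, and the key ends with the file name.
    return f'{directory}/{file_name}' if directory else file_name


print(destination_key('here/file.txt', 'some-directory'))
```

For the example variables above, this prints some-directory/file.txt, which matches the value of destination.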

Troubleshooting uploading file to a bucket

File you want to upload does not exist

If you specified a non-existent file in the variable source, you should get an error similar to this:

The following error occurred:
[Errno 2] No such file or directory: 'here/wrong-file.txt'

To resolve, specify the correct file and try again.
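To catch this situation before boto3 is called at all, the script can check that the local file exists first. The sketch below assumes bucket is a Bucket created as in the upload example. The function name upload_existing_file is hypothetical.

```python
import os


def upload_existing_file(bucket, source, destination):
    """Upload source to destination in bucket; return False if source is missing."""
    if not os.path.isfile(source):
        print(f'Nothing uploaded, no such file: {source}')
        return False
    bucket.upload_file(source, destination)
    return True


# Usage, assuming s3, name, source and destination are defined as above:
#   upload_existing_file(s3.Bucket(name), source, destination)
```

Returning True or False lets the calling code decide what to do next, for example whether to continue with further uploads.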

Downloading file from a bucket

To save a file from a bucket to your local hard drive, fill in the values of the following variables and run the code below:

  • name – the name of the bucket from which you wish to download your file.

  • source – the path in the container from which you wish to download your file.

  • destination – the path in your local file system under which you wish to save your file.

Do not start or finish the variable source with a slash.

import boto3

access_key = ''
secret_key = ''

name = ''
source = ''
destination = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    bucket=s3.Bucket(name)

    bucket.download_file(source, destination)

except Exception as issue:
    print("The following error occurred:")
    print(issue)

Successful execution of this code should produce no output.

Example variables

Let’s suppose you are running this script from the same folder in which that script is located. You are downloading a file

  • named first-file.txt,

  • stored under the directory some-directory,

  • in a container named my-files.

The goal is to save it to a folder called here, which is located in the same folder as the script.

Set up the variables like this:

name = 'my-files'
source = 'some-directory/first-file.txt'
destination = 'here/first-file.txt'

Note

On Ubuntu, file paths are written with forward slashes but on Windows, file paths are usually written using backslashes. The above code in Python is written with forward slashes but will still be successfully executed on both Windows and Linux.
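As the IMPORTANT note earlier in this article warns, an existing local file will likely be overwritten by a download. The following sketch refuses to download when the destination already exists. It also creates the local target folder if it is missing, on the assumption that download_file does not create it for you. The function name download_without_overwrite is hypothetical.

```python
import os


def download_without_overwrite(bucket, source, destination):
    """Download source from bucket unless destination already exists locally."""
    if os.path.exists(destination):
        raise FileExistsError(f'Refusing to overwrite {destination}')
    # Create the target folder (for example "here") if it is missing.
    folder = os.path.dirname(destination)
    if folder:
        os.makedirs(folder, exist_ok=True)
    bucket.download_file(source, destination)


# Usage, assuming s3, name, source and destination are defined as above:
#   download_without_overwrite(s3.Bucket(name), source, destination)
```

Raising an exception fits the try and except structure used throughout this article, so the message is printed in the same way as other errors.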

Troubleshooting downloading file from a bucket

File does not exist in bucket

If a file you chose does not exist in the bucket, the following error should appear:

The following error occurred:
An error occurred (404) when calling the HeadObject operation: Not Found

To resolve, make sure that the correct file was specified in the first place.

Removing file from a bucket

To remove a file from your bucket, supply the name of the bucket in the variable name and the full path of the file in the variable path.

import boto3

access_key = ''
secret_key = ''

name = ''
path = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    s3.Object(name, path).delete()

except Exception as issue:
    print("The following error occurred:")
    print(issue)

Successful execution of this code should produce no output.

Removing a bucket

To remove a bucket, first remove all objects from it. Once it is empty, define variable name for the bucket you want to remove and execute the code below:

import boto3

access_key = ''
secret_key = ''

name = ''

try:
    s3=boto3.resource('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key, endpoint_url='https://s3.eohpc.net', region_name='waw3-1')

    s3.Bucket(name).delete()

except Exception as issue:
    print("The following error occurred:")
    print(issue)

Successful execution of this code should produce no output.

Troubleshooting removing a bucket

Bucket does not exist or is unavailable to your key pair

If the bucket does not exist or is unavailable for your key pair, you should get the following output:

The following error occurred:
An error occurred (NoSuchBucket) when calling the DeleteBucket operation: Unknown

Bucket is not empty

If the bucket is not empty, it cannot be deleted. The message will be:

The following error occurred:
An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: Unknown

To resolve, remove all objects from the bucket and try again.
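Removing the objects one by one can be tedious, so the sketch below empties a bucket and then deletes it in one go. It uses the batch delete() action available on boto3 object collections. Be careful: this removes every object without asking for confirmation and cannot be undone. The sketch assumes that versioning is not enabled on the bucket, and the function name empty_and_delete is hypothetical.

```python
def empty_and_delete(bucket):
    """Delete every object in a boto3 Bucket resource, then the bucket itself."""
    # Batch-deletes all objects in the bucket; this cannot be undone.
    bucket.objects.all().delete()
    bucket.delete()


# Usage, assuming s3 and name are defined as in the example above:
#   empty_and_delete(s3.Bucket(name))
```

Double-check the value of name before running code like this.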

General troubleshooting

No connection to the endpoint

If you do not have a connection to the endpoint (for example because you lost your Internet connection), you should get output similar to this:

The following error occurred:
Could not connect to the endpoint URL: "https://s3.eohpc.net/"

If that is the case, make sure that you are connected to the Internet. If you are sure that you are connected and no downtime for ESA HPC object storage was announced, contact ESA HPC customer support: Help Desk And Support

Wrong credentials

If the access and/or secret keys are wrong, a message like this will appear:

The following error occurred:
An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: Unknown

Refer to Prerequisite No. 2 for how to obtain valid credentials.

Bucket does not exist or is unavailable to your key pair

If the bucket you chose does not exist or is unavailable for your key pair, you can get different outputs depending on the command you are executing. Below are two such examples:

None

The following error occurred:
An error occurred (NoSuchBucket) when calling the DeleteObject operation: Unknown
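The examples in this article print the exception as it is. If you would rather react to specific problems, the error code can be read from the exception: errors returned by the service are raised as botocore ClientError exceptions, which keep the parsed error in their response attribute. The sketch below maps the codes mentioned in this article to short hints. The function name explain and the wording of the hints are made up for this illustration.

```python
HINTS = {
    'BucketAlreadyExists': 'Choose a different bucket name.',
    'NoSuchBucket': 'Check the bucket name and that your key pair can access it.',
    'BucketNotEmpty': 'Remove all objects from the bucket first.',
    'InvalidAccessKeyId': 'Check the access and secret keys.',
    'AccessDenied': 'Use a bucket or key pair you have access to.',
}


def explain(issue):
    """Return a hint for an exception raised by a boto3 call."""
    # ClientError stores the parsed error in .response; other exceptions
    # (for example a missing local file) have no such attribute.
    response = getattr(issue, 'response', None) or {}
    code = response.get('Error', {}).get('Code')
    return HINTS.get(code, f'No specific hint, original message: {issue}')


# Usage inside the except block of any example in this article:
#   except Exception as issue:
#       print(explain(issue))
```

Codes that are not in the table fall back to the original message, so no information is lost.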

What To Do Next

boto3 can also be used to access the EODATA repository on virtual machines hosted on ESA HPC cloud.

Learn more here: How to access EODATA using boto3 on ESA HPC