How to access EODATA using boto3 on ESA HPC

In this article, you will learn how to access the EODATA repository using the Python library boto3, running on a Linux virtual machine in the ESA HPC cloud.

What Are We Going To Cover

  • Installing boto3

  • How to execute scripts found in this article

  • Browsing EODATA

  • Downloading a single file from EODATA repository

Prerequisites

No. 1 Account

You need an ESA HPC hosting account with access to the Horizon interface: https://horizon.eohpc.net/auth/login/?next=/.

No. 2 A virtual machine

You need a virtual machine running on ESA HPC cloud. This article is written for Ubuntu 20.04 and 24.04 versions.

Other operating systems might also work, but they are outside the scope of this article and might require adjusting the commands provided here.

In ESA HPC, the EODATA network is added automatically.

Linux VM

You can create a Linux virtual machine by following one of these articles:

Make sure you have a text editor installed so that you can create Python files. For example, install the nano text editor with a command such as

sudo apt install nano

No. 3 Python

You need Python installed on your virtual machine. Execute this command to test whether Python is already installed:

python3 --version

If the reply contains a version number, Python is installed and ready to be used:

Python 3.12.3

To install Python on Linux, see How to install Python virtualenv or virtualenvwrapper on ESA HPC

No. 4 Obtained access and secret key

To access EODATA, you need to obtain your access and secret keys. You can do so by following this article: How to get credentials used for accessing EODATA on a cloud VM on ESA HPC

No. 5 Basic knowledge about Python

boto3 is a Python library, so you need to know your way around Python.

Installing boto3

Follow the appropriate procedures on installing boto3:

Installing boto3 on Linux

If you are using a Python environment such as virtualenv, enter the environment in which you wish to install boto3. In it, execute the following command:

pip3 install boto3

You can also install the package globally:

sudo apt install python3-boto3
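To confirm that the installation worked in the Python environment you intend to use, you can run a short check such as the one below. This is a minimal sketch, not part of the original procedure; the function name boto3_status is made up for illustration.

```python
import importlib.util
from importlib import metadata


def boto3_status():
    """Return the installed boto3 version string, or None if boto3 cannot be imported."""
    # find_spec looks the package up without actually importing it
    if importlib.util.find_spec("boto3") is None:
        return None
    try:
        return metadata.version("boto3")
    except metadata.PackageNotFoundError:
        # The package is importable but carries no metadata; report it as present
        return "unknown"


version = boto3_status()
if version is None:
    print("boto3 is not installed in this Python environment")
else:
    print(f"boto3 is available, version: {version}")
```

Run it with the same python3 interpreter (and the same virtual environment, if any) that you will use for the scripts in this article.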

How to execute scripts found in this article

The method of executing the scripts varies depending on the operating system of your choice.

How to execute scripts using Linux command line

Open a text editor of your choice, such as nano or vim. Paste the script. Modify the code as instructed (for example, by assigning values to variables). Save the file.

Once you have exited from the text editor, execute the python3 command followed by the name of your script from the directory it is in. For example:

python3 browse.py

The script should be executed.

Browsing EODATA

You can use boto3 to browse the EODATA repository. This is the Python code you are going to use:

import boto3

access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'
directory='Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'

host='https://eodata.cloudferro.com'
container='DIAS'

s3=boto3.client('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key,endpoint_url=host)

print(s3.list_objects(Delimiter='/',Bucket=container,Prefix=directory,MaxKeys=30000)['CommonPrefixes'])

These are the variables used in the code:

Variable name | What should be assigned to it
access_key | Your access key. Obtain it by following Prerequisite No. 4.
secret_key | Your secret key. Obtain it by following Prerequisite No. 4.
directory | The directory within the EODATA repository which you want to explore.

When filling in the variable directory, make sure to follow these rules:

  • Use slashes / as separators between elements of that path - directories and files

  • Do not start the path with a slash /

  • Since the element you are exploring is a directory, finish the path with a slash /

  • Start the path with the name of a folder found within the root directory of the EODATA repository (for example Sentinel-2 or Sentinel-5P)

If you want to explore the root directory of the EODATA repository, assign an empty string to the variable directory:

directory=''
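The rules above can be applied automatically before the value is passed to boto3. The following is a minimal sketch under that assumption; the helper name normalize_directory is hypothetical and not part of boto3 or EODATA.

```python
def normalize_directory(path):
    """Return `path` in the form the browsing script expects for `directory`."""
    # Remove surrounding whitespace, then any leading or trailing slashes
    cleaned = path.strip().strip("/")
    # The root directory of the repository is represented by an empty string
    if cleaned == "":
        return ""
    # Any other directory must end with exactly one slash
    return cleaned + "/"


print(normalize_directory("/Sentinel-2"))   # prints Sentinel-2/
print(repr(normalize_directory("/")))       # prints ''
```

The helper only fixes slashes. It cannot tell whether the first folder of the path really exists in the root directory of the EODATA repository.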

If you do not have a particular directory to explore and simply want to test this method, you can keep the value assigned to the variable directory in the example code above.

The variables host and container contain the EODATA endpoint and the name of the container used, respectively. You do not need to modify them.
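If you prefer not to store your keys inside the script file, one option is to read them from environment variables. The sketch below assumes two variable names, EODATA_ACCESS_KEY and EODATA_SECRET_KEY, which are invented for this example; the helper names load_credentials and make_client are hypothetical as well. The endpoint and container values are the ones used throughout this article.

```python
import os

HOST = "https://eodata.cloudferro.com"
CONTAINER = "DIAS"


def load_credentials(environ=os.environ):
    """Return (access_key, secret_key) read from assumed environment variable names."""
    try:
        return environ["EODATA_ACCESS_KEY"], environ["EODATA_SECRET_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Environment variable {missing} is not set") from None


def make_client(access_key, secret_key):
    """Create the same kind of S3 client as the browsing script in this article."""
    # Imported here so that load_credentials can be used without boto3 installed
    import boto3

    return boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        endpoint_url=HOST,
    )
```

To use it, export both variables in your shell before running the script, then call make_client(*load_credentials()) in place of the boto3.client(...) line.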

If you provided your access and secret keys but did not modify the variable directory, the code above will list products found in the Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ directory of the EODATA repository. In that case, the output should look like this:

[{'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110329_000000603113_00267_52867_0000.N1/'}, {'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110428_000000603113_00267_52867_0000.N1/'}, {'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110446_000000603113_00267_52867_0000.N1/'}]

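This output can be described as a “list of dictionaries”. Each of those dictionaries contains a key called Prefix, providing the path to a subdirectory of the explored directory. Files located directly in that directory are not part of CommonPrefixes; the response returns them separately under the Contents key. Instead of printing this list as above, you can loop through it to make the output easier to read: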

import boto3

access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'
directory='Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'

host='https://eodata.cloudferro.com'
container='DIAS'

s3=boto3.client('s3',aws_access_key_id=access_key, aws_secret_access_key=secret_key,endpoint_url=host)

for i in s3.list_objects(Delimiter='/',Bucket=container,Prefix=directory,MaxKeys=30000)['CommonPrefixes']:
    print(i['Prefix'])

This time, the output should show only the paths:

Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110329_000000603113_00267_52867_0000.N1/
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110428_000000603113_00267_52867_0000.N1/
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110446_000000603113_00267_52867_0000.N1/
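The scripts above read CommonPrefixes from a single response. boto3 leaves that key out of the response when the directory has no subdirectories, so the scripts would then stop with a KeyError. In addition, Amazon S3 returns at most 1,000 entries per response regardless of MaxKeys; whether the EODATA endpoint applies a similar limit is an assumption here. The sketch below handles both cases with the boto3 paginator for list_objects and also collects files. The function name list_directory is hypothetical, and the paginator is assumed to work against the EODATA endpoint the same way it does against Amazon S3.

```python
def list_directory(s3_client, bucket, directory):
    """Return (subdirectories, files) found directly under `directory`."""
    subdirectories, files = [], []
    # The paginator requests further pages until the listing is complete
    paginator = s3_client.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=bucket, Prefix=directory, Delimiter="/"):
        # Either key may be missing from a page, so fall back to an empty list
        for entry in page.get("CommonPrefixes", []):
            subdirectories.append(entry["Prefix"])
        for entry in page.get("Contents", []):
            files.append(entry["Key"])
    return subdirectories, files
```

With the client s3 and the variables container and directory defined as in the scripts above, you would call subdirectories, files = list_directory(s3, container, directory) and loop over either list.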

Downloading a single file from EODATA repository

This section covers how to download a file from the EODATA repository.

The script below downloads the file to the directory from which the script is executed. If that directory already contains a file with the same name as the one you are downloading, it will be overwritten without asking for confirmation.

The code is:

import boto3

access_key='YOUR_ACCESS_KEY'
secret_key='YOUR_SECRET_KEY'
key='Landsat-5/TM/L1T/2011/11/16/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG'

host='https://eodata.cloudferro.com'
container='DIAS'

s3=boto3.resource('s3',aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,endpoint_url=host)

bucket=s3.Bucket(container)

filename=key.split("/")[-1]

bucket.download_file(key, filename)

The variables are:

Variable name | What should be assigned to it
access_key | Your access key. Obtain it by following Prerequisite No. 4.
secret_key | Your secret key. Obtain it by following Prerequisite No. 4.
key | Full path (including folders) of the file you want to download from the EODATA repository.

When filling in the variable key, make sure to follow these rules:

  • Use slashes / as separators between elements of that path - directories and files

  • Do not start or finish the path with a slash /

  • Start the path with the name of a folder found within the root directory of the EODATA repository (for example Sentinel-2 or Sentinel-5P)
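The rules above can be checked before the download is attempted. The following is a minimal sketch; the function names check_key and local_filename are hypothetical. local_filename uses the same expression as the download script to derive the name of the local file.

```python
def check_key(key):
    """Return a list of problems found in `key`; an empty list means no rule is broken."""
    problems = []
    if key.startswith("/"):
        problems.append("the path must not start with a slash")
    if key.endswith("/"):
        problems.append("the path must not finish with a slash")
    if "\\" in key:
        problems.append("use slashes / as separators, not backslashes")
    return problems


def local_filename(key):
    """Return the last element of the path, used as the name of the downloaded file."""
    return key.split("/")[-1]


print(check_key("/Sentinel-2/some/file/"))
# prints ['the path must not start with a slash', 'the path must not finish with a slash']
```

The check covers only the form of the path. It cannot confirm that the file exists in the repository.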

If you do not have a particular file to download and simply want to test this method of downloading files, you can keep the value assigned to the variable key in the example code above.

Again, the variables host and container contain the EODATA endpoint and the name of the container being used, respectively. You do not need to modify them.

If you provided your access key and secret key but did not change the contents of the variable key, the code should download the file called

LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG

which is located within the root directory of the product

LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1

After executing the script, the output should be empty. Regardless, the downloaded file should be visible within the directory from which the script was executed. For example, this is what it looks like on Linux:

[Image: the downloaded file shown in the directory from which the script was executed]
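Because the download script overwrites an existing file of the same name without confirmation, you may want a guard around the download call. The sketch below skips the download when the target file already exists. The function name download_if_absent and its dest_dir parameter are hypothetical; bucket is expected to be the object created with s3.Bucket(container) in the script above.

```python
from pathlib import Path


def download_if_absent(bucket, key, dest_dir="."):
    """Download `key` into `dest_dir` unless a file with that name already exists.

    Returns the local path after a download, or None when the download was skipped.
    """
    # Same naming rule as the download script: the last element of the path
    target = Path(dest_dir) / key.split("/")[-1]
    if target.exists():
        return None
    bucket.download_file(key, str(target))
    return target
```

In the download script, you could replace the final bucket.download_file(key, filename) line with result = download_if_absent(bucket, key) and print a message when result is None.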

What To Do Next

You can further modify these scripts so that they better suit your needs, or integrate them with your own applications. These scripts might also work in other development environments, but that is outside the scope of this article.

boto3 can also be used to access object storage containers from ESA HPC cloud: How to access object storage from ESA HPC using boto3