How to access EODATA using boto3 on ESA HPC
In this article you will learn how to access the EODATA repository using the Python library boto3, running on a Linux virtual machine in the ESA HPC cloud.
What Are We Going To Cover
Installing boto3
How to execute scripts found in this article
Browsing EODATA
Downloading a single file from EODATA repository
Prerequisites
No. 1 Account
You need an ESA HPC hosting account with access to the Horizon interface: https://horizon.eohpc.net/auth/login/?next=/.
No. 2 A virtual machine
You need a virtual machine running on the ESA HPC cloud. This article is written for Ubuntu 20.04 and 24.04.
Other operating systems might also work, but they are outside the scope of this article and might require adjusting the commands provided here.
In ESA HPC, the EODATA network is added automatically.
Linux VM
You can create a Linux virtual machine by following one of these articles:
Make sure you have a text editor installed for creating Python files. For example, install the nano text editor with a command such as
sudo apt install nano
No. 3 Python
You need Python installed on your virtual machine. Execute this command to test whether Python is already installed or not:
python3 --version
If the reply contains a version number, Python is installed and ready to be used:
Python 3.12.3
To install Python on Linux, see How to install Python virtualenv or virtualenvwrapper on ESA HPC
No. 4 Obtained access and secret key
To access EODATA, you need to obtain your access key and secret key. You can do it by following this article: How to get credentials used for accessing EODATA on a cloud VM on ESA HPC
No. 5 Basic knowledge about Python
boto3 is a Python library so you have to know your way around Python.
Installing boto3
Follow the appropriate procedure to install boto3:
Installing boto3 on Linux
If you are using a Python environment like virtualenv, enter the environment in which you wish to install boto3. In it, execute the following command:
pip3 install boto3
You can also install the package globally:
sudo apt install python3-boto3
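To confirm that the installation succeeded, you can check from Python whether the package is importable. The following is a minimal sketch; the helper name is_installed is hypothetical and not part of boto3:

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be imported."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether boto3 is available in the current Python environment
print("boto3 installed:", is_installed("boto3"))
```

If the script reports False inside a virtualenv, the package was probably installed in a different environment.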
How to execute scripts found in this article
The method of executing the scripts varies depending on the operating system of your choice.
How to execute scripts using Linux command line
Open a text editor of your choice like nano or vim. Paste the script. Perform appropriate modifications to the code as instructed (like assigning values to variables). Save the file.
Once you have exited from the text editor, execute the python3 command followed by the name of your script from the directory it is in. For example:
python3 browse.py
The script should be executed.
Browsing EODATA
You can use boto3 to browse the EODATA repository. This is the Python code you are going to use:
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
directory = 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'
host = 'https://eodata.cloudferro.com'
container = 'DIAS'
# Create an S3 client that talks to the EODATA endpoint
s3 = boto3.client('s3', aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key, endpoint_url=host)
# Print the entries found directly under the chosen directory
print(s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)['CommonPrefixes'])
These are the variables used in the code:
Variable name | What should be assigned to it
access_key | Your access key. Obtain it by following Prerequisite No. 4.
secret_key | Your secret key. Obtain it by following Prerequisite No. 4.
directory | The directory within the EODATA repository which you want to explore.
When filling in the variable directory, make sure to follow these rules:
Use slashes / as separators between elements of that path - directories and files
Do not start the path with a slash /
Since the element you are exploring is a directory, finish the path with a slash /
Start the path with the name of a folder found within the root directory of the EODATA repository (for example Sentinel-2 or Sentinel-5P)
If you want to explore the root directory of the EODATA repository, assign an empty string to variable directory:
directory=''
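The rules above can be checked before the request is sent. The following is a minimal sketch of such a check; the function name check_directory is hypothetical, and it only tests the formal rules, not whether the directory actually exists in EODATA:

```python
def check_directory(directory):
    """Return a list of problems found in an EODATA directory path."""
    problems = []
    if directory == '':
        return problems  # an empty string means the root directory
    if '\\' in directory:
        problems.append('use / as the separator, not \\')
    if directory.startswith('/'):
        problems.append('do not start the path with /')
    if not directory.endswith('/'):
        problems.append('finish a directory path with /')
    return problems


print(check_directory('Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'))  # prints []
print(check_directory('/Sentinel-2'))  # breaks two of the rules
```

An empty list means the path follows all of the rules listed above.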
If you do not have a specific directory to explore and simply want to test this method, you can keep the value assigned to the variable directory in the example code above.
The variables host and container contain the EODATA endpoint and the name of the container used, respectively. You do not need to modify them.
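Instead of pasting the keys into the script, you may prefer to read them from environment variables so that they are not stored in the file. A minimal sketch follows; the environment variable names EODATA_ACCESS_KEY and EODATA_SECRET_KEY and the helper load_keys are assumptions made for this example, not names defined by ESA HPC or boto3:

```python
import os


def load_keys(env=os.environ):
    """Read the access key and secret key from environment variables.

    EODATA_ACCESS_KEY and EODATA_SECRET_KEY are hypothetical names
    chosen for this example.
    """
    try:
        return env['EODATA_ACCESS_KEY'], env['EODATA_SECRET_KEY']
    except KeyError as missing:
        raise RuntimeError(f'Environment variable {missing} is not set') from missing
```

After exporting both variables in the shell, the two assignments in the script can be replaced with access_key, secret_key = load_keys().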
If you provided your access key and secret key but did not modify the variable directory, the code above will list products found in the Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ directory of the EODATA repository. In that case, the output should look like this:
[{'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110329_000000603113_00267_52867_0000.N1/'}, {'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110428_000000603113_00267_52867_0000.N1/'}, {'Prefix': 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110446_000000603113_00267_52867_0000.N1/'}]
This output can be described as a “list of dictionaries”. Each of those dictionaries contains a key called Prefix, providing the path to a directory found directly under the explored one. Instead of printing this list like above, you can loop through it to make the output more legible:
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
directory = 'Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/'
host = 'https://eodata.cloudferro.com'
container = 'DIAS'
# Create an S3 client that talks to the EODATA endpoint
s3 = boto3.client('s3', aws_access_key_id=access_key,
                  aws_secret_access_key=secret_key, endpoint_url=host)
# Print each path on its own line
for i in s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory, MaxKeys=30000)['CommonPrefixes']:
    print(i['Prefix'])
This time, the output should show only the paths:
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110329_000000603113_00267_52867_0000.N1/
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110428_000000603113_00267_52867_0000.N1/
Envisat-ASAR/ASAR/ASA_WSS_1P/2012/04/08/ASA_WSS_1PNESA20120408_110446_000000603113_00267_52867_0000.N1/
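Note that the key CommonPrefixes may be absent from the response when the explored directory contains no subdirectories, in which case indexing it directly raises a KeyError. Files located directly in the directory are reported under the key Contents instead. The sketch below collects both kinds of entries from a response; the function name split_listing is hypothetical, and the sample dictionary is hand-written for illustration and does not describe real EODATA content:

```python
def split_listing(response):
    """Return (directories, files) found in a list_objects response."""
    # .get() avoids a KeyError when a key is missing from the response
    directories = [p['Prefix'] for p in response.get('CommonPrefixes', [])]
    files = [o['Key'] for o in response.get('Contents', [])]
    return directories, files


# Offline demonstration on a made-up response of the same shape
sample = {
    'CommonPrefixes': [{'Prefix': 'some-dir/sub/'}],
    'Contents': [{'Key': 'some-dir/file.txt'}],
}
print(split_listing(sample))
```

With the client from the script above, the function can be called as split_listing(s3.list_objects(Delimiter='/', Bucket=container, Prefix=directory)). For directories with very many entries, a boto3 paginator obtained with s3.get_paginator('list_objects') can be used to process the listing page by page.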
Downloading a single file from EODATA repository
This section covers how to download a file from the EODATA repository.
The script below downloads the file to the directory from which the script is executed. If that directory already contains a file with the same name as the one you are downloading, it will be overwritten without a prompt for confirmation.
The code is:
import boto3
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
key = 'Landsat-5/TM/L1T/2011/11/16/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1/LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG'
host = 'https://eodata.cloudferro.com'
container = 'DIAS'
# Create an S3 resource that talks to the EODATA endpoint
s3 = boto3.resource('s3', aws_access_key_id=access_key,
                    aws_secret_access_key=secret_key, endpoint_url=host)
bucket = s3.Bucket(container)
# Use the last element of the path as the local file name
filename = key.split("/")[-1]
bucket.download_file(key, filename)
The variables are:
Variable name | What should be assigned to it
access_key | Your access key. Obtain it by following Prerequisite No. 4.
secret_key | Your secret key. Obtain it by following Prerequisite No. 4.
key | Full path (including folders) of the file you want to download from the EODATA repository.
When filling in the variable key, make sure to follow these rules:
Use slashes / as separators between elements of that path - directories and files
Do not start or finish the path with a slash /
Start the path with the name of a folder found within the root directory of the EODATA repository (for example Sentinel-2 or Sentinel-5P)
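The rules above, together with the way the script derives the local file name, can be sketched as a small helper. The function name local_name is hypothetical; it mirrors the expression key.split("/")[-1] used in the script:

```python
def local_name(key):
    """Validate an EODATA object key and return the local file name."""
    if not key or key.startswith('/') or key.endswith('/'):
        raise ValueError('the key must be non-empty and must not start or finish with /')
    # The last path element is the file name, as in the download script
    return key.split('/')[-1]


print(local_name(
    'Landsat-5/TM/L1T/2011/11/16/'
    'LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1/'
    'LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG'
))
```

For the example key, this prints the name of the PNG file without any of the folders that precede it.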
If you do not have a specific file to download and simply want to test this method of downloading files, you can keep the value assigned to the variable key in the example code above.
Again, the variables host and container contain the EODATA endpoint and the name of the container being used, respectively. You do not need to modify them.
If you provided your access key and secret key but did not change the contents of the variable key, the code should download the file called
LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1.BP.PNG
which is located within the root directory of the product
LS05_RMPS_TM__GTC_1P_20111116T100042_20111116T100111_147386_0194_0035_4BF1
After executing the script, the output should be empty. Regardless, the downloaded file should be visible within the directory from which the script was executed.
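Because the download overwrites an existing file silently and prints nothing, you may want to guard against overwriting and confirm that the file arrived. A minimal sketch follows, assuming bucket is the object created with s3.Bucket(container) in the script above; the function name download_if_absent is hypothetical:

```python
import os


def download_if_absent(bucket, key, target_dir='.'):
    """Download key into target_dir unless the file already exists.

    Returns the local path, or None if the download was skipped.
    """
    path = os.path.join(target_dir, key.split('/')[-1])
    if os.path.exists(path):
        print(f'{path} already exists, skipping download')
        return None
    bucket.download_file(key, path)
    # Report the size so that a successful download is visible in the output
    print(f'Saved {path} ({os.path.getsize(path)} bytes)')
    return path
```

Calling download_if_absent(bucket, key) in place of bucket.download_file(key, filename) keeps the rest of the script unchanged.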
What To Do Next
You can further modify these scripts so that they better suit your needs, or integrate them with your own applications. These scripts might also work in other development environments, but that is outside the scope of this article.
boto3 can also be used to access object storage containers from the ESA HPC cloud: How to access object storage from ESA HPC using boto3