TERATEC 2020 Forum
Wednesday October 14, 2020 - Workshops

Workshop 01 - 09:00 to 10:30

Environment and satellite data: from abundance of applications to the surge of structured solutions
Chaired by Laurent Boisnard, Deputy Head of Earth Observation, CNES and François Robida, BRGM

Scaling scientific data analysis on HPC or Cloud facilities with Pangeo
By Guillaume Eynard-Bontemps, Head of the Computing Center, CNES
and Tina Odaka, Research and Development Engineer in Large Dataset Processing, IFREMER

Pangeo is first and foremost a scientific community, but also a Python software ecosystem and a platform that can be deployed on many infrastructures. Its goal is to make scientific research and programming easier on large datasets, whether they come from simulations run on HPC clusters (such as climate models) or from sensors such as Earth observation satellites.

In this talk, we will see how a scientist or an engineer can analyze and process huge data volumes interactively, in a few lines of code, using the software components at the heart of Pangeo: Jupyter, Dask and Xarray.

We will present these three core components:

  • Jupyter provides the main graphical interface and advantageously replaces a terminal.

  • Dask scales computations and data analysis across many nodes or virtual machines.

  • Xarray provides a high-level representation of multi-dimensional scientific data.
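
Dask's central idea can be sketched without Dask itself. It splits a large array into chunks, computes a partial result for each chunk on many workers, and then combines those partial results. The minimal sketch below uses only the Python standard library. The function names are hypothetical and this is not Dask's API. A thread pool stands in for the Dask workers; on an HPC cluster or in the cloud those workers would be separate processes spread over many nodes.

```python
"""Stdlib-only sketch of the chunked map-reduce pattern that Dask automates.

Assumption: hypothetical helper names, not Dask's API. A thread pool plays the
role of the workers that would process each chunk in a real deployment.
"""
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Cut a flat sequence into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def chunk_partial(chunk):
    """Map step: each worker returns a partial (sum, count) for its chunk."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, max_workers=4):
    """Reduce step: combine the per-chunk partial results into a global mean."""
    if not values:
        raise ValueError("cannot compute the mean of an empty sequence")
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_partial, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


if __name__ == "__main__":
    print(chunked_mean(list(range(10_000)), chunk_size=2500))  # 4999.5
```

With Xarray backed by Dask, the user writes only the equivalent of the final `mean` call. The chunking and scheduling steps shown explicitly here are handled by the library.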

We will also describe the main possibilities for deploying a Pangeo platform: your personal laptop, a public cloud provider or an HPC cluster.

Finally, we will demonstrate Pangeo stack usage through some concrete use cases:

  • A multi-temporal analysis of Sentinel-2 satellite tiles, tracking the evolution of the NDVI (Normalized Difference Vegetation Index) for each pixel.

  • Computing the evolution of global sea level from Aviso data in a few seconds, by distributing the processing across hundreds of CPU cores.
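
The NDVI of a pixel is computed from its red and near-infrared reflectances as NDVI = (NIR - Red) / (NIR + Red). On Sentinel-2, these are bands B4 and B8. Below is a minimal stdlib sketch of the per-pixel, multi-temporal computation. It assumes the tiles are co-registered and cloud-masked, with each date given as a pair of nested lists. The function names are hypothetical. In the Pangeo stack, the same arithmetic would be written once on Dask-backed Xarray arrays.

```python
"""Per-pixel NDVI time series over a stack of acquisitions (stdlib sketch).

Assumptions: each date is a (red_grid, nir_grid) pair of equally sized 2-D
grids on the same pixel grid (Sentinel-2 B4 and B8), already cloud-masked.
"""
import math


def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where the denominator is zero."""
    denom = nir + red
    return (nir - red) / denom if denom != 0 else math.nan


def ndvi_grid(red_grid, nir_grid):
    """Apply ndvi() pixel by pixel to one acquisition."""
    return [[ndvi(r, n) for r, n in zip(red_row, nir_row)]
            for red_row, nir_row in zip(red_grid, nir_grid)]


def ndvi_stack(acquisitions):
    """acquisitions: list of (red_grid, nir_grid), one per date, in time order."""
    return [ndvi_grid(red, nir) for red, nir in acquisitions]


def ndvi_change(stack):
    """Per-pixel NDVI difference between the last and the first date."""
    first, last = stack[0], stack[-1]
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, last)]


if __name__ == "__main__":
    dates = [
        ([[1000, 2000]], [[3000, 2000]]),  # date 1: red grid, NIR grid
        ([[1000, 1000]], [[4000, 3000]]),  # date 2: red grid, NIR grid
    ]
    stack = ndvi_stack(dates)
    print(stack)
    print(ndvi_change(stack))
```

Because NDVI is a ratio, any common scale factor on the reflectances cancels out. Integer digital numbers can therefore be used directly in this sketch.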

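A global mean sea level curve is typically obtained by averaging each gridded sea level anomaly (SLA) map with weights proportional to cell area. The time steps can then be processed independently, which is what makes the computation easy to distribute. The sketch below makes several assumptions: a regular latitude/longitude grid, land or missing cells stored as None, and cell area approximated by cos(latitude). It does not read actual Aviso files. The function names are hypothetical, and the thread pool only stands in for Dask workers.

```python
"""Area-weighted global mean sea level, one value per time step (stdlib sketch).

Assumptions: SLA maps on a regular lat/lon grid (rows = latitudes), land or
missing cells stored as None, cell area proportional to cos(latitude).
"""
import math
from concurrent.futures import ThreadPoolExecutor


def weighted_global_mean(sla_map, latitudes_deg):
    """cos(lat)-weighted mean of one map, ignoring missing cells."""
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, row in zip(latitudes_deg, sla_map):
        w = math.cos(math.radians(lat))
        for value in row:
            if value is not None:  # skip land / missing cells
                weighted_sum += w * value
                weight_total += w
    return weighted_sum / weight_total if weight_total > 0 else math.nan


def global_mean_sea_level(sla_maps, latitudes_deg, max_workers=4):
    """Return one global mean per time step; time steps run concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda m: weighted_global_mean(m, latitudes_deg),
                             sla_maps))


if __name__ == "__main__":
    lats = [0.0, 60.0]
    maps = [
        [[0.1, 0.1], [0.1, None]],    # time step 0 (metres)
        [[0.2, 0.2], [None, None]],   # time step 1 (metres)
    ]
    print(global_mean_sea_level(maps, lats))
```

The cos(latitude) weight keeps the many small high-latitude cells from dominating the average. For example, a map with 1.0 at the equator and 4.0 at 60° averages to about 2.0 rather than 2.5.
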
Biography: Big data scientific engineer at IFREMER. After obtaining her PhD under joint supervision (Germany and Japan), Tina did a postdoc in satellite data processing and HPC computation on ocean models. Since 2008, she has been working at IFREMER as a scientific computing expert in the field of marine sciences. Tina is interested in optimal workflows on HPC infrastructure, from data to results, both from the users' point of view and in terms of efficient use of the infrastructure.
Biography: In charge of the CNES Computing Center for a year, Guillaume is a specialist in distributed computing on huge datasets. He previously deployed a Hadoop cluster and deployed algorithms on it. He is also a member of the Pangeo steering committee and an active Dask user.

