
Demo: calculate a climate index in a server hosting all the climate model data

last modified Jan 25, 2021 12:56 PM

 

How do I calculate a climate index in a server hosting all the climate model data I need?

 

We show here how to count the annual number of summer days for a geolocation of your choice using the results of a climate model. In particular, we write a program that lets us easily choose one of the historical or shared socioeconomic pathway (SSP) experiments of the Coupled Model Intercomparison Project Phase 6 (CMIP6). The reference paper is The Scenario Model Intercomparison Project for CMIP6, in the CMIP6 Experimental Design and Organization special issue.

In this demo we use Python in a Jupyter notebook, but many other programming languages are available, and you can also run your own scripts instead of using a notebook.

The code to calculate and visualize the climate index is written in a Jupyter notebook called

"use-case_count_summer_days_cmip6.ipynb"

You can find it in our tutorials and use cases repository. In the Jupyter notebook you can choose the model, the location, and the year. The example plot shows the summer days results for the historical run of the MPI-ESM1-2-HR model in Hamburg for 2008. The Jupyter notebook is meant to run on the Jupyterhub server of the German Climate Computing Center (DKRZ), which is an ESGF repository that hosts 4 petabytes of CMIP6 data (more info on the data pool here). Therefore, there is no need to download any data: the code can access the data pool directly and we only need to load the relevant data.

Successful applicants to the Analysis Platforms service who chose DKRZ as host will get an account to access the high-performance computer called Mistral (follow the steps here to request your account once we let you know by email that your proposal has been accepted). After creating an account as shown in the animation below, you will get an email with your user name, for instance, b123456.

On the same website, log in with your user name and the password you wrote in the registration form. Then you need to join a specific group, as shown in the animation below. This is necessary because many users from different projects access the supercomputer and resources must be allocated to each project: our CPU hours are counted against the group, and storage space for the results is allocated to it. The group for the IPCC-related data analysis activities in the IPCC DDC Virtual Workspace and for the IS-ENES3-related data analysis activities, such as the Analysis Platforms, is bk1088.

Users can connect to the Mistral console via ssh (by typing "ssh [Email protection active, please enable JavaScript.]" in your console, more login info here) and run the Jupyter notebook there (more info here). In this example, however, we show how to run the Jupyter notebook within the DKRZ Jupyterhub (which already includes the common geoscience software packages, more information in this video tutorial).

You can clone the repository to one of your folders in Mistral. If you are not familiar with this, you can download the notebook "use-case_count_summer_days_cmip6.ipynb" from the repository to a local folder on your computer (navigate to the notebook in the repository and click on the download button; the notebook will be saved to your local Downloads folder), and then follow these 3 steps to copy the notebook to one of your folders in Mistral:

  1. log in to Mistral by typing ssh [Email protection active, please enable JavaScript.] in your console,
  2. you are in your Mistral home folder by default; stay there or create a new folder (for instance, "mkdir my_folder") and go there with "cd my_folder", and
  3. in another tab of the console, that is, not the one in which you are logged in to Mistral, go to the local folder where the notebook is (it will be Downloads if you have not moved it since you downloaded it from the repo) and type "scp use-case_count_summer_days_cmip6.ipynb b123456@mistral:/home/dkrz/b123456", where "b123456" must be replaced by your actual user name. The "scp" command stands for "secure copy". Enter your password when prompted and your notebook will be copied into your Mistral home (you can move it to my_folder with the "mv" command).

Then, when you open the DKRZ Jupyterhub, the summer days notebook will appear in the list of available folders and files. The animation below shows how to log in to the DKRZ Jupyterhub, choose a job profile, indicate the project in which your resources are allocated (the bk1088 project), and start the server (which takes a few seconds). To get the plot above we do not need many resources (computing time and memory), so we choose the "5 GB memory, 1 core, prepost, 12 hours" option in the job profile of the Jupyterhub spawner (see the next animation).

Find more information about the Mistral nodes here. For instance, the documentation states that the "prepost" nodes, unlike the "shared" ones, are connected to the internet, which our notebook requires. Regarding the DKRZ Jupyterhub, this video tutorial gives an overview, including advanced topics such as how to create your own environments (not required for this demo). Questions on the Jupyterhub can be addressed to support(AT)dkrz.de

To process the data we use Python 3 with Pandas (the popular data analysis package focused on labelled tabular data) and Xarray (the Pandas generalization for n-dimensional arrays, particularly tailored to working with netCDF files). We use Intake to find the data in the catalog of the DKRZ data pool. To visualize the data in the Jupyter notebook and save the plots, we use hvPlot. The figure below shows how to import the packages and choose which model, geolocation, and year you want to use to calculate the summer days index.
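As a rough illustration of the choices described above, the following sketch maps a requested year to a CMIP6 experiment. It assumes the standard CMIP6 periods (historical runs cover 1850 to 2014, the scenario runs 2015 to 2100). The function name `select_experiment` and the default SSP are hypothetical and are not taken from the notebook.

```python
# Minimal sketch with hypothetical names; not the notebook's actual code.
HISTORICAL_YEARS = range(1850, 2015)  # CMIP6 historical experiment: 1850-2014
SCENARIO_YEARS = range(2015, 2101)    # ScenarioMIP SSP experiments: 2015-2100


def select_experiment(year, ssp="ssp585"):
    """Return the CMIP6 experiment identifier whose period contains `year`.

    `ssp` is the scenario to use for future years, e.g. "ssp126",
    "ssp245", "ssp370" or "ssp585".
    """
    if year in HISTORICAL_YEARS:
        return "historical"
    if year in SCENARIO_YEARS:
        return ssp
    raise ValueError(f"year {year} is outside 1850-2100")


# Hypothetical user choices, mirroring the example in the text.
model = "MPI-ESM1-2-HR"
place = "Hamburg"
year = 2008

print(model, place, year, select_experiment(year))
```

With these choices the selected experiment is "historical"; a year such as 2050 would select the chosen SSP instead.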

Similar to the shopping catalog of your favorite online bookstore, the Intake catalog contains information about each dataset (e.g. model, variables, and time range), just as a bookstore catalog lists the title, author, and number of pages of each book. You can consult this information before loading the data. Thanks to the catalog, you can find out where a book is just by using some keywords, and you do not need to hold it in your hand to know how many pages it has.

The animation below shows how to load the Intake catalog, browse it, find the path of the model data you are interested in, and load them.
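The catalog idea can be mimicked in a few lines of plain Python. This is only a toy illustration and does not use the Intake API. The keys follow common CMIP6 facet names, while the entries, the file paths, and the `search` helper are hypothetical.

```python
# Toy catalog: each entry describes a dataset without containing its data.
# All paths below are made up for illustration.
CATALOG = [
    {"source_id": "MPI-ESM1-2-HR", "experiment_id": "historical",
     "variable_id": "tasmax", "table_id": "day",
     "path": "/hypothetical/pool/tasmax_day_MPI-ESM1-2-HR_historical.nc"},
    {"source_id": "MPI-ESM1-2-HR", "experiment_id": "ssp585",
     "variable_id": "tasmax", "table_id": "day",
     "path": "/hypothetical/pool/tasmax_day_MPI-ESM1-2-HR_ssp585.nc"},
    {"source_id": "MPI-ESM1-2-HR", "experiment_id": "historical",
     "variable_id": "pr", "table_id": "day",
     "path": "/hypothetical/pool/pr_day_MPI-ESM1-2-HR_historical.nc"},
]


def search(catalog, **query):
    """Return the entries whose metadata match every keyword in `query`."""
    return [
        entry for entry in catalog
        if all(entry.get(key) == value for key, value in query.items())
    ]


# Find the daily maximum temperature of the historical run by keywords only.
hits = search(CATALOG, source_id="MPI-ESM1-2-HR",
              experiment_id="historical", variable_id="tasmax")
for hit in hits:
    print(hit["path"])  # only now would the file itself be opened
```

The search touches only the metadata, so no file is opened until a matching path has been found.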

Climate models have a finite resolution. Hence, models do not provide data for a particular point, but the mean over a model grid cell. We find the model grid cell that contains the geolocation we chose and compare it with the actual geolocation on a map. We use hvPlot to plot the time series and the threshold. The definition of a summer day varies from region to region. For instance, according to the German Weather Service, "a summer day is a day on which the maximum air temperature is at least 25.0°C". Depending on the place you selected, you might want to apply a different threshold to calculate the summer days index. Finally, we count the days on which the maximum temperature reaches or exceeds the threshold.
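The two steps described above, picking the grid cell and counting the days, can be sketched in plain Python. This is a simplified illustration, not the notebook's code. It assumes a regular latitude/longitude grid with longitudes in the 0 to 360 range and daily maximum temperatures in Kelvin (the unit of the CMIP6 variable tasmax). The grid coordinates and temperature values below are made up.

```python
def nearest_index(values, target):
    """Index of the coordinate value closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def count_summer_days(tasmax_kelvin, threshold_celsius=25.0):
    """Count days whose maximum temperature is at least the threshold."""
    return sum(1 for t in tasmax_kelvin if t - 273.15 >= threshold_celsius)


# Hypothetical cell-centre coordinates of a small piece of a model grid.
grid_lats = [52.8, 53.7, 54.6]
grid_lons = [8.4, 9.4, 10.3, 11.2]

# Approximate coordinates of Hamburg; `% 360` maps the longitude
# onto the 0-360 convention assumed for the grid.
lat, lon = 53.55, 9.99
i_lat = nearest_index(grid_lats, lat)
i_lon = nearest_index(grid_lons, lon % 360)
print("grid cell centre:", grid_lats[i_lat], grid_lons[i_lon])

# Made-up daily maximum temperatures (Kelvin) for the selected cell.
tasmax = [290.0, 299.2, 301.5, 297.0, 298.9]
print("summer days:", count_summer_days(tasmax))
```

Passing a different `threshold_celsius` adapts the index to regions where another definition of a summer day applies.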

The plot is interactive and you can save it to your folder in Mistral. You can also download the plot to your local computer. If the plot is in your Mistral home, go to your local folder and type "scp b123456@mistral:/home/dkrz/b123456/your_plot.png ." (with no space between the colon and the slash), where the final "." means that the plot is downloaded to the local folder you are in, and "b123456" must be replaced by your actual user name. If the plot is in another folder in Mistral, give the path to it when you use "scp".