
Demo: run server-side data-near multimodel comparisons

last modified Oct 16, 2020 09:21 AM


How do I run my script on a high performance computing server near the data?


Here we show how to calculate and plot the global mean annual mean near-surface air temperature for some CMIP6 models under the SSP2-4.5 scenario. In this demo we use Python and cdo in a Jupyter notebook, but many other programming languages are available, and you can also run standalone scripts; the notebook is not the only option.

We concatenate the historical data (that is, model results from 1850 to 2014, using the best estimates of anthropogenic and natural forcing) with the model results for the Shared Socioeconomic Pathway SSP2-4.5 scenario, which corresponds to a growth in radiative forcing reaching 4.5 W/m2 (~650 ppm CO2 equivalent) by 2100. The reference paper is The Scenario Model Intercomparison Project for CMIP6, which is part of the CMIP6 Experimental Design and Organization special issue.
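The concatenation step can be sketched with xarray on synthetic data (the demo itself uses cdo's cat operator on the real files in the data pool; the series lengths and values below are placeholders):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-ins for the historical (1850-2014) and SSP2-4.5 (2015-2100)
# annual near-surface air temperature series; the real data comes from the
# CMIP6 data pool on Mistral.
hist = xr.Dataset(
    {"tas": ("time", np.full(165, 287.0))},
    coords={"time": pd.date_range("1850", periods=165, freq="YS")},
)
ssp245 = xr.Dataset(
    {"tas": ("time", np.linspace(287.0, 290.0, 86))},
    coords={"time": pd.date_range("2015", periods=86, freq="YS")},
)

# Join the two experiments along the time axis, as cdo's "cat" does.
tas = xr.concat([hist, ssp245], dim="time")
print(tas.sizes["time"])  # 251 years: 1850-2100
```

The result is one continuous 1850-2100 series per model, which is what the global-mean plot is drawn from.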

The code that generated the figure on the right is in a Jupyter notebook called "CMIP6_multimodel_example.ipynb", which you can find and download from our repository of test cases. In this demo, the Jupyter notebook runs on one of the IS-ENES3 world-class supercomputers, Mistral, at the German Climate Computing Center (DKRZ), which has direct access to more than 3.3 petabytes of CMIP6 model results (more info on the data pool here).

Successful applicants to the Analysis Platforms service who chose DKRZ as host will get an account on Mistral (follow the steps here to request your account once we let you know by email that your proposal has been accepted). After that, users need to join a specific group. This is because many users from different projects access the supercomputer and resources must be allocated to us: our CPU hours will be counted there and disk space will be allocated to store the results. The group for the IPCC-related data analysis activities in the IPCC DDC Virtual Workspace and for the IS-ENES3-related data analysis activities, such as the Analysis Platforms, is bk1088 (as shown in the animation below).

Users can connect to the Mistral console via ssh (ssh <user-account>@mistral.dkrz.de, more login info here) and run the Jupyter notebook there, but in this example we show how to run the Jupyter notebook within the DKRZ Jupyterhub (which already includes the common packages for climate multimodel comparisons; more information in this video tutorial).


Once the notebook "CMIP6_multimodel_example.ipynb" has been downloaded from the repository to a local folder on your computer, it must be copied to your home directory on Mistral. When you log in to Mistral, you are in your Mistral home folder by default; stay there or create a new folder (for instance, "mkdir my_folder", and go there with "cd my_folder"). Then copy the notebook from your own computer, for example with "scp your_local_path/CMIP6_multimodel_example.ipynb <user-account>@mistral.dkrz.de:~", where "your_local_path" stands for where you saved the notebook locally. When you open the DKRZ Jupyterhub, the CMIP6_multimodel_example.ipynb notebook will then appear in the list of available folders and files. The animation above shows how to log in to the DKRZ Jupyterhub, choose a job profile and start the server (which takes a few seconds). For this demo we needed the smallest resource allocation (computing time and memory) in the job profile. For more information about the DKRZ Jupyterhub we created this video tutorial, and questions on the Jupyterhub can be addressed to support(AT)dkrz.de

We use the Python 3 packages Pandas (the popular data analysis package focused on labelled tabular data) and Xarray (the Pandas generalization for n-dimensional arrays, particularly tailored to working with netCDF files) to process the data, together with the cdo (Climate Data Operators) package to concatenate the scenario model results with their corresponding historical results. Click on the figure to see how to import the packages and find the data paths in the data pool:
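As a rough illustration of how Pandas helps catalogue the data pool, CMIP6 file names encode their own metadata, so they can be parsed into a table (the file names below are illustrative examples; the real pool holds many more models and members):

```python
import pandas as pd

# Example CMIP6 file names as they appear in the data pool (illustrative).
files = [
    "tas_Amon_MPI-ESM1-2-LR_historical_r1i1p1f1_gn_185001-201412.nc",
    "tas_Amon_MPI-ESM1-2-LR_ssp245_r1i1p1f1_gn_201501-210012.nc",
    "tas_Amon_IPSL-CM6A-LR_historical_r1i1p1f1_gr_185001-201412.nc",
    "tas_Amon_IPSL-CM6A-LR_ssp245_r1i1p1f1_gr_201501-210012.nc",
]

# CMIP6 file names follow variable_table_model_experiment_member_grid_period.
cols = ["variable", "table", "model", "experiment", "member", "grid", "period"]
catalog = pd.DataFrame([dict(zip(cols, f[:-3].split("_"))) for f in files])
print(catalog[["model", "experiment"]])
```

Such a table makes it easy to filter for the variable, experiment and models you need before touching any netCDF file.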

We then identify the matching historical and scenario data:
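The matching logic can be sketched as a Pandas inner join on the model name (the tables and paths below are hypothetical placeholders, not the notebook's actual variables):

```python
import pandas as pd

# Hypothetical catalogues of available historical and SSP2-4.5 files.
hist = pd.DataFrame({
    "model": ["MPI-ESM1-2-LR", "IPSL-CM6A-LR", "UKESM1-0-LL"],
    "hist_path": ["hist_mpi.nc", "hist_ipsl.nc", "hist_ukesm.nc"],
})
ssp = pd.DataFrame({
    "model": ["MPI-ESM1-2-LR", "IPSL-CM6A-LR"],
    "ssp_path": ["ssp_mpi.nc", "ssp_ipsl.nc"],
})

# An inner join keeps only the models that provide both experiments,
# so every scenario run gets its matching historical run.
pairs = hist.merge(ssp, on="model", how="inner")
print(len(pairs))  # 2
```

In practice the join would also include the variant label (e.g. r1i1p1f1) so that each scenario member is paired with the same member of the historical run.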


And then we load the data directly from the data pool:


Finally, we calculate the means and plot the results. We choose to highlight the MPI-ESM1-2-LR and IPSL-CM6A-LR results:
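The core of this step — an area-weighted global mean followed by an annual mean — can be sketched with xarray on a small synthetic field (grid sizes and values are placeholders; the notebook reads the real fields from the data pool):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly near-surface air temperature on a coarse lat/lon grid.
time = pd.date_range("2015-01", periods=24, freq="MS")
lat = np.linspace(-87.5, 87.5, 36)
lon = np.linspace(0.0, 350.0, 36)
data = 280.0 + 30.0 * np.cos(np.deg2rad(lat))[None, :, None] \
    + np.zeros((len(time), len(lat), len(lon)))
tas = xr.DataArray(data, dims=("time", "lat", "lon"),
                   coords={"time": time, "lat": lat, "lon": lon}, name="tas")

# Weight by cos(latitude) so that shrinking grid cells near the poles
# do not dominate, average over the globe, then over each year.
weights = np.cos(np.deg2rad(tas.lat))
global_mean = tas.weighted(weights).mean(("lat", "lon"))
annual_mean = global_mean.groupby("time.year").mean("time")
print(annual_mean.year.values)  # [2015 2016]
```

A call such as annual_mean.plot() then draws one line per model; in the figure the MPI-ESM1-2-LR and IPSL-CM6A-LR curves are highlighted.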


See in the Jupyter notebook that a .pdf of the figure is also created in a folder in your Mistral home called "plots/CMIP6_overview". You can download the plot to your local computer, for example with "scp <user-account>@mistral.dkrz.de:plots/CMIP6_overview/SSP2-4.5.pdf your_local_path".