Demo: run server-side data-near multimodel comparisons

last modified Dec 16, 2020 09:27 AM


How do I run my script on a high performance computing server near the data?


Here we show how to calculate and plot the global-mean, annual-mean near-surface air temperature for several CMIP6 models under the SSP2-4.5 scenario. In this demo we use Python and cdo in a Jupyter notebook, but many other programming languages are available, and you are not limited to notebooks: you can also run your own scripts.

We concatenate historical data (that is, model results from 1850 to 2014 using the best estimates for anthropogenic and natural forcing) with the model results for the Shared Socioeconomic Pathway SSP2-4.5 scenario. The scenario name reflects the radiative forcing reached by 2100 (in this case, 4.5 W/m2 or ~650 ppm CO2 equivalent). The reference paper is "The Scenario Model Intercomparison Project for CMIP6", in the CMIP6 Experimental Design and Organization special issue.

We have a repository of test cases where you can find and download Jupyter notebooks with the code for several test cases. The code that generated the plot on the right is in a Jupyter notebook called "use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb".

In this demo, the Jupyter notebook runs on Mistral, one of the IS-ENES3 world-class supercomputers, at the German Climate Computing Center (DKRZ). Mistral has direct access to more than 3.3 petabytes of CMIP6 model results (more info on the data pool here).

Successful applicants to the Analysis Platforms service who chose DKRZ as host will get an account on Mistral (follow the steps here to request your account once we let you know by email that your proposal has been accepted). You will get a user name, for instance, b123456. After that, you need to join a specific group. Many users from different projects access the supercomputer, so resources must be allocated per project: the CPU hours you use are counted against the group, and storage space for your results is allocated to it. The group for IPCC-related data analysis activities in the IPCC DDC Virtual Workspace and for IS-ENES3-related data analysis activities, such as the Analysis Platforms, is bk1088 (as shown in the animation below).

Users can connect to Mistral via ssh (by running ssh with your user name and the Mistral login address in your console, more login info here) and run the Jupyter notebook directly on Mistral (more info here). In this example, however, we show how to run the Jupyter notebook within the DKRZ Jupyterhub (which already includes the common packages for climate multimodel comparisons, more information in this video tutorial).

First, download the notebook "use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb" from the repository to a local folder on your computer (navigate to the notebook in the repository and click on the download button; it will be saved to your local Downloads folder). Then copy the notebook to your home directory on Mistral: (1) log in to Mistral via ssh from your console; (2) after logging in you are in your Mistral home folder by default, so either stay there or create a new folder (for instance, "mkdir my_folder", and go there with "cd my_folder"); and (3) in a console on your local computer, go to the folder where the notebook is (your local Downloads folder if you have not moved it) and write "scp CMIP6_multimodel_example.ipynb b123456@mistral:/home/dkrz/b123456", where "b123456" must be replaced by your actual user name and the file name must be the name of the notebook you downloaded.
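Step (3) can also be scripted. The following is a minimal sketch that only assembles and prints the scp command quoted in the text; the helper name `build_upload_command` is hypothetical, and the host alias `mistral` is taken verbatim from that command (it assumes your SSH configuration resolves this alias).

```python
import shlex


def build_upload_command(notebook, user, host="mistral"):
    """Return the scp argument list that copies a notebook to the user's home on the host."""
    # The remote home path follows the pattern used in the text: /home/dkrz/<user>
    remote = f"{user}@{host}:/home/dkrz/{user}"
    return ["scp", notebook, remote]


# "b123456" is the placeholder user name from the text; replace it with your own.
cmd = build_upload_command("CMIP6_multimodel_example.ipynb", "b123456")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute the copy; the sketch stops at printing so that nothing is transferred by accident.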


When you open the DKRZ Jupyterhub, the CMIP6_multimodel_example.ipynb notebook will appear in the list of available folders and files. The animation above shows how to log in to the DKRZ Jupyterhub, choose a job profile, indicate the project to which your resources are allocated (the bk1088 project), and start the server (which takes a few seconds). For this demo we needed only the smallest resource allocation (computing time and memory) in the job profile. For more information about the DKRZ Jupyterhub, see the video tutorial we created; questions on the Jupyterhub can be addressed to support(AT)dkrz.de.

We use Python 3 with Pandas (the popular data analysis package focused on labelled tabular data) and Xarray (the Pandas generalization for n-dimensional arrays, particularly tailored to working with netCDF files) to process the data, together with the cdo (Climate Data Operators) package to concatenate the scenario results with their corresponding historical results. Click on the figure to see how to import the packages and find the data paths in the data pool:
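CMIP6 data are stored in a directory tree that follows the CMIP6 Data Reference Syntax (activity, institution, model, experiment, member, table, variable, grid, version), so candidate files can be located with a glob pattern. The sketch below is one possible way to do this and is not the notebook's own code: the helper names are hypothetical, the choice of the monthly table `Amon` for `tas` is an assumption, and the demonstration runs on a throwaway directory with a made-up institution, version and file name. On Mistral you would pass the actual data-pool root instead.

```python
import tempfile
from pathlib import Path


def drs_pattern(activity, experiment, variable="tas", table="Amon",
                source="*", member="*"):
    """Build a glob pattern for the CMIP6 DRS directory layout:
    activity/institution/source/experiment/member/table/variable/grid/version/file
    """
    parts = [activity, "*", source, experiment, member,
             table, variable, "*", "*", "*.nc"]
    return "/".join(parts)


def find_files(root, pattern):
    """Return the sorted list of files below `root` that match `pattern`."""
    return sorted(Path(root).glob(pattern))


# Demonstration on a temporary tree that mimics the layout (all names illustrative).
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp, "ScenarioMIP", "INST", "MPI-ESM1-2-LR", "ssp245",
                  "r1i1p1f1", "Amon", "tas", "gn", "v0")
    folder.mkdir(parents=True)
    (folder / "tas_example.nc").touch()
    found = find_files(tmp, drs_pattern("ScenarioMIP", "ssp245"))
    print([p.name for p in found])
```

The historical counterpart uses the `CMIP` activity, for instance `drs_pattern("CMIP", "historical")`.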

We then identify which historical and scenario datasets match:
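The matching step can be thought of as a set intersection: only models that provide both a historical run and an SSP2-4.5 run can be concatenated. The following sketch assumes that runs are matched on the pair (model, ensemble member); that criterion, the helper name `match_runs`, the member label and the file names are illustrative assumptions, not taken from the notebook.

```python
def match_runs(historical, scenario):
    """Return the sorted (model, member) keys that appear in both experiments."""
    return sorted(set(historical) & set(scenario))


# Hypothetical inventories: (model, member) -> file path
historical = {
    ("MPI-ESM1-2-LR", "r1i1p1f1"): "hist_mpi.nc",
    ("IPSL-CM6A-LR", "r1i1p1f1"): "hist_ipsl.nc",
    ("MODEL-X", "r1i1p1f1"): "hist_x.nc",  # no scenario run, will be dropped
}
scenario = {
    ("MPI-ESM1-2-LR", "r1i1p1f1"): "ssp245_mpi.nc",
    ("IPSL-CM6A-LR", "r1i1p1f1"): "ssp245_ipsl.nc",
    ("MODEL-Y", "r1i1p1f1"): "ssp245_y.nc",  # no historical run, will be dropped
}

matched = match_runs(historical, scenario)
for key in matched:
    print(key, historical[key], scenario[key])
```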


And then we load the data directly from the data pool:
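For each matched pair, the historical and scenario files have to be joined along the time axis so that they can be handled as one continuous record. The sketch below assembles a cdo call for this; `mergetime` is a standard cdo operator, but whether the notebook uses it (rather than another operator or the Python cdo bindings) is an assumption, and the helper name and file names are hypothetical.

```python
import shlex


def build_mergetime_command(historical_file, scenario_file, output_file):
    """Return the argument list for joining two files in time with cdo."""
    # cdo syntax: cdo mergetime <input files> <output file>
    return ["cdo", "mergetime", historical_file, scenario_file, output_file]


cmd = build_mergetime_command("tas_historical.nc", "tas_ssp245.nc", "tas_1850-2100.nc")
print(shlex.join(cmd))
```

The command is only printed here; running it requires cdo to be installed, as it is on Mistral.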


Finally, we calculate the means and plot the results. We choose to highlight the MPI-ESM1-2-LR and IPSL-CM6A-LR results:
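To make the two averaging steps concrete, here is a small pure-Python sketch under simplifying assumptions: a regular latitude-longitude grid, area weights proportional to the cosine of latitude, and equal weight for every month. The function names and the toy values are illustrative; in practice the same operations would be done with Xarray (for example through `DataArray.weighted`) or with cdo operators such as `fldmean` and `yearmean`.

```python
import math


def global_mean(field, lats):
    """Area-weighted mean of field[lat][lon] on a regular lat-lon grid.

    Each latitude row is weighted by cos(latitude), which is proportional
    to the area of its grid cells.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats]
    weighted_sum = sum(w * sum(row) / len(row) for w, row in zip(weights, field))
    return weighted_sum / sum(weights)


def annual_means(monthly):
    """Average consecutive blocks of 12 monthly values (incomplete years are dropped)."""
    n_full = len(monthly) - len(monthly) % 12
    return [sum(monthly[i:i + 12]) / 12 for i in range(0, n_full, 12)]


# Toy temperature field in K: colder at 60S and 60N, warmer at the equator.
lats = [-60.0, 0.0, 60.0]
field = [[280.0, 280.0], [300.0, 300.0], [280.0, 280.0]]
print(round(global_mean(field, lats), 2))  # weighted mean; the plain average would be lower

# Two years of made-up monthly values
print(annual_means(list(range(24))))
```

In the toy field the equatorial row counts twice as much as each row at 60 degrees, which is why the weighted mean lies above the simple average of the three rows.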


As the Jupyter notebook shows, a .pdf with the figure is also created in a folder in your Mistral home directory called "plots/CMIP6_overview". You can download the plot to your local computer by writing "scp b123456@mistral:/home/dkrz/b123456/plots/CMIP6_overview/SSP2-4.5.pdf ." in your local folder, where the final "." means that the plot is downloaded into the folder you are currently in and "b123456" must be replaced by your actual user name.