Demo: run server-side data-near multimodel comparisons

last modified Mar 02, 2021 05:25 PM

 

How do I run my script on a high performance computing server near the data?

 

Here we show how to calculate and plot the global mean, annual mean near-surface air temperature for several CMIP6 models under the SSP2-4.5 scenario. In this demo we use Python and cdo in a Jupyter notebook, but many other programming languages are available, and you can also run your own scripts outside a notebook.

We concatenate historical data (that is, model results from 1850 to 2014 using the best estimates for anthropogenic and natural forcing) with the model results for the Shared Socioeconomic Pathway SSP2-4.5 scenario. The "4.5" in the scenario name refers to the radiative forcing reached by 2100 (in this case, 4.5 W/m² or ~650 ppm CO2 equivalent).
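The concatenation step can be illustrated with a minimal xarray sketch. It uses synthetic annual series rather than real CMIP6 files, and the helper `make_series` and the temperature values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_series(start, end, value):
    """Build a synthetic annual 'tas' series (in K) covering start..end inclusive."""
    years = pd.date_range(f"{start}-01-01", f"{end}-01-01", freq="YS")
    data = np.full(len(years), value, dtype=float)
    return xr.DataArray(data, coords={"time": years}, dims="time", name="tas")


# Synthetic stand-ins: historical covers 1850-2014, the scenario 2015-2100.
historical = make_series(1850, 2014, 287.0)
ssp245 = make_series(2015, 2100, 289.0)

# Join the two experiments along the time axis into one continuous series.
combined = xr.concat([historical, ssp245], dim="time")

print(combined.sizes["time"])  # 165 historical years + 86 scenario years = 251
```

With real data, the two inputs would be the historical and SSP2-4.5 datasets of the same model, and the same `xr.concat` call along `time` would apply.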

The reference paper is The Scenario Model Intercomparison Project for CMIP6, in the CMIP6 Experimental Design and Organization special issue.

We maintain a repository of test cases where you can find and download Jupyter notebooks with the code for several use cases. The code that generated the plot for this demo is in a Jupyter notebook called "use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb".

In this demo, the Jupyter notebook runs on Mistral, one of the IS-ENES3 world-class supercomputers, hosted at the German Climate Computing Center (DKRZ). Mistral has direct access to more than 3.3 petabytes of CMIP6 model results (more info on the data pool here).

Successful applicants to the Analysis Platforms service who chose DKRZ as host will receive an email announcing the acceptance of their proposal and inviting them to open an account on Mistral. The next animation shows the process, and more information on how to register is available here. First, fill in the form at https://luv.dkrz.de/. You will then receive an email with your user name, something like b123456.

Then log in again at https://luv.dkrz.de/ with that user name and the password you entered in the form. As shown in the animation below, users need to join a specific group or "project". This is because many users from different projects share the supercomputer, and resources must be allocated to each project: our CPU hours are accounted for there, and storage space for the results is allocated there as well. The group for IPCC-related data analysis activities in the IPCC DDC Virtual Workspace and for IS-ENES3-related data analysis activities, such as the Analysis Platforms, is bk1088 (as shown in the animation below and in the next one for the Jupyterhub).

Users can connect to the Mistral console via ssh (by typing "ssh" followed by your user name and the Mistral login address in your console; more login info here) and run the Jupyter notebook directly on Mistral (more info here). In this example, however, we show how to run the Jupyter notebook within the DKRZ Jupyterhub, which already includes the common packages for climate multimodel comparisons (more information in this video tutorial).

First, clone the repository containing the Jupyter notebook for this demo into a folder on Mistral. If you are not familiar with git, just download the notebook "use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb" from the repository to a local folder on your computer (navigate to the notebook in the repository and click on the download button; the notebook will be saved to your local Downloads folder). The notebook must then be copied to one of your folders on Mistral:

  1. log in to Mistral by typing "ssh" followed by your user name and the Mistral login address in your console,
  2. when you log in to Mistral you are in your Mistral home folder by default; stay there or create a new folder (for instance, "mkdir my_folder", and go there with "cd my_folder"), and
  3. in another console tab (one where you are not logged in to Mistral) go to the local folder where the notebook is stored on your computer (this will be your local Downloads folder if you have not moved it from there) and type "scp use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb b123456@mistral:/home/dkrz/b123456", where "b123456" must be replaced by your actual user name. "scp" stands for "secure copy"; once the command finishes, a copy of the notebook is in your Mistral folder.
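If you prefer to script the copy step, the scp command from step 3 can be assembled in Python. This is only a sketch: the host alias "mistral" and the "/home/dkrz/<user>" layout are taken from the command above, while the function name `scp_upload_command` and its `subfolder` parameter are hypothetical.

```python
import shlex

NOTEBOOK = "use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb"


def scp_upload_command(user, filename=NOTEBOOK, subfolder=""):
    """Return the scp argument list that copies a local file to the user's Mistral home."""
    remote_dir = f"/home/dkrz/{user}"
    if subfolder:
        # Optional target folder inside the home directory, e.g. "my_folder".
        remote_dir = f"{remote_dir}/{subfolder}"
    return ["scp", filename, f"{user}@mistral:{remote_dir}"]


command = scp_upload_command("b123456")  # replace with your actual user name
print(shlex.join(command))
```

The printed line matches the command in step 3. To actually run it from Python, the list can be passed to `subprocess.run(command, check=True)` on a machine where `scp` and the host alias are set up.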

When you then open the DKRZ Jupyterhub, the "use-case_multimodel_comparison_xarray_cdo_cmip6.ipynb" notebook will appear in the list of available folders and files. The animation above shows how to log in to the DKRZ Jupyterhub, choose a job profile, indicate the project in which your resources are allocated (the bk1088 project), and start the server (which takes a few seconds). For this demo, the job profile with the smallest resource allocation (computing time and memory) was sufficient.

For more information about the DKRZ Jupyterhub, see the video tutorial we created. Questions about the Jupyterhub can be sent to support(AT)dkrz.de.

In this demo we use Python 3 Pandas (the popular data analysis package focused on labelled tabular data) and Xarray (the Pandas generalization for n-dimensional arrays, particularly tailored to working with netCDF files) to process the data, together with the python-cdo (Climate Data Operators) package. Click on the figure to see how to import the packages and find the data paths in the data pool:
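As a rough sketch of the path lookup, assuming the data pool follows the standard CMIP6 directory structure (activity, institution, model, experiment, member, table, variable, grid, version). The root `POOL_ROOT` is a hypothetical placeholder, not the real location on Mistral, and the default member `r1i1p1f1` is an assumption for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical root of the data pool; the real location on Mistral is not given here.
POOL_ROOT = "/path/to/pool/CMIP6"

# Activity under which each experiment is published in CMIP6.
ACTIVITY = {"historical": "CMIP", "ssp245": "ScenarioMIP"}


def drs_pattern(experiment, variable="tas", table="Amon", member="r1i1p1f1"):
    """Glob pattern for the netCDF files of one experiment across all models."""
    parts = [
        POOL_ROOT,
        ACTIVITY[experiment],
        "*",         # institution_id
        "*",         # source_id (the model name)
        experiment,
        member,
        table,
        variable,
        "*",         # grid_label
        "*",         # version
        "*.nc",
    ]
    return str(PurePosixPath(*parts))


print(drs_pattern("historical"))
print(drs_pattern("ssp245"))
```

Passing either pattern to `glob.glob` on a system with access to the pool would list the monthly near-surface air temperature (`tas`) files for every model that provides that experiment.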

We then identify the models for which both historical and scenario data are available:

[Figure: demo_data_missmach]
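The matching idea can be sketched with plain Python sets. The file names and the models "MODEL-X" and "MODEL-Y" are hypothetical, and the function `match_models` is an illustration, not the code used in the notebook.

```python
def match_models(historical, scenario):
    """Return the models present in both experiments and those missing one of them."""
    both = sorted(set(historical) & set(scenario))
    unmatched = sorted(set(historical) ^ set(scenario))
    return both, unmatched


# Hypothetical lookup tables: model name -> list of files found for that experiment.
historical = {
    "MPI-ESM1-2-LR": ["hist_mpi.nc"],
    "IPSL-CM6A-LR": ["hist_ipsl.nc"],
    "MODEL-X": ["hist_x.nc"],
}
scenario = {
    "MPI-ESM1-2-LR": ["ssp245_mpi.nc"],
    "IPSL-CM6A-LR": ["ssp245_ipsl.nc"],
    "MODEL-Y": ["ssp245_y.nc"],
}

matched, unmatched = match_models(historical, scenario)
print(matched)    # ['IPSL-CM6A-LR', 'MPI-ESM1-2-LR']
print(unmatched)  # ['MODEL-X', 'MODEL-Y']
```

Only the matched models can be concatenated into a continuous 1850 to 2100 series; the unmatched ones lack either the historical or the scenario part.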

And then we load the data directly from the data pool:

[Figure: demo_data_load]

Finally, we calculate the means and plot the results. We choose to highlight the MPI-ESM1-2-LR and IPSL-CM6A-LR results:

[Figure: demo_plots]
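Whichever operators the notebook uses for this step, the underlying calculation (an area-weighted global mean followed by an annual mean) can be sketched with xarray alone. The field below is synthetic, and the coarse 3x3 grid and temperature values are assumptions chosen only to keep the example small.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly 'tas' field on a coarse grid (2 years, values in K).
time = pd.date_range("2015-01-01", periods=24, freq="MS")
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 120.0, 240.0])
data = np.full((24, 3, 3), 288.0)
data[12:] += 1.0  # make the second year 1 K warmer
tas = xr.DataArray(
    data,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="tas",
)

# Grid cells shrink towards the poles, so weight each latitude by cos(lat).
weights = np.cos(np.deg2rad(tas.lat))

# Area-weighted mean over the globe, then a simple mean over the months of each year.
global_mean = tas.weighted(weights).mean(dim=("lat", "lon"))
annual_mean = global_mean.groupby("time.year").mean()

print(annual_mean.year.values)
print(annual_mean.values)
```

Note that the annual mean here treats all months equally; weighting by the number of days per month would be a refinement.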

As the Jupyter notebook shows, a .pdf of the figure is also saved in a folder called "plots/CMIP6_overview" in your Mistral home. You can download the plot to your local computer by typing "scp b123456@mistral:/home/dkrz/b123456/plots/CMIP6_overview/SSP2-4.5.pdf ." in your local folder, where the final "." means that the plot is downloaded into the folder you are currently in, and "b123456" must be replaced by your actual user name.