CMIP5 Data Structure

last modified Jul 11, 2019 04:11 PM
Experiments, ensembles, variable names, and other properties

Structured system of experiments

Generally, an experiment is an activity aimed at addressing a specific scientific problem. CMIP5 experiments are numerical experiments with climate models; they differ in their numerical requirements, such as the simulated period (in years) and the forcing constraints. Forcing constraints are, for example, volcanic and anthropogenic emissions and land-use change. A simulation is a run of a configured model that conforms to the numerical requirements, runs on a platform and produces output datasets.

In the CMIP5 project, Near-Term (10-30 years) and Long-Term (century and longer) simulations were performed, and many models ran both. Regardless of whether a model belongs to the Near-Term or the Long-Term group, a control run and a 1% per year CO2 increase experiment were carried out for each model, the latter to diagnose the transient climate response (TCR). The figure below shows only the most important experiments; see CMIP5 Experiment Design for a complete presentation.
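As a quick check on the 1% per year experiment: TCR is conventionally diagnosed around the time the CO2 concentration has doubled, which under compound 1% growth happens after roughly 70 years. The following is a minimal sketch of that arithmetic; the function name is made up for illustration and is not part of any CMIP5 tool.

```python
import math

def years_to_reach(factor, annual_growth=0.01):
    """Years of compound growth needed to multiply a concentration by `factor`."""
    return math.log(factor) / math.log(1.0 + annual_growth)

# Time until CO2 has doubled and quadrupled at 1% per year
doubling_years = years_to_reach(2.0)
quadrupling_years = years_to_reach(4.0)
print(f"doubling after about {doubling_years:.1f} years")
print(f"quadrupling after about {quadrupling_years:.1f} years")
```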

The two groups of CMIP5 experiments

RCPs, the Long-Term scenarios

The Representative Concentration Pathways (RCPs) span the range of possible future emission trajectories. Depending on population growth and the development of energy production, food production and land use, various emission trajectories are possible.

Using a simple carbon cycle climate model, the concentrations of CO2, other greenhouse gases and aerosols were calculated. These concentrations correspond to a change in radiative forcing of between 2.6 and 8.5 W/m2 by the year 2100, depending on the scenario. The name of each scenario corresponds to the increase in radiative forcing reached by 2100.

  • RCP2.6: radiative forcing peaks at nearly 3 W/m2 (~490 ppm CO2 equivalent) and then declines to 2.6 W/m2 by 2100
  • RCP4.5: Stabilization without overshoot. 4.5 W/m2 by 2100 (~650 ppm CO2 equivalent)
  • RCP6: Stabilization without overshoot. 6 W/m2 by 2100 (~850 ppm CO2 equivalent) [this scenario is not shown because it was not carried out for all models]
  • RCP8.5: rising radiative forcing, leading to 8.5 W/m2 by 2100 (~1370 ppm CO2 equivalent)
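
The pairing of forcing values and CO2-equivalent concentrations in the list above can be checked roughly with the widely used simplified logarithmic expression for CO2 forcing, F = 5.35 ln(C/C0) W/m2. This is only a sketch: the expression is an approximation that is not taken from this page, and the pre-industrial reference concentration of 278 ppm is an assumption.

```python
import math

ALPHA = 5.35                   # W/m2, coefficient of the simplified expression
PREINDUSTRIAL_CO2_PPM = 278.0  # assumed reference concentration, not from this page

def forcing_from_co2eq(ppm):
    """Approximate radiative forcing (W/m2) for a CO2-equivalent concentration."""
    return ALPHA * math.log(ppm / PREINDUSTRIAL_CO2_PPM)

# CO2-equivalent concentrations as quoted in the list above
rcp_co2eq_ppm = {"RCP2.6 (peak)": 490, "RCP4.5": 650, "RCP6": 850, "RCP8.5": 1370}
forcings = {name: forcing_from_co2eq(ppm) for name, ppm in rcp_co2eq_ppm.items()}

for name, value in forcings.items():
    print(f"{name}: about {value:.1f} W/m2")
```

With these assumptions the approximation lands within about 0.1 W/m2 of the forcing values in the scenario names.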


Emission trajectories of the Representative Concentration Pathways (RCPs) in the CMIP5 project

Radiative forcing for Representative Concentration Pathways (RCPs) in the CMIP5 project

For some models the RCPs were continued until 2300. These Extended Concentration Pathways (ECPs) allow possible future long-run climate change impacts to be studied.

You will find a detailed overview of the development of the RCP scenarios in this article: Detlef P. van Vuuren, Jae Edmonds, Mikiko Kainuma, Keywan Riahi, Allison Thomson, Kathy Hibbard, George C. Hurtt, Tom Kram, Volker Krey, Jean-Francois Lamarque, et al., 2011: The representative concentration pathways: an overview


Many CMIP5 experiments were run as ensembles, that is, repeated with several initial states, initialisation methods or physics details. Ensembles make it possible to quantify the variability of the simulation data of a single model. For example, climate model simulations depend on the initial state: the variability we know from weather is also present in climate simulations. Ensemble members with different initial states are usually called realizations. The initialisation method and physics details, such as parameterisation constants, may also have an influence. In the CMIP5 project, ensemble members are named using the rip nomenclature: r for realization, i for initialisation and p for physics, each followed by an integer, e.g. r1i1p1.
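
To make the rip nomenclature concrete, here is a minimal sketch that splits a label such as r1i1p1 into its three integers. The names EnsembleMember and parse_rip are hypothetical helpers written for this illustration; they do not come from any CMIP5 library.

```python
import re
from typing import NamedTuple

class EnsembleMember(NamedTuple):
    realization: int     # r: which initial state
    initialization: int  # i: which initialisation method
    physics: int         # p: which physics version

_RIP_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)")

def parse_rip(label):
    """Parse a label like 'r1i1p1'; raise ValueError if it does not fit the pattern."""
    match = _RIP_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid rip label: {label!r}")
    return EnsembleMember(*(int(group) for group in match.groups()))

member = parse_rip("r1i1p1")
print(member)
```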

For each model a spinup run was performed, since GCMs need time to settle into a stable state before the model output is usable. Spinup output was not archived. Greenhouse gas concentrations of the pre-industrial phase were used in the spinup run.

The continuation of the spinup run is the control run, also with greenhouse gas concentrations of pre-industrial times. This control run provided initial states for several other runs, as the diagram below shows for a fictitious variable.

Diagram explaining CMIP5 experiment parentage and ensemble members

In the CMIP5 Long-Term group, initial states taken from the control run (piControl, pre-industrial control) were used to start the historical hindcast. For the RCP scenarios, the historical run was usually continued with changed parameters. Historical and RCP time series can therefore be concatenated into a longer one, provided you select matching ensemble members. If you need this, look at the header of the RCP data file: the attributes parent_experiment_id and parent_experiment_rip name the right ensemble member to combine with.
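
The matching step can be sketched as follows. The example assumes the header attributes have already been read into plain dictionaries; parent_experiment_id and parent_experiment_rip are the attributes named above, while the keys experiment_id and member, the function find_parent and all values are stand-ins chosen for this illustration.

```python
def find_parent(child_attrs, candidates):
    """Return the candidate header that the child's parent attributes point to, or None."""
    wanted = (child_attrs["parent_experiment_id"],
              child_attrs["parent_experiment_rip"])
    for attrs in candidates:
        if (attrs["experiment_id"], attrs["member"]) == wanted:
            return attrs
    return None

# Hypothetical header of an RCP file
rcp_header = {
    "experiment_id": "rcp45",
    "member": "r2i1p1",
    "parent_experiment_id": "historical",
    "parent_experiment_rip": "r2i1p1",
}

# Hypothetical headers of the available historical files
historical_headers = [
    {"experiment_id": "historical", "member": "r1i1p1"},
    {"experiment_id": "historical", "member": "r2i1p1"},
]

parent = find_parent(rcp_header, historical_headers)
print(parent)
```

Only the historical member returned here should be joined with the RCP series; joining with another member would splice together two different realizations.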

Centralized names, units and mean value calculation

CMIP5 data obey the CF conventions (CF is short for Climate and Forecast). In particular, the CF Standard Names are used for variables, e.g. "air_temperature". This also makes searching easier. Short variable names are centralized as well. For example, the short name for air_temperature is "ta".

The CMIP5 Standard Output document tabulates which variables were calculated for which time frequency (e.g. daily) and experiment. Its tables also list the units used and the rules for calculating mean values.

Centralized file format

CMIP5 data are in NetCDF/CF format (Network Common Data Form, again obeying the CF conventions). This is a binary, header-based data format. Coordinate variables and data variables are defined in the file header. Each CMIP5 data file contains only one data variable, e.g. ta, plus all necessary coordinate variables such as longitude, latitude, altitude and time. Attributes in the header give additional information, for example about units and provenance.

In all NetCDF files, variable data are stored in multidimensional arrays. The order of the values is determined by the variable definition. The index set of the data variable is the Cartesian product of the index sets of the coordinate variables, in the same order as in the definition of the data variable. For example, the data variable air_temperature is defined in the header as follows:

float ta(time, plev, lat, lon)

The first values in the ta array belong to the first time value, the first plev value (pressure level, a measure of altitude) and the first latitude value. These first ta values run through the longitude values in the same order as given in the lon array. The lon loop is thus the innermost of four nested loops. From outermost to innermost, the sequence of the loops is that given in the definition of ta, i.e. the outermost loop is that over time.
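
The loop order described above can be illustrated with a small sketch that computes where a value with indices (time, plev, lat, lon) sits in the stored sequence. The dimension sizes are made up for the example and flat_index is a hypothetical helper, not part of any NetCDF library.

```python
from itertools import product

def flat_index(indices, shape):
    """Position of an element in the stored sequence; the last dimension varies fastest."""
    position = 0
    for index, size in zip(indices, shape):
        position = position * size + index
    return position

# Made-up sizes for the dimensions (time, plev, lat, lon)
shape = (2, 3, 4, 5)

# product() steps through the index tuples like four nested loops,
# with time outermost and lon innermost
order = list(product(*(range(size) for size in shape)))

print(order[:3])                       # first entries differ only in lon
print(flat_index((0, 0, 1, 0), shape)) # first value of the second latitude
print(flat_index((1, 0, 0, 0), shape)) # first value of the second time step
```

Stepping lon by one moves one place in the sequence, while stepping time by one skips a whole block of plev x lat x lon values.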