The A2C2 project is divided into three workpackages (WPs).

## WP 1 Mathematical and Statistical Issues

The notion of attractor (or strange attractor) is generally easy to picture for autonomous dynamical systems, i.e. systems without time-varying forcings. The properties of the strange attractor provide information on the conditions of predictability and the general shape of the system trajectories. This autonomous property is obviously not met by the real climate system, due to time-varying forcings from solar variations, volcanic eruptions and man-made changes in atmospheric composition. A rudimentary way of resolving this conundrum is to consider that the climate system is “locally” in equilibrium (e.g., for 30 years) with the slowly varying forcings, and to assume that intervals of 30 years are close enough to “infinity” for trajectories to sample the whole attractor. One can then describe the evolution of the climate attractor over sliding intervals of 30 years.

We want to explore to what extent this hypothesis of local time equilibrium (with respect to slow forcings) can be relaxed, and how changes in the behavior of the climate system can be detected. In particular, we want to address the question of whether a climate change is a “global shift” of the attractor (with no qualitative change of its properties), or a local or global deformation with qualitative changes. Specifically, we want to qualify climate change in terms of dynamical system theory [Guckenheimer and Holmes, 1983; Wiggins, 1990], by detecting and qualifying potential bifurcations.

Thus it is necessary to use an appropriate mathematical and statistical framework to tackle this question. This workpackage aims at building and consolidating the heuristic and pragmatic approach of flow analogues [Lorenz, 1969] and providing guidelines for their use, in terms of size of reference dataset, size of domain and metric.

The paradigm of flow analogues can be summarized as follows (or see Figure 3). We assume that a trajectory of a dynamical system (or an ensemble of trajectories) is observed during a reference period R (for reference), and that this trajectory is a good proxy of the underlying attractor. For a distinct time interval T, we determine, for each t in T, the states of the reference trajectory (in R) that are closest to the state reached at target time t (in T, for target). The closest states found during the reference R are called the analogues of the target T. We refer to flow analogues when working on variables representing atmospheric motion. There are several ways of obtaining closest analogues: by minimizing a distance or maximizing a correlation. The analogue determination hence strongly depends on the criterion to be optimized and there is no objective reason to prefer one method over another [Toth, 1991a], although the results can be different. We will use standard spatial distances (Euclidean, Mahalanobis, Kullback-Leibler) and correlation types (linear, rank) that are adapted to mean variations, and we will explore other types of distances that are more relevant to extreme variations (e.g., madogram: [Cooley et al., 2006]).
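The search described above can be illustrated with a minimal sketch, assuming that each atmospheric state is stored as a flattened list of grid-point values and that the Euclidean distance is the criterion. The function names `euclidean` and `find_analogues` are hypothetical and chosen only for this illustration; any other distance or a correlation (with the sort order reversed) could be passed in instead.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two flattened fields of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_analogues(reference, target, k=1, distance=euclidean):
    """For each target state, return the k closest reference states.

    reference, target: lists of flattened fields (lists of floats).
    Returns, for each target state, a list of (reference_index, distance)
    pairs sorted from the best analogue (smallest distance) to the worst.
    """
    result = []
    for state in target:
        scored = [(i, distance(state, ref)) for i, ref in enumerate(reference)]
        scored.sort(key=lambda pair: pair[1])
        result.append(scored[:k])
    return result

# Toy example: three reference "maps" with two grid points each.
reference = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
target = [[0.9, 1.2], [4.0, 6.0]]
analogues = find_analogues(reference, target, k=2)
```

This brute-force search compares every target state with every reference state, which is adequate for a toy case but not for the dataset sizes discussed in this project.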

An important caveat of this method, stemming from the theory of dynamical systems, is that there is no guarantee that good analogues can be found in finite time: the number of degrees of freedom of the planetary climate is so large that the attractor is unlikely to have been fully sampled [Lorenz, 1969; Toth, 1995; Van den Dool, 1983]. This difficulty is overcome by choosing smaller geographical domains, which reduces the number of spatial degrees of freedom. Such domains can cover Europe (or the North Atlantic), the Arctic region, etc. Heuristic tests of the sensitivity of analogues to the domain definition are seldom done. We will first define “target” regions (Europe, Arctic, North America and Asia) and examine systematically the sensitivity of the analogue computations to the size of the regions. Such an investigation is crucial for Asian domains, which can be influenced by air flows from the Siberian anticyclone, El Niño and the monsoon, each of which has its own dynamics and spatial extent. This will provide an empirical basis for the definition of the geographical domains on which the flow analogues are computed. The climate attractor sampling question is also less acute now (compared to the epoch of the seminal paper of [Lorenz, 1969]), because of the availability of ensembles of very long climate model simulations (covering several centuries).

In practice, for climate applications, we compute the best flow analogues of all time steps of the system, and then we compare the dates of the analogues and their scores. This can be done in two ways for climate models: either by comparing “historical” simulations (i.e. with time-varying forcings) to control simulations (with all forcings fixed), or by sorting time-varying analogues in historical simulations or reanalyses.

The rationale of this diagnostic is that, under a hypothesis of ergodicity [Manneville, 2004], if the climate attractor does not change shape, then the dates of best analogues and their scores should be uniformly distributed, with no trend. The statistical assessment of this uniformity will be performed on simple models [Lorenz, 1963], quasi-geostrophic models that have a chaotic behavior [Molteni, 2003], and control simulations of the CMIP5/PMIP3 ensembles that have a daily (or finer) time resolution.
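As a rough illustration of what a uniformity assessment of analogue dates could look like, the sketch below bins the dates and compares the bin counts with a uniform distribution through a chi-square statistic, with a Monte Carlo p-value. This is only one possible choice and not the test the project will adopt. The function names are hypothetical, and the simulation treats dates as independent draws, which real analogue dates are not (consecutive days share analogues), so the p-value would be too optimistic on real data.

```python
import random

def chi_square_uniform(dates, n_bins, start, end):
    """Chi-square statistic of binned dates against a uniform distribution."""
    width = (end - start) / n_bins
    counts = [0] * n_bins
    for d in dates:
        # Clamp so that a date equal to `end` falls in the last bin.
        idx = min(int((d - start) / width), n_bins - 1)
        counts[idx] += 1
    expected = len(dates) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

def uniformity_p_value(dates, n_bins, start, end, n_sim=2000, seed=0):
    """Monte Carlo p-value: share of uniform samples at least as uneven."""
    rng = random.Random(seed)
    observed = chi_square_uniform(dates, n_bins, start, end)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.uniform(start, end) for _ in dates]
        if chi_square_uniform(sim, n_bins, start, end) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)

# Analogue dates (in decimal years) clustered at the end of a 1900-2000 record.
clustered = [1990 + 0.1 * i for i in range(100)]
p_clustered = uniformity_p_value(clustered, 10, 1900, 2000)
```

A small p-value for `clustered` indicates that the analogue dates are concentrated in part of the record, which is the kind of departure from uniformity the diagnostic is meant to flag.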

The important methodological step is to detect trends or persistent outliers in the dates and scores of analogues when the system is subject to time-varying forcings. This will be done heuristically with idealized models and full-size climate models in which the forcings are known. We will devise a test for analogue trend detection, by bootstrapping the data. With such a test, we can provide p-values for this attractor deformation by assessing the extent to which the observed field belongs to the distribution of its analogues. This statistical development is new but essential to assess how good the analogues are, or whether trajectories of a dynamical system indeed shadow [Ghil et al., 2002] the underlying attractor.
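The trend test is described only in outline here. As a rough sketch of the resampling idea, one could compare the fitted slope of the analogue scores with the slopes obtained after randomly shuffling the scores in time. Note that this is a permutation test, a resampling relative of the bootstrap mentioned above, and not the test that the project will devise; the function names and the use of a least-squares slope are assumptions made for illustration.

```python
import random

def slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def trend_p_value(times, scores, n_resamples=2000, seed=0):
    """Two-sided permutation p-value for a linear trend in the scores.

    Shuffling the scores destroys any link with time, so the shuffled
    slopes approximate the distribution of slopes under "no trend".
    """
    rng = random.Random(seed)
    observed = abs(slope(times, scores))
    shuffled = list(scores)
    exceed = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if abs(slope(times, shuffled)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_resamples + 1)

# Toy case: analogue distances that grow steadily with time.
times = list(range(50))
p_trend = trend_p_value(times, [0.1 * t for t in times])
```

Like any permutation scheme, this sketch ignores the serial correlation of daily scores; a block resampling would be one way to account for it.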

Meteorological events are often the results of sequences of synoptic atmospheric conditions. It can hence be useful to consider analogues of sequences of atmospheric variables. For instance, it can be interesting to look for analogues of five consecutive days. Of course, the length of this “windowing” can reduce the scores of the analogues, because one has to find sequences of days that have globally similar patterns, rather than single days. But considering such windows also constrains the dynamical features of the field, especially its time derivative, because using windows of more than one day gives a “direction” to the atmospheric field. We will hence make numerical tests on the optimal choice of the window size of analogues, to achieve a trade-off between good analogue scores and the dynamical smoothness of the computed analogues.
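The effect of windowing can be seen on a toy series. The sketch below, with hypothetical function names and a root-mean-square distance taken as an assumption, scores a whole sequence of consecutive days at once, so that a reference sequence matches only if it evolves in the same direction as the target.

```python
import math

def window_distance(reference, target, i, j, window):
    """RMS distance between `window` consecutive days starting at i and j."""
    total = 0.0
    count = 0
    for lag in range(window):
        for a, b in zip(reference[i + lag], target[j + lag]):
            total += (a - b) ** 2
            count += 1
    return math.sqrt(total / count)

def best_window_analogue(reference, target, j, window):
    """Start index in `reference` whose window best matches target[j:j+window]."""
    candidates = range(len(reference) - window + 1)
    return min(
        candidates,
        key=lambda i: window_distance(reference, target, i, j, window),
    )

# One grid point per day: the reference rises then falls, the target falls.
reference = [[0.0], [1.0], [2.0], [1.0], [0.0]]
target = [[1.0], [0.0]]
single_day = best_window_analogue(reference, target, 0, 1)
two_days = best_window_analogue(reference, target, 0, 2)
```

With a one-day window, the value 1.0 on the rising branch of the reference is as good a match as the one on the falling branch. With a two-day window, only the falling branch matches, which is the “direction” constraint described above.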

The reason for the formalism of the reference set R and the target set T is that they can stem from different sources. For instance, the reference R can be a long control simulation (e.g. 1000 years) from a climate model for which all the slow components have smoothed out, while the target T can cover a climate projection from an IPCC scenario, with a different model. This flexibility offers a wide range of analogue analysis combinations, with different applications that will be explored in the second part of the project.

## WP 2 Toolkit development

We envisage the development of a computer platform to compute flow analogues from various databases, allowing for parameter testing. The idea is to create a generic tool that will be used for the specific climate-related questions we want to explore. This toolkit will be disseminated to the scientific community under an open-source license. The goal is to facilitate the use of flow analogue methods in climate research. Hence it will allow the re-definition of geographical regions so that users can apply it in their own research.

Since the calculation of analogues over large datasets is computationally heavy, the code will include a parallel scheme, in order to take advantage of recent computer architectures (including laptops). Given the nature of the computer programs and the format of the datasets (generally NetCDF), a Linux interface will be preferred. The core of the toolkit will allow the use of various metrics to compute analogues.

The output of this toolkit will provide the best flow analogues (and their statistical characteristics) and simple composite diagnostics on other variables, such as temperature, wind speed and precipitation. It will provide diagnostics on the quality of analogues in an objective way, from the results in WP1, and provide a statement on the relevance of the analogue scores.
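A composite diagnostic of this kind can be sketched as the average of another variable over the dates of the selected analogues. The function name `composite` and the data layout (one flattened field per date) are assumptions made for this illustration, not a description of the toolkit's interface.

```python
def composite(analogue_indices, field_series):
    """Average a variable, grid point by grid point, over analogue dates.

    analogue_indices: indices (dates) of the analogues in the reference set.
    field_series: list of flattened fields of the other variable
                  (e.g. temperature), one per reference date.
    """
    n = len(analogue_indices)
    size = len(field_series[0])
    return [
        sum(field_series[i][g] for i in analogue_indices) / n
        for g in range(size)
    ]

# Temperature at two grid points for three reference dates.
temperature = [[10.0, 12.0], [14.0, 16.0], [20.0, 22.0]]
composite_temperature = composite([0, 1], temperature)
```

Here the two best analogues are dates 0 and 1, so the composite is the mean of their temperature fields.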

We will develop a pseudo real-time interface to automatically retrieve reanalysis data from ad hoc open databases, over selected regions of interest. This module will enable a pseudo real-time determination of the best circulation analogues, and of composite temperature and precipitation, for a region of choice. An early version of such a module allowed the analysis of extreme events in Europe during 2011 [Cattiaux and Yiou, 2012]. Such an analysis will be done routinely for Europe, North America, the Arctic region, and East Asia, from the NCEP reanalysis [Kalnay et al., 1996].

The toolkit will also offer the possibility of using long general circulation model (GCM) simulations for the reference database. An interface to CMIP5, PMIP3, CORDEX data will be designed. Such model simulations offer data that either have a longer time span (century or millennium) or finer spatial resolution (11 km or 44 km) than reanalysis data.

The ensemble simulation database interface will be tested in the evaluation of the climate attractor deformation application. The core of the analogue computation will be programmed in the widely used R language and in shell scripts. It will be shaped as a standalone application. A prototype of a web application (running on the LSCE computing server) for a pseudo real-time analysis (once a week) of atmospheric flow over several regions of the globe will be implemented.

This computer engineering step is crucial in order to facilitate the transfer of knowledge from the mathematical theory to climate applications. The platform will also serve as a base for further scientific or innovation projects.

We will create a database of available model simulation data (CMIP5, PMIP3, CORDEX), reanalyses (NCEP: [Kalnay et al., 1996]) and observations (ECA&D: [Klein-Tank et al., 2002]), with common format specifications, quality checks and bias removal [Michelangeli et al., 2009]. The database will cover the identified regions of analysis. We will focus on sea-level pressure, geopotential heights at various levels (1000, 850, 500 and 300 hPa), near-surface temperature and precipitation. The reanalysis and observational datasets will be updated on a regular basis. The flow analogue platform will then enable switching between datasets.

The original database of model simulations or observations will not be re-distributed, although the open platform will make use of them. We might distribute corrected datasets after bias removal. We will issue documentation on the quality check and bias removal, in order to guarantee the reproducibility of results, and provide a guide of “best practice”.

## WP 3 Applications

The applications of the flow analogue method are numerous. We will focus on two scientific challenges for which this methodology provides innovative information. Both applications make intensive use of the flow analogue methodology in order to infer probability distributions of atmospheric patterns, and to assess the statistical significance of changes in those distributions.

### Climate attractor drifts (WP3a)

Climate model intercomparison initiatives have generated many long simulations with various forcing hypotheses (including astronomical, solar, volcanic and anthropogenic [Jansen et al., 2007]). If we consider general circulation models of the atmosphere, the basic equations of motion are the same for all models, which mainly differ in the physical schemes for small scale phenomena.

We shall first focus on control simulations of climate models (i.e. with no time-varying forcing) from the CMIP5 and PMIP3 databases. We will assume that those control simulations sample the underlying climate attractor of an autonomous dynamical system. We will determine a “climatology” of analogues of sea-level pressure (SLP) for extra-tropical regions of the northern hemisphere covering the North Atlantic, the Arctic, North America and Asia. This climatology will include the probability distribution of analogue scores for various metrics and the probability distribution of dates of analogues. We will also determine the spatial probability distributions of composite temperature and precipitation for those regions. This climatology describes the shadow of the attractor [Ghil et al., 2002] that is approximated by each trajectory (or model simulation). The distribution of poor scores is particularly interesting because it describes parts of the attractor that are seldom encountered. This climatology provides the baseline statistics against which we will test potential changes in the attractor of the system.

We will then determine the SLP analogues of future climate projection simulations (the target sets) from control simulations (the reference sets) for each region that we identified. The statistics of scores will be compared to the “baseline attractor” probability distributions. In particular, we will examine the distribution of poor scores (large distance and/or low correlation value), which represent exceptional synoptic conditions that are found in the target sets but never encountered in the reference sets. Such a diagnostic will provide an assessment of an attractor deformation in climate change scenarios, and provide a basis for specific analyses of the atmospheric circulation. An ensemble of scoring tests, based on variance distances (Euclidean, Mahalanobis, Kullback-Leibler) or higher moments (e.g., madogram, entropy), and correlation tests will be performed.
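One simple way to summarize the poor scores, given here as a sketch and not as the project's diagnostic, is to count how often target scores exceed a high quantile of the baseline scores. The function names, the 95% level and the convention that a larger score means a worse analogue (as for a distance) are assumptions made for illustration.

```python
def empirical_quantile(values, q):
    """Empirical quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac

def poor_score_fraction(baseline_scores, target_scores, q=0.95):
    """Share of target scores worse than the baseline q-quantile.

    Scores are distances here, so "worse" means "larger".
    """
    threshold = empirical_quantile(baseline_scores, q)
    worse = sum(1 for s in target_scores if s > threshold)
    return worse / len(target_scores)

# Baseline distances from a control run, and distances for a projection.
baseline = [float(i) for i in range(101)]
projection = [50.0, 96.0, 99.0, 120.0]
fraction = poor_score_fraction(baseline, projection)
```

If the attractor were unchanged, about 5% of the target scores would be expected above the baseline 95% quantile; a much larger share, as in this toy case, points to synoptic conditions that are poorly represented in the reference set.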

In a second step, the analogue search will be performed solely on the future climate projection simulations (used as both reference and target). We will examine clusters and trends in the dates of analogues, if an attractor deformation was identified in the previous step. This trend analysis will permit the identification of the timing of the attractor deformation. This generalizes the notion of time of emergence [Giorgi and Bi, 2009]. We will provide atlases of deformation times for the regions of focus, when such changes are identified.

The attractor deformation can be due to the appearance of “new” synoptic patterns, or the deformation of already existing ones. Both hypotheses correspond to specific dynamical scenarios of bifurcation (Hopf or pitchfork bifurcations [Wiggins, 1990]). We will use the paradigm of weather regimes [Michelangeli et al., 1995] for the atmospheric circulation to propose dynamic scenarios of such drifts. Such an analysis will also be illustrated on simple dynamical systems showing various types of bifurcations [Lorenz, 1963].

We will then attempt to attribute such deformations of the atmospheric circulation patterns to external causes. The Paleoclimate Modelling Intercomparison Project phase 3 and other initiatives provide an invaluable ensemble of millennium climate simulations including forcings such as solar, volcanic, astronomical and land use [Schmidt et al., 2011]. Solar activity and major volcanic eruptions have induced changes in surface temperature distribution [Mann et al., 2009; Servonnat et al., 2010] and have been suspected to alter the extra-tropical atmospheric circulation [Shindell et al., 2001a; Shindell et al., 2001b], although such changes are certainly very subtle [Yiou et al., 2012]. An analogue analysis of the atmospheric circulation in millennium climate simulations will enable us to assess how the climate attractor responds to forcings that have durations of decades or less, and no particular trend (contrary to anthropogenic forcing during the last century).

### Attribution of extreme events (WP3b)

Extreme climate events like the European summer heatwave in 2003 or the Russian summer heatwave in 2010 were connected to an anomalous anticyclonic flow and soil moisture feedbacks. The warm European winter in 2006/2007 or the cold winter in 2010 were also connected to anomalous atmospheric patterns.

It is in principle difficult to attribute long-term causes to the occurrence of extreme climate events. On the one hand, heatwaves or cold spells in the extra-tropics are generally connected to specific synoptic atmospheric patterns [Cassou et al., 2005; Cattiaux et al., 2010; Yiou et al., 2007], and are a major challenge for seasonal weather prediction because such patterns evolve on daily timescales. On the other hand, it has been shown that the probability distribution of extreme temperatures follows the evolution of mean temperatures [Parey et al., 2010; Yiou et al., 2009] on interannual time scales. We will explore how the framework of flow analogues makes it possible to reconcile both points of view (seasonal vs. interannual) in order to understand the connection between extreme events that occur on short time scales, and secular climate and environmental variations.

The first direction is to investigate how flow analogues account for temperature anomalies in various regions of the world. We will start with four regions of the northern hemisphere (North Atlantic, North America, Arctic and East Asia). A systematic analysis of recent and upcoming extremes (in a pseudo real-time mode) will be performed, as was done by [Cattiaux et al., 2010; Cattiaux and Yiou, 2012; Yiou et al., 2007] for European temperatures using the NCEP reanalysis. Those case studies will be generalized by using various reference datasets (including high resolution simulations from CMIP5/PMIP3 databases). Hence we will explore how (and when) such extreme events can be reproduced in various possible trajectories of the climate system.

Starting from a given extreme event (e.g. heat or cold spell, large-scale drought, extra-tropical storm), we will determine the atmospheric conditions that precede this event, from observations or reanalysis. We will then use analogues of SLP and of the three-dimensional atmospheric circulation for those conditions to create an ensemble of coherent initial and boundary conditions. This ensemble is based on the best analogues of pressure-related fields, from different models of the CMIP5/PMIP3 databases. In such models, the physical parameterizations (for convection, for instance) and external forcings (solar, volcanic and anthropogenic) are implemented differently, so that the computed ensemble of conditions tests the structural stability of the trajectories. We will then explore how the trajectories emerging from those analogue conditions (of the precursors of the observed extreme event) lead to an extreme event in each available simulation. In this way, we can evaluate the probability of obtaining an observed extreme event, given an initial condition and a set of forcing hypotheses. We will use data mining tools [Hastie et al., 2009] to optimize the search for analogues of an observed pattern in a large multi-model database of simulations. We will hence produce an estimate of the probability of obtaining an observed extreme event, conditional on choices of physical parameterizations and forcing factors. From long series of observations or control simulations, we can determine the baseline probability distribution of each type of extreme event. If the conditional probability of obtaining an observed event from the multi-model ensemble is significantly larger than the baseline probability, we can evaluate the physical factor that is connected to the increased probability of this event. This attribution procedure will be formalized rigorously.
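The comparison between the conditional and baseline probabilities can be sketched as follows, assuming that each trajectory is summarized by a single outcome value (for instance a peak temperature) and that an event is counted when this value reaches a threshold. The function names and the use of a plain probability ratio are assumptions made for illustration; the sketch does not assess statistical significance, which the formal procedure would have to address.

```python
def event_probability(outcomes, threshold):
    """Fraction of trajectories whose outcome reaches the extreme threshold."""
    hits = sum(1 for value in outcomes if value >= threshold)
    return hits / len(outcomes)

def probability_ratio(conditional_outcomes, baseline_outcomes, threshold):
    """Ratio of the conditional event probability to the baseline one."""
    p_conditional = event_probability(conditional_outcomes, threshold)
    p_baseline = event_probability(baseline_outcomes, threshold)
    if p_baseline == 0:
        # The event never occurs in the baseline: the ratio is unbounded.
        return float("inf")
    return p_conditional / p_baseline

# Peak temperatures (degrees C) reached by trajectories started from
# analogues of the precursor conditions, and in a long control simulation.
from_analogues = [30.0, 35.0, 28.0, 36.0]
control = [20.0, 25.0, 35.0, 22.0, 21.0, 30.0, 26.0, 24.0, 23.0, 27.0]
ratio = probability_ratio(from_analogues, control, threshold=34.0)
```

A ratio well above one suggests that the precursor conditions and the forcing hypotheses of the ensemble make the event more likely than in the baseline climate.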

This methodology is similar to that of [Pall et al., 2011], but it is much less demanding in terms of computer time because no new model simulation is required, and it is not restricted to the analysis of single-model experiments. This systematic and quasi-automatic analysis of extreme events will benefit from the design of the analogue platform (in WP2).