Obliquity and precession as pacemakers of Pleistocene deglaciations

The Milankovitch theory states that the orbital eccentricity, precession, and obliquity of the Earth influence our climate by modulating the summer insolation at high latitudes in the northern hemisphere. Despite considerable success of this theory in explaining climate change over the Pleistocene epoch (2.6 to 0.01 Myr ago), it is inconclusive with regard to which combination of orbital elements paced the 100 kyr glacial-interglacial cycles over the late Pleistocene. Here we explore the role of the orbital elements in pacing the Pleistocene deglaciations by modeling ice-volume variations in a Bayesian approach. When comparing models, this approach takes into account the uncertainties in the data as well as the different degrees of model complexity. We find that the Earth's obliquity (axial tilt) plays a dominant role in pacing the glacial cycles over the whole Pleistocene, while precession only becomes important in pacing major deglaciations after the transition of the dominant period from 41 kyr to 100 kyr (the mid-Pleistocene transition). We also find that geomagnetic field and orbital inclination variations are unlikely to have paced the Pleistocene deglaciations. We estimate that the mid-Pleistocene transition took place over a 220 kyr interval centered on a time 715 kyr ago, although the data permit a range of 600--1000 kyr. This transition, occurring within just two 100\,kyr cycles, indicates a relatively rapid change in the climate response to insolation.


Introduction
During the past 1 Myr (the late Pleistocene), the polar ice sheets grew slowly (glaciation) then retreated abruptly (deglaciation or glacial termination) repeatedly, with an interval of about 100 kyr (Hays et al., 1976). These quasi-periodic glacial-interglacial cycles dominated terrestrial climate change. They are recorded by paleoclimatic proxies such as δ 18 O (the scaled 18 O/ 16 O isotope ratio) in foraminiferal calcite, which is sensitive to changes in global ice volume and ocean temperature. Following on from the work of Adhémar, Croll, and others, Milankovitch proposed that climate change is driven by the insolation (the received solar radiation) during the northern hemisphere summer at northerly latitudes (Milanković, 1941). This insolation depends on the Earth's orbit and axial tilt (obliquity), and Milankovitch suggested that through various climate response mechanisms, variations in these orbital elements -in particular eccentricity, obliquity, and precession 1 -can cause climate change ("Milankovitch forcing"). Many studies have broadly confirmed Milankovitch's theory and the role of Milankovitch forcing in driving Pleistocene climate change, for example by spectral analyses of paleoclimatic time series derived from deep-sea sediments (Hays et al., 1976;Shackleton and Opdyke, 1973;Kominz et al., 1979). These studies have demonstrated that the climate variance is concentrated in periods of about 19 kyr, 23 kyr, 42 kyr and 100 kyr which are close to the dominant periods in precession (∼23 and 19 kyr), obliquity (∼41 kyr), and eccentricity (∼100 and 400 kyr).
There are, however, several difficulties in reconciling the Milankovitch theory with observation. Two in particular arise when trying to explain the 100 kyr cycles. The first is the transition from the 41 kyr dominant period in climate variations to a 100 kyr dominant period at the mid-Pleistocene around 1 Myr ago (hereafter "Myr ago" is written "Ma"). The second difficulty is generating 100 kyr sawtooth variations from orbital forcings and climate response mechanisms (Imbrie et al. 1993, Huybers 2007, Lisiecki 2010). On the one hand, and as shown in Figure 1, the onset of 100 kyr power at the mid-Pleistocene transition (MPT) occurs without a corresponding change in the summer insolation at high northern latitudes (represented by the daily-averaged insolation on 21 June at 65 • N). On the other hand, the ∼100 kyr eccentricity cycle produces only negligible 100 kyr power in seasonal or mean annual insolation variations, despite its modulation of the precession amplitude. Furthermore, the variations of eccentricity and the northern summer insolation are weak while the 100 kyr climatic variations are strong, notably in marine isotope stage (MIS) 11 (see Figure 1 and Imbrie and Imbrie (1980); Howard (1997)). These problems are referred to as the "100 kyr problem" (Imbrie et Lisiecki and Raymo (2005) is compared with the daily-averaged insolation at the summer solstice at 65 • N,Q day 65 • N (upper solid line), the obliquity (dashed line), and the eccentricity (dotted line) calculated by Laskar et al. (2004). The latter two have been scaled to have a common amplitude. The grey region around −1000 kyr represents the MPT extending from −1250 kyr to −770 kyr (Clark et al., 2006). The grey bar extending from −423 to −362 kyr represents marine isotope stage (MIS) 11. The δ 18 O variations are dominated by 41 kyr and 100 kyr cycles before and after the MPT respectively.

1993).
Various models with different climate forcings and response mechanisms have been proposed to solve the 100 kyr problem. Many are based on either deterministic climate forcing models or stochastic internal climate variations. The former proposes that the 100 kyr cycles are driven by orbital variations, particularly precession and eccentricity (Imbrie and Imbrie, 1980;Paillard, 1998;Gildor and Tziperman, 2000). Many models treat the insolation variation as a pacemaker which sets the phase of the glacial-interglacial oscillation by directly controlling summer melting of ice sheets (Gildor and Tziperman, 2000). In this latter hypothesis, stochastic internal climate variability plays the main role in generating the 100 kyr glacial cycles (Saltzman, 1982;Pelletier, 2003;Wunsch, 2003). A general approach is to combine the deterministic and stochastic elements within a framework of nonlinear dynamics, which allows for the occurrence of bifurcation and synchronisation in the climate system (see review by Crucifix 2012).
Other proposed hypotheses include glaciation cycles controlled by the accretion of interplanetary dust when the Earth crosses the invariable plane (Muller and MacDonald, 1997) or by the cosmic ray flux modulated by the Earth's magnetic field (measured as the geomagnetic paleointensity, GPI; Christl et al. 2004;. Some models also try to explain the MPT with Paillard, 1998;Hönisch et al., 2009;Clark et al., 2006) or without (Huybers, 2009;Lisiecki, 2010;Imbrie et al., 2011) an internal change in the climate system.
The above models comprise both climate forcings and re-sponses. According to various studies (Saltzman, 1987;Maasch and Saltzman, 1990;Ghil, 1994;Raymo et al., 1997;Paillard, 1998;Clark et al., 1999;Tziperman and Gildor, 2003;Ashkenazy and Tziperman, 2004), climate forcings frequently determine the time of occurrence of some climate feature, such as the onset of deglaciation. Many recent studies have employed concepts from chaos theory to address the problem of climate change (Crucifix, 2012;Parrenin and Paillard, 2012;Crucifix, 2013;Mitsui and Aihara, 2014;Ashwin and Ditlevsen, 2015;Williamson and Lenton, 2015), which then allow the concept of "pacing" to be described more rigorously as a forcing mechanism. Huybers (2011) noted that many tens of pacing models have been proposed, yet we lack the means to choose between them.
Our current work aims to compare different forcing mechanisms by using a simple ice volume model for the Pleistocene glacial-interglacial cycles. We adopt the pacing model given by Huybers and Wunsch (2005) and combine it with different forcings in order to predict the glacial terminations, which are identified from several δ 18 O records. Our models do not describe the physical mechanism of the climate response to external forcings. We aim instead only to measure the role of different forcings in determining the times of deglaciations. Due to the large and rapid change in ice volume at deglaciation, these times are relatively easy to identify, so the time uncertainties associated with identification are small. They are nonetheless still affected by the overall uncertainty in the chronology of the δ 18 O record (Huybers and Wunsch, 2005).
A common approach for assessing a model is to use p-values to reject a null hypotheses (Huybers and Wunsch, 2005;Huybers, 2011). However, it is well established that p-values can give very misleading results (Berger and Sellke, 1987;Jaynes, 2003;Christensen, 2005;Bailer-Jones, 2009;Feng and Bailer-Jones, 2013), so we instead compare models using the Bayesian evidence. This compares models on an equal footing and takes into account the different flexibility (or complexity) of the models (Kass and Raftery, 1995;Spiegelhalter et al., 2002;von Toussaint, 2011). This paper is organized as follows. In section 2 we assemble the data -stacked δ 18 O records -and identify the glacial terminations. In section 3 we summarize the Bayesian inference method as we use it. We build models based primarily on orbital elements to predict the Pleistocene glacial terminations in section 4. These are compared for different data sets and time scales in section 5. We perform a test of sensitivity of the results to the model parameters and choice of time scales in section 6. Finally, we discuss our results and conclude in section 7.

δ 18 O from a depth-derived age model
The past climate can be reconstructed from isotopes recorded in ice cores or deep sea sediment cores. Air bubbles trapped at different depths in ice cores can be used to reconstruct the past atmospheric temperature, for example. Ice cores have so far been used to trace the climate back to about 800 kyr (Augustin et al., 2004). In order to reconstruct the climate back to 2 Ma, the δ 18 O ratio recorded in the calcite (CaCO 3 ) in foraminifera fossils (including species of benthos and plankton) in ocean sediment cores can be used. We use the δ 18 O ratio as a measure of variations in the global ice volume, although we note that this is also sensitive to the temperature and isotope composition of seawater, for which corrections can be made. For a discussion of the interpretation of marine calcite δ 18 O see for example Shackleton (1967) and Mix and Ruddiman (1984).
In order to calibrate δ 18 O measurements and to assign ages to sediment cores, one could assume either a constant sedimentation rate (determined using radiometrically dated geomagnetic reversals), or a constant phase relationship between δ 18 O and an insolation forcing based on the Milankovitch theory (see Huybers and Wunsch 2004 for details). The former is the "depth-derived age model" (Huybers and Wunsch, 2004;Huybers, 2007). The latter is referred to as "orbital tuning" (Imbrie et al., 1984;Martinson et al., 1987;Shackleton et al., 1990). Clearly this latter method is not appropriate for testing theories related to Milankovitch forcings, because it already assumes a link between δ 18 O variations and orbital forcings. Huybers (2007) (hereafter H07) stacked and averaged twelve benthic and five planktic δ 18 O records to generate three δ 18 O global records: an average of all δ 18 O records ("HA" data set); an average of the benthic records ("HB" data set); an average of the planktic records ("HP" data set). 2 In addition to these three data sets, we also analyze the orbital-tuned benthic δ 18 O stacked by Lisiecki and Raymo (2005) ("LR04" data set), despite its orbital assumptions. The LR04 record was recalibrated by H07 to generate a tuning-independent LR04 data set ("LRH" data set; see the supplementary material of H07 for details).
We standardize each of the above δ 18 O records over the past 2 Myr to have zero mean and unit variance, to produce what we call the δ 18 O anomalies as shown in Figure 2 (DD, ML, MS are explained below). We identify the deglaciations in the next section. We see that the sawtooth 100 kyr glacial-interglacial cycles become significant over the late Pleistocene while 41 kyr cycles dominate over the early Pleistocene. From now on, we will use the term "late Pleistocene" to mean the time span 1 Ma to 0 Ma, and "early Pleistocene" to mean 2 Ma to 1 Ma.

Identification of deglaciations
Rather than trying to model the full time series of δ 18 O variations, we focus instead only on the times of glacial terminations (deglaciations). This is because an orbital forcing should determine predominantly the timing of a deglaciation rather than the detailed variation of the ice volume (Gildor and Tziperman, 2000;Paillard, 1998;Huybers and Wunsch, 2005). This not only simplifies the problem (thus making results more robust), but is also in line with our goal of trying to identify the main pacemakers for deglaciations, rather than trying to model the continuous response of the climate to orbital forcings. Here we describe how we identify the deglaciations.
From Figure 1, we see that the δ 18 O amplitudes are larger in the late Pleistocene than in the early Pleistocene. This is interpreted to mean that after the MPT, ice sheets both grew to larger volumes and retreated more rapidly to ice-free conditions. This rapid and abrupt shift from extreme glacial to extreme interglacial conditions defines 11 well-established late-Pleistocene terminations (Broecker, 1984;Raymo, 1997). Because termination 3 is sometimes split into two terminations (Huybers, 2011) -labeled 3a and 3b ( Figure 2) -we actually identify 12 major terminations over the late Pleistocene. The times of these major terminations as established by various publications has been collated by Huybers (2011) and are given in his supplementary material. Based on his Table S2, we define three sets of terminations which cover just the late Pleistocene: • DD: termination times and corresponding uncertainties estimated from the depth-derived timescale in H07; • MS: termination times and corresponding uncertainty equal to the median and standard deviation (respectively) of different termination times for each event given in the literature (Imbrie et al., 1984;Shackleton et al., 1990;Lisiecki and Raymo, 2005;Jouzel et al., 2007;Kawamura et al., 2007); • ML: termination times as in the MS data set, but with larger uncertainties obtained by adding the time uncertainties of the depth-derived time scales in quadrature with the corresponding uncertainties in the MS data set. In the late Pleistocene, we identify three additional sets of terminations: the DD terminations are denoted by blue lines while the ML/MS terminations are denoted by green lines. These each consists of 12 major terminations, which are indicated by the numbers (we use the convention of splitting termination 3 into two events). What we call minor terminations are all the red points which are not major terminations.
These terminations are shown as vertical lines in Figure 2. In addition to these major terminations, there are also minor terminations characterized by transitions from moderate glacial to moderate interglacial conditions. Considering the ambiguity in defining these (Huybers and Wunsch, 2005;Lisiecki, 2010), we identify terminations in our δ 18 O records using the method of H07. A termination is identified when a local maximum and the following minimum (defined as a maximum-minimum pair) have a difference in δ 18 O larger than one standard deviation of the whole δ 18 O record. The time of a termination is the midpoint of the maximum-minimum pair and the age uncertainty of this mid-point is calculated from a stochastic sediment accumulation rate model (Huybers, 2007). We identify sustained events in all data sets by filtering δ 18 O with different moving-average (or "Hamming") filters. The data sets are show in Figure 2. We use the term "major terminations" to refer to terminations identified in these data sets which coincide with the major terminations in the DD, MS, or ML data sets. All other terminations we refer to as minor terminations. The data on these are listed in Table 1.
Finally, we also define three additional hybrid data sets. As the HA data set is a stack of both benthic and planktic records, we combine the early-Pleistocene terminations identified from the HA data set together with late-Pleistocene terminations from the DD, ML, and MS data sets to generate the HADD, HAML, and HAMS data sets, respectively.
Thus starting from our five original data sets (HA, HB, HP, LR04, LRH), we have a total of 11 data sets of glacial terminations against which we will compare our models (see Table  1).
As there are dating errors and identification uncertainties, we cannot know exactly when a deglaciation occurred. To take into account these uncertainties, we treat the time of each deglaciation probabilistically by defining a Gaussian distribution with the mean and standard deviation equal to the time and time uncertainty (respectively) of the termination. The terminations in a data set are therefore represented as a sequence of Gaussians, which will be modeled as described in the following section.

Bayesian modelling approach
We use the standard Bayesian probabilistic framework (e.g. Kass and Raftery, 1995;Jeffreys, 1961;MacKay, 2003;Sivia and Skilling, 2006) to compare how well the different models explain the paleontological data. This approach takes into account the measurement errors, accounts consistently for the differing degrees of complexity present in our models, and compares models symmetrically. Our specific methodology is outlined briefly in this section. It is described in more detail in Bailer-Jones (2011a,b), where we also present arguments why this approach should be preferred to hypothesis testing using p-values. Table 1: Terminations (major and minor) identified from different δ 18 O records using H07's method (HA, HB, HP, LR04 and LRH) and the DD, MS and ML data sets of major terminations. Combining the early Pleistocene terminations of HA with the DD, MS and ML data sets, we obtain the hybrid data sets of HADD, HAMS and HAML. For each column, the termination ages are listed on the left side and the age uncertainties are listed on the right side (also see Figure 2). All quantities are in units of kyr. -1029 5.6 -1029 5.6 -1030 5.6 -1031 5.5 -1027 5.5 -1080 6.6 -1080 6.6 -1075 6.1 -1085 6.5 -1079 6. 7 -1783 6.9 -1855 7.7 -1897 7.3 -1897 7.3 -1820 6.9 -1859 7. 6 -1855 7.7 -1940 5.8 -1940 5.8 -1856 7.7 -1940 5.8 -1941 5.9 -1893 7.1 The posterior probability of a model M postulated to describe a data set D is given by the rules of probability as where P(M) is the prior of model M, and P(D) can be considered here as a normalization constant. P(D|M) is the evidence of model M which can be written mathematically as θ is the set of parameters of model M, P(D|θ, M) is the likelihood -the probability of observing the data D given specific values of the model parameters -and P(θ|M) is the prior distribution of parameters of this model.
Ideally we would be interested in evaluating the P(M|D) for different models, as this is the probability of a model being true given the observed data. However, this would require that we define all possible models. Thus in practice we compare models by looking at the ratio of model posterior probabilities. If we cannot (or choose not to) distinguish between models a priori, then we set P(M) to be equal for all models. It follows from equations 1 and 2 that this ratio for models M 1 and M 2 is The above ratio of the evidences is called the Bayes factor and is used to compare how well a model (relative to another model) predicts the data, independent of the values of the model parameters. Note that this does not involve tuning the model parameters, which is why using the evidence takes into account differing model complexities. A (maximum) likelihood ratio test, in contrast, automatically favors more complex models (e.g. ones with more parameters), because such model can be tuned to fit the data better without them suffering any penalty on account of their increased complexity: an arbitrarily complex model will fit the data arbitrarily well. The evidence automatically balances model complexity against fitting accuracy to find the most plausible model, as described in the above references.
If we had good reasons to adopt unequal model priors (i.e. other information favored one model over another), then we should instead look at the product of the Bayes factor with the ratio of these priors, but this is not done here.
To account for the time uncertainties in the glacial terminations, we interpret a termination time as a Gaussian measurement model where t j is the measured time of termination j (identified from a stacked δ 18 O record), σ j is the estimated uncertainty in that measurement and τ j is the (unknown) true termination time.
If D comprises N independently measured events, then the probability of observing the complete data set D = {t j } is just the product where the second line just follows from the marginalization rule of probability. P(t j |θ, M), the event likelihood, is the probability that an event (termination) j is observed at time t j . It is equal to the integral of the product of the measurement model with the model-predicted probability of the true time of the event, P(τ j |θ, M), over all values of the true time. That is, we marginalize (average) over the unknown true time. (This is explained further in section 4.3 after we have introduced the models.) This model-predicted probability of the times of the events, i.e. the deglaciations, is the time series model. This will be derived in section 4 from the orbital forcing and pacing models.
We then have all the ingredients we need to calculate the likelihood (equation 5), and therefore the evidence (equation 2) for a given time series model for a given data set. Both the likelihood calculation and the evidence calculation involve an integral. We perform these numerically. The former is one dimensional (over time), so is straightforward. The latter is multidimensional (over the model parameters), so we use a Monte Carlo method. This involves drawing parameter samples from the parameter prior distribution, P(θ|M), calculating the likelihood for each, and then averaging the result. In each case we draw 10 5 samples.
The Bayes factor is a positive number. The larger it is compared to unity, the more we favor model 1 over model 2. Based on the criterion given by Kass and Raftery (1995), we conclude that model 1 should be favored over model 2 if the Bayes factor is more than 10 (and 2 over 1 if it is less than 0.1). If the Bayes factor lies between 0.1 and 10, we cannot favor either model.

Time series models
In section 4.1 we introduce various climate forcing models, such as those based on variations of the Earth orbital parameters. In section 4.2 we define the pacing models. We use this term in a somewhat narrower sense than is often used in the literature (Saltzman et al., 1984;Tziperman et al., 2006). Here a pacing model is one which modulates the effect of a continuously variable forcing mechanism through the introduction of a threshold. Specifically, the ice volume is unaffected by the forcing mechanism until the ice volume exceeds some threshold, where the value of this threshold depends on the magnitude of the forcing. Having defined the forcing and pacing models, we use them in section 4.3 to predict a sequence of glacial termination times. For a given forcing/pacing model M, and values of its parameters θ, this is the term P(τ j |θ, M) in equation 5. In section 5 we will compare these model-predicted terminations with the measured ones, using the the Bayesian approach to compare the overall ability of the models to explain the data.

Forcing models
Insolation influences the climate in a number of ways, both directly through mechanisms such as heating the lower atmosphere, and indirectly through modifying the ice accumulation rate and other mechanisms (Berger, 1978b,a;Saltzman and Maasch, 1990). Mainstream thinking holds that climate change is most sensitive to the northern summer insolation at high latitudes because the temperature in continental areas, of which there is more in the northern hemisphere, is critical for ice melting or sublimation (Milanković, 1941). The summer insolation at high latitudes depends on the geometry of the Earth's orbit and the inclination of Earth's spin axis, and thus depends on the eccentricity, precession, and obliquity (hereafter referred to collectively as "orbital elements", even though obliquity is not orbital). Variations in these alter how the insolation varies with season (from orbital and axial precession), with latitude (from obliquity changes), and with time scale (e.g. eccentricity variations occur at dominant periods of 100 kyr and 400 kyr).
Milankovitch proposed that the combination of orbital elements which gives rise to the measured summer insolation at 65 • N is crucial to generating the glacial-interglacial cycles (Milanković, 1941;Hays et al., 1976). To model orbital forcings more generally, we define an orbital forcing model, f (t), as a combination of eccentricity, precession, and obliquity, which is proportional to the insolation over certain time scales, seasons, and latitudes. We build the following forcing models based on the reconstructed time-varying eccentricity, f E (t), precession, f P (t), obliquity, f T (t), and four different combinations thereof: where e(t), (t), and e(t) sin(ω(t) − φ) are the eccentricity, obliquity, and precession index (or climatic precession), respectively. ω(t) is the angle between perihelion and the vernal equinox, and φ is a parameter controlling the phase of the precession. We use the variations of these three orbital elements over the past 2 Myr as calculated by Laskar et al. (2004). We standardize each of f E (t), f P (t), and f T (t) to have zero mean and unit variance, and then combine them to generate the compound models. α and β are contribution factors which determine the relative contribution of each component in the compound models, where 0 ≤ α ≤ 1 and 0 ≤ β ≤ 1. In addition to these models, we also use the daily-averaged insolation at 65 • N on July 21 as a proxy for the Milankovitch forcing, f CMF .
Beyond orbital forcings, we also consider the influence of variations of the Earth's orbital inclination and of the cosmic ray flux. To do this we build an inclination-based forcing model, f Inc (t), using the orbital inclination calculated by Muller and MacDonald (1997), and we model the cosmic ray forcing as a geomagnetic paleointensity (GPI) time series (standardized to the mean and unit variance), f G (t), as collected by Channell et al. (2009).
All forcing models and corresponding prior distributions over their parameters ("forcing parameters") are shown in Table 2. In this table and the following sections, all parameters are treated as dimensionless variables by setting the time unit to be 1 kyr (ice volume is on a relative scale). For the precession model, we set φ = 0 to treat precession according to the Milankovitch theory (although in section 6 we will allow the phase of the precession to vary in order to check the sensitivity of our results to this assumption). As we do not have any prior information about the values of the contribution factors in the compound models, we adopt uniform prior distributions over the interval [0, 1] for these. Figure 3 shows the single-component forcing models (which do not have any adjustable parameters). All forcing models will be included in pacing models and corresponding termination models in the following sections. Hereafter, for each forcing model, the corresponding pacing and termination models share the same name as shown in the first column of Table 2.

Pacing models
As described earlier, we use the term "pacing" to mean that some aspect of the climate system is independent of external forcings until the climate system reaches a threshold, whereby the value of this threshold is dependent upon the forcing. We model the pacing effect on ice volume variations using the deterministic version of the stochastic model introduced by Huybers and Wunsch (2005). In that model the ice volume at time t is and if v(t) > h(t) then terminate, (7) where and ∆t is a constant time interval. Thus the ice volume changes in discrete steps until it passes a threshold h(t), which is itself modulated by a climate forcing f (t) with a contribution factor a. The initial ice volume is v 0 and the background threshold, h 0 , is either a constant or can itself vary with time. We set η(t) to be unity while the threshold has not been reached; after that the glaciation is terminated by setting η(t) to a constant negative value such that the ice volume linearly decreases to 0 within 10 kyr of the threshold having been exceeded. 3 After this η(t) is set to unity, the next cycle starts. The threshold and the deglaciation duration are chosen to generate approximately 100 and 41 kyr glacial cycles (Huybers and Wunsch, 2005). If the contribution factor a is zero, the ice volume will vary with a period modulated by the background threshold, h 0 . We define this model as the Periodic model. In general the period may vary with time. However, if h 0 is constant, then the Periodic model has a constant period of value h 0 + 10 kyr. Because  Figure 3: The single-component forcing models. A deglaciation is likely to be triggered by a peak in the forcing. The values of eccentricity, precession, obliquity and Milankovitch forcing (CMF) are calculated by Laskar et al. (2004), the orbital inclination relative to the invariable plane is given by Muller and MacDonald (1997), and the GPI record is from Channell et al. (2009).
h 0 controls the period of ice volume variations, different values of h 0 are required to model the 100 kyr cycles in the late Pleistocene and the 41 kyr cycles in the early Pleistocene (see Figure 4). We therefore first build pacing models to separately predict the deglaciations over the early and late Pleistocene using the constant background threshold model. We then use a varying background threshold (either linear or sigmoidal) to try to model the whole Pleistocene. We now describe these models in more detail.

Constant background threshold
A constant background threshold is appropriate for modeling glacial-interglacial cycles without a transition such as the MPT. One realization of such a pacing model with the threshold modulated by a PT forcing model is shown in Figure 5. The ice volume grows until it passes the forcing-modulated threshold. The ice volume then decreases rapidly to zero within the next 10 kyr. We see that a deglaciation tends to occur when the insolation is near a local maximum. Hence the pacing model (equations 7 and 8) can generate ∼100 kyr saw-tooth cycles which enables a forcing mechanism to pace the phase of these cycles.
The pacing model has three parameters: v 0 , h 0 , a. Rather  than fixing these to some expected values, we assign a probability distribution to them. This is the prior which appears in equation 2, which shows that by averaging the likelihood over values drawn from this prior we get the evidence for the model.
As described above, a periodic pacing model is generated by adopting a constant threshold, h(t) = h 0 and a = 0. When forcings are added onto the constant threshold (to give a 0), the ice volume variations then have an average period of about (h 0 + 10 − a) kyr, because ice volume accumulation tends to ter-minate at a forcing maxima. For this reason we use different prior distributions on a and h 0 depending on whether we are trying to model the early (41 kyr cycles) or late (100 kyr cycles) Pleistocene. Specifically, we use prior distributions for v 0 , h 0 , and a which are uniform over the following intervals (and zero outside): 0 < v 0 < 90γ, 90γ < h 0 < 130γ, 15γ < a < 35γ, where γ = 0.4 when we model ∼41 kyr cycles and γ = 1 when we model ∼100 kyr cycles. The range of v 0 is just the range of the ice volume variation, while the mean values of the prior dis-tributions of h 0 and a with γ = 1 are the fitted values obtained by Huybers (2011). For the periodic model, a is zero and h 0 has a uniform prior distribution over 70γ < h 0 < 110γ. In section 6, we will check how sensitive our results are to this choice of priors.

Linear trend background threshold
The constant background threshold model is incapable of modeling the transition from the 41 kyr world to the 100 kyr world. If we treat h 0 as a step function as shown in Figure 4 (red lines), the corresponding pacing model predicts an abrupt MPT with an extra parameter (the time of the transition). But to model the MPT, we will introduce another two versions of the pacing model by allowing the background threshold to vary with time (linearly and nonlinearly).
Studies have suggested various mechanisms which may be involved in climate change before and after the MPT (Saltzman et al., 1984;Maasch and Saltzman, 1990;Ghil, 1994;Raymo et al., 1997;Paillard, 1998;Clark et al., 1999;Tziperman and Gildor, 2003;Ashkenazy and Tziperman, 2004). H07 suggests that a simple model with a threshold modulated by obliquity and a linear trend can explain changes in glacial variability over the last 2 Myr without invoking complex mechanisms. To investigate this, we replace the threshold constant h 0 with a linear trend in time where p and q are the slope and intercept of the trend respectively. To predict the transition from 41 kyr cycles to 100 kyr cycles with reasonable parameter sets, we adopt the following uniform prior distributions for the pacing parameters: 0 < v 0 < 36, 0 < p < 0.1, 106 < q < 146 and 10 < a < 30. For the Periodic model we use a = 0 and a uniform prior for q between 86 and 126. These ranges are adopted so that the pacing model predicts the 41 kyr and 100 kyr cycles with similar period uncertainties as produced by the ranges of parameters in the pacing model with a constant background threshold (section 4.2.1). An example of the linear trend is shown with the green line in Figure 4. If the threshold is not modulated by any forcing (i.e. a = 0, the Periodic model), then the pacing model generates a gradual transition from 50 kyr cycles 2 Ma to 110 kyr cycles at the present.

Sigmoid trend background threshold
To enable a more rapid onset of the MPT, we introduce another version of the pacing model with a nonlinear trend in the background threshold, defined using the sigmoid function as where k is a scaling factor, t 0 denotes the transition time, and τ represents the time scale of the MPT. The uniform priors of the parameters of this version of pacing models are set to be: 0 < v 0 < 36, 90 < k < 130, 10 < τ < 500, 10 < a < 30, and −700 < t 0 < −1250, as motivated by the range of MPT time given by Clark et al. (2006). For the Periodic model we set a = 0 and change the range of k to be 70 < k < 110. The reason for choosing these priors is the same as given in section 4.2.2. Figure 4 illustrates this model. A late transition time, t 0 , moves the trend to the present, and a smaller transition time scale, τ, generates a more rapid transition. The values of 0.6k and 0.4k in the above equation are set in order to rescale the trend model such that the ice volume threshold including a sigmoid trend allows both ∼41 kyr and ∼100 kyr ice volume variations.

Termination models
Using the same method described in section 2 for the data, we identify glacial terminations in the ice volume time series generated by the pacing models. The age uncertainty of each termination is equal to half of the duration of the termination. As with the data, a single termination is represented as a Gaussian probability distribution over time, which is just the term P(τ j |θ, M) in equation 5 (see section 3). The full set of predicted terminations forms the time series model which we will compare with the data. We use the term "termination model" to refer to the combination of a forcing model and a pacing model, which together has a number of parameters. These are listed in Table 2. Each of these termination models can have different background threshold models, as was explained in section 4.2. Figure 6 shows schematically how we compare the modelderived terminations (red line) with the data on one termination (black line). The event likelihood (the integral in equation 5) for a termination is calculated by integrating over time the product of the probability distribution of the observed time of the termination, P(t j |σ j , τ j ), with the model prediction of the true termination time, P(τ j |θ, M). The product of event likelihoods for all terminations in a data set is the likelihood for the termination model with specific values of the parameters of the forcing and pacing model. By calculating the likelihood for many different values of those parameters (drawn from their prior distributions), and averaging them, we arrive at the evidence for that termination model (equation 2).
To accommodate other contributions from the climate system to the timing of a termination, we add a constant background probability to the termination model. This is defined using the background fraction b = H b /(H b + H g ), where H b is the amplitude of the background and H g is the difference between the maximum and minimum of the Gaussian sequence. The background fraction is a parameter of the model which we do not measure, so we assign it a prior (uniform from 0 to 0.1) and marginalize over this too.
Let us summarize our modelling procedure. A forcing model (Figure 3) modulates the ice volume threshold (equation 8) of the pacing model (equation 7) from which the termination model (e.g. red line in Figure 6) is derived. This is then compared with a sequence of terminations identified from a δ 18 O data set using our Bayesian procedure. The red line is the termination model generated from the pacing model shown in Figure 5. The black line represents the measured data on termination j. Its time and uncertainty are interpreted probabilistically as a Gaussian distribution over time.

Evidence and Bayes factor
We calculate the Bayesian evidence of the termination models listed in Table 2 for each of the data sets shown in Table 1. We calculate this for terminations extending over three different time spans: 1 Ma to 0 Ma, 2 Ma to 1 Ma and 2 Ma to 0 Ma. The first time span is the same as that chosen by Huybers (2011). However, other studies claim that the onset of 100 kyr cycles occurred around 0.8 Ma. We will examine in section 6 how sensitive our results are to the choice of time span. According to the time span in question, we need to choose the appropriate pacing model, because this determines the dominant period.
The Bayes factor (BF) is just the ratio of the evidence for two models. Rather than reporting Bayes factors for various pairs of models, we will report them for all models relative to a simple reference termination model. This reference model is just a uniform probability distribution over the time of deglaciations, and has no parameters. It corresponds to a constant probability in time of a deglaciation, but its choice is arbitrary as it just serves to put the evidences on a convenient scale.
Bayes factors should only be used to compare different models for a common data set. This is because their definition requires that the factor P(D) in equation 3 cancels out.

Late Pleistocene (1-0 Ma)
The deglaciations identified using H07's method (in the data sets HA, HB, HP, LR04, and LRH) contain many minor terminations which may be better explained by models which predict ∼41 kyr cycles. Thus, we choose constant background thresholds with γ = 1 and γ = 0.4 for all termination models in order to predict 100 kyr and 41 kyr variations, respectively, over the past 1 Myr.
The BF for each termination model relative to the uniform model is shown in Figure 7. We see that the HA, HB, LR04, and LRH data sets favor the models with a tilt component and with γ = 0.4. Although compound models such as EPT and CMF sometimes have BFs slightly higher than the Tilt model, precession and eccentricity may not be necessary to explain the terminations identified in these data sets.
The HP data set favors the PT model with γ = 1. This could be caused by a mismatch between the terminations identified in HP and the terminations identified in other data sets. For example, around the time of termination 6 ( Figure 2), two terminations are identified in HP while only one termination is identified in other data sets. The discrepancy between HP and other data sets is larger before 0.8 Ma, which indicates a more ambiguous definition of terminations, particularly for planktic δ 18 O. On account of this, in section 6 we will narrow the time span to 0-0.8 Ma (a more conservative time scale of late Pleistocene). Nevertheless, for all the data sets containing minor terminations, tilt is a common factor in the preferred models.
For the DD, ML, and MS data sets, the PT and CMF models with γ = 1 are favored. In other words, the major terminations are better predicted by a model involving precession and tilt rather than either alone, although tilt alone can pace minor terminations. Because the EPT and CMF models have lower BFs than the PT model, the eccentricity component is unlikely to pace the glacial terminations directly. Yet eccentricity can determine the glacial terminations indirectly through modulating the amplitude of the precession maxima (i.e. e sin ω). A similar conclusion was drawn by Huybers (2011) using p-values. We note that the rejection of a null hypothesis in this way does not automatically validate the alternative hypothesis. The Bayesian approach allows one to directly compare multiple models in a symmetric fashion.
Since the late Pleistocene is characterized predominantly by major terminations, we conclude that late Pleistocene climate change is paced by a combination of obliquity and precession. This does not automatically imply that there is no link between major terminations and eccentricity variations. Eccentricity may determine the 100 kyr cycles in the late Pleistocene, while obliquity and precession influence the exact timing of the terminations (Lisiecki, 2010). This could be studied in future work by introducing an eccentricity dependence into the pacing model.

Early Pleistocene (2-1 Ma)
Here we only consider the HA, HP, HB, L04, and LRH data sets, because the DD, ML, and MS sets have no terminations in the early Pleistocene. We only calculate BFs for models with γ = 0.4 (and not γ = 1), because this reproduces periods on the order of 41 kyr, and such cycles are obvious in all data sets (Figure 2). We exclude the GPI model because the GPI record has a time span less than 2 Myr. The BFs for the termination models are shown in the upper right panel of Figure 7.
We see that the Tilt model is favored by all data sets. The combination of tilt with other orbital elements does not give a higher BF, so we conclude that the other orbital elements do not play a major role in pacing the deglaciations over the early Pleistocene. It is important to realise that although the Bayesian evidence generally penalizes more complex models, this does not automatically result in a lower Bayes factor for such models. They can achieve higher Bayes factors if the model is supported by the data sufficiently strongly (see the references in section 3).

Whole Pleistocene (2-0 Ma)
For the whole Pleistocene we use the data sets HA, HB, HP, LR04, and LRH as well as the hybrid data sets, HADD, HAML, and HAMS. We use pacing models with and without a trend threshold to model the terminations. The BFs for the above models and data sets are shown in the lower left panel of Figure  7.
For the HA, HB, and LR04 data sets, the Tilt model with γ = 0.4 is favored. Other combinations with the tilt component and with γ = 0.4 yield similar BFs. However, for the HP and LRH data sets, the PT model with a sigmoid trend is favored and this model also gives high BFs for the HA, HB, and LR04 data sets. For all these data sets, the Precession, Eccentricity, and Periodic models have rather low BFs. These results indicate a major role for tilt and a minor role for precession in pacing major and minor Pleistocene deglaciations. For all the above data sets, the CMF model with γ = 0.4 has a high BF, but not higher than other models with a tilt component. CMF is an optimized version of the EPT model. Faced with different models which give similar Bayes factors, we will normally want to choose the simplest, which here is PT. We will investigate this further in section 6.
For the HADD, HAML, and HAMS data sets, the PT model with a threshold modulated by a sigmoid trend is favored, and those compound models with a tilt component also have high BFs. The whole Pleistocene deglaciations appear to be paced by the combination of precession and obliquity. This is consistent with the results for the late-Pleistocene deglaciations. The physical reason why precession becomes important after the MPT is beyond the scope of our work and is still under debate.
On account of the existence of the MPT, modeling the whole Pleistocene with a constant background threshold model makes little sense, so those corresponding results should not be given much weight. (This corresponds to assigning all those models a smaller model prior, P(M).) More appropriate are the models with linear and sigmoid background thresholds. Among these, we see that the EPT and CMF models have BFs about ten times lower than the PT model. We conclude that eccentricity does not play a significant role in pacing terminations over the whole Pleistocene. We also find that the PT model with a sigmoid background threshold is more favored than the PT model with a linear background threshold, which indicates that the MPT may not be as gradual as claimed by (Huybers, 2007). We will discuss this further in section 6.
According to Figure 7, the Inclination and GPI models are not favored, and in fact are less favored than the reference uniform model (as BF<1). Thus we find that the geomagnetic paleointensity does not pace glacial cycles over the last 2 Myr, although we note that there is some controversy over the link between the GPI and climate change Pierrehumbert, 2008;Bard and Delaygue, 2008;Courtillot et al., 2008). In contrast to the conclusion of Muller and MacDonald (1997), there we find no evidence for a link between the orbital inclination and ice volume change.

Discrimination power
To validate our method as an effective inference tool to select out the true model, we generate simulated data from each model and then evaluate the Bayes factors for all models on these data. The data are simulated with the following parameters for all models except the Periodic one: h 0 = 110γ, a = 25γ, b = 0, and v 0 = 45γ, where γ = 1 for terminations simulated over the last 1 Myr and γ = 0.4 for the time range 2 to 1 Ma. For the Periodic model we use instead of h 0 = 90γ and of course a = 0. Recall that the period of the resulting time series is approximately h 0 + 10 − a. Other parameters in corresponding forcing models are fixed at α = 0.5 for compound models with two components, α = 0.3 and β = 0.2 for the EPT model, and φ = 0 for models with a precession component.
The BFs for simulated data over the last 1 Myr are shown in the left panel of Figure 8. We see that all models based on a single orbital element are correctly selected, although those models combining the correct single orbital element with other elements may also give comparable BFs. When models have similar BFs we would generally want to favor the one with fewest components. This again corresponds to using a larger value of the model prior, P(M) (see section 3).
Incorrect models, in contrast, generally receive much lower Bayes factors. For the PT-simulated data set, the PT model is correctly discriminated from the CMF model (a fitted EPT model). We also see that although the ET model may not be correctly selected out when its BF is similar to that obtained for EP, PT, EPT, and CMF models, the ratios of the Bayes factors are close to unity. The much larger ratios between them for the real data validate our inference of the ET model (see section 5). Figure 8 shows that the EP model is not favored over the Eccentricity model even when the former is the true model. However, the Eccentricity model is never favored on any of the real data sets, so this misidentification does not occur in practice. In conclusion, this discrimination test indicates that our identification (in section 5.1.1) of the PT model as the best model for the late Pleistocene is reliable. We then apply the same test to the period 2-1 Ma, which uses a different value of γ as explained above. The results are shown in the right panel of Figure 8. We see that the correct model always has a larger Bayes factor than the other models. Yet we also see that for data simulated from the PT model, the CMF and EPT models have similar BFs as the PT model. However, as the PT model is not as fine tuned as the CMF model and has fewer adjustable parameters than the EPT model, we would generally invoke Occam's razor to select the PT model. This experiment confirms that Bayesian model comparison and our interpretation of the Bayes factors allows us to select the correct model. We conclude that tilt (or obliquity) is the main "pacemaker" of the deglaciations over the last 2 Myr, while precession may pace the deglaciations over the late Pleistocene. This indicates that precession becomes important in pacing terminations after the MPT. Other climate forcings, including GPI and inclination forcing, are unlikely to pace the deglaciations over the Pleistocene.

Sensitivity test
We now perform a sensitivity test to check how sensitive a model's BF is to the choices of time scale and model priors.
To do this we first change the time of the onset of the 100 kyr cycles from 1 Ma to 0.8 Ma. We recalculate the BFs and show them in the lower left panel of Figure 7. We see that the combination of obliquity and precession (i.e. the PT model) still paces the major terminations (DD, ML, and MS) better than obliquity alone. So our conclusion is robust to this change of the late-Pleistocene time span.
We then change the prior distributions over some model parameters and keep others fixed. We apply this sensitivity test to the ML, HA, and HAML data sets with time spans of 1-0 Ma, 2-1 Ma, and 2-0 Ma, respectively. These three data sets are representative and conservative because they contain the major terminations as well as minor terminations identified in the HA data set, which is stacked from both benthic and planktic data sets. In each case we select the most favored types of the pacing model according to the model comparison in section 5. They are: the constant background threshold with γ = 1 for ML; the constant background threshold with γ = 0.4 for HA; sigmoid background threshold for HAML. For each model, we change the range of the uniform prior on each parameter as follows (the name in parentheses is used to label the change in Figure 9) • λ = 0 → −10 ≤ λ ≤ 10 (lag): Here we account for the possible time lag between the forcing and its effect (as was suggested by previous studies such as Hays et al. 1976 andImbrie 1980). λ represents the time lag(s) of any model listed in Table 2, and λ ranges -10 to 10 kyr in steps of 1 kyr. For models with a single component, a time lag is achieved by shifting the corresponding time series to the past or to the future. For compound models, each component is shifted independently, and the corresponding evidences are calculated by marginalizing the likelihood over time lags of all components.
• 90γ < h 0 < 130γ → 80γ < h 0 < 140γ (hlarge) and 100γ < h 0 < 120γ (hsmall): We extend or shrink the upper and lower limits of the background threshold, h 0 , by 10γ. Changing the prior distribution of h 0 is equivalent to changing the prior distribution of the period of a pacing model, because the average period is about h 0 + 10 − a (see section 4.2.1). The above changes only apply to models with a 0 while the prior distribution of the Periodic model (a = 0) is changed from 70γ < h 0 < 110γ first to 60γ < h 0 < 120γ and then to 80γ < h 0 < 100γ. For models with a sigmoid trend, the prior distribution of k is changed from 90 < k < 130 first to 80 < k < 140 and then to 100 < k < 120.
• 15γ < a < 35γ → 5γ < a < 45γ (alarge) or 20γ < a < 30γ (asmall): We extend or shrink the range of contribution factor of forcing, a, around its mean. These changes do not apply to the Periodic model, for which a = 0.
We double or halve the upper limit of b, the contribution factor of the background in the termination model.
• φ = 0 → −π < φ < π (phi): We now allow any value for the the phase of the precession, φ, which is related to the season of the insolation that forces the climate change.
The BFs for the models with each of the above changes are shown Figure 9, separated into three blocks corresponding to the different data sets, ML, HA, HAML. For the ML data set (1-0 Ma; top block), the PT and CMF models are favored over the Tilt model for all changes in the priors. The PT and CMF models without time lags are also favored over corresponding models with lags. This indicates that the Tilt and Precession models pace climate change without significant time lags. Over the early Pleistocene (middle block), the Tilt model is marginally favored. The BFs of the EPT model vary a lot but are never higher than the Tilt model. For the HAML data set (2-0 Myr; bottom block), the model combining a sigmoid trend and the PT forcing is favored for all changed priors. Moreover, the BF for the PT model increases when shrinking the range of the background fraction, b. The relative lack of significance of the background suggests a significant influence of obliquity and precession over the past 2 Myr.
To further investigate the role of precession in pacing the major late-Pleistocene deglaciations, we marginalize the likelihoods for the PT model over all its parameters except for the contribution factor of precession contribution factor, α, and phase, φ. (Note that the evidence is the likelihood marginalized over all model parameters.) We do this for the ML data over the last 1 Myr. The distribution of this marginalized likelihood (relative to the uniform model) is shown in the left panel of Figure 10. The highest values occur for phases ranging from −50 • < φ < +50 • , indicating that the main pacemaker under this model is either the intensity of the northern hemisphere summer insolation or the duration of the southern summer (we cannot distinguish between these based on available data). While very small contribution factors, α < 0.1, are strongly disfavored, the model is otherwise not very sensitive to α. Since α determines the size of the contribution of precession to the PT model (equation 6), this means that some precession contribution is favored, but the exact amount is not well constrained. This broad high likelihood range of α and φ means that the pacing depends on the overall northern hemisphere summer insolation at a range of northern latitudes (or equivalently the duration of the southern summer) rather than that at a specific latitude and time in summer. This is consistent with Huybers 2011's conclusion that "climate systems are thoroughly interconnected across temporal and spatial scales".
We found in section 5.1.3 that the pacing model with a sigmoid background threshold model was favored when modeling the whole Pleistocene. We now identify which parameters of that model are most favored by the data. To do this we calculate the marginalized likelihood (relative to the uniform model) for the PT model with a sigmoid background threshold as a function of both the transition time scale, τ, and transition midpoint, t 0 , on the HAML data set (i.e. we marginalize over all other parameters): see the right panel of Figure 10. To explore this more completely we have extended the upper limit of t 0 from -700 kyr to -300 kyr. The peak is at around τ = 100 kyr (about one glacial-interglacial cycle) and t 0 = −715 kyr. To visualize this transition, a sigmoid background model with this value of τ is shown in Figure 4. Defining the transition duration as the time taken for the ice volume to change from 25% to 75% of its maximum value, τ = 100 kyr corresponds to a transition duration of 220 kyr. This timescale for the MPT is consistent with the findings of Hönisch et al. (2009); Mudelsee and Schulz (1997); Tziperman and Gildor (2003); Martínez-Garcia et al. (2011). It is shorter (more abrupt transition) that found by H07 and others (Raymo et al., 2004;Liu and Herbert, 2004;Medina-Elizalde and Lea, 2005;Blunier et al., 1998), although Figure 10 shows that longer time scales are not that improbable (but note that the likelihoods are shown on a logarithmic scale). The transition time of 715 kyr ago is somewhat later than the mid-point of the MPT of ∼-900 kyr identified by Clark et al. (2006) using a frequency spectrogram analysis. Yet our data/analysis permits a range of values, although we see that the region around -900 kyr is disfavored for low values of τ. Discrepancies from previous results could also arise from the fact that we use just termination data.
As a final sensitivity test, we change the sign of the contribution factor of forcing, a, to model possible anticorrelations between forcing models and the data over the late Pleistocene. We find that this significantly reduces the BF for all favored models, which shows that models with anticorrelations are a poor description of the data.

Summary and conclusions
Using likelihood-based model comparison, we find that a combination of obliquity (axial tilt) and precession is the main pacemaker of the 12 major glacial terminations in the late Pleistocene. Obliquity alone can trigger minor terminations over the whole Pleistocene. The obliquity and precession pace the Pleistocene terminations without significant time lags, and their pacing roles can be identified with high significance.
We confirm the dominant role of obliquity in pacing the glacial terminations over the early Pleistocene. In contrast to the conclusion of H07, we find that a model with obliquity alone describes the major and minor Pleistocene deglaciations  Figure 7). These are shown for three different data sets (and time scales) in the three blocks separated by white horizontal lines. In each block the logarithm of the Bayes factor is show on a color scale for each model (vertical axis) and change in prior (horizontal axis). The first column -labeled 'none' -gives the BFs for models with the original priors for reference. Some models are not relevant for certain prior changes, so the corresponding slots are empty (white). The three blocks are as follows. Top: pacing model with a constant background threshold with γ = 1 for the ML data set (0-1 Ma). Middle: pacing model with a constant background threshold with γ = 0.4 for the HA data set (1-2 Ma). Bottom: pacing model with a sigmoid background threshold for the HAML data set (0-2 Ma).  Figure 10: The distribution of the logarithm of the marginalized likelihood relative to the uniform model, log 10 (RML), for the PT model as a function of two model parameters. The left panel shows the distribution over the precession contribution factor (α) and phase (φ) for the PT model with γ = 1 for the ML data set (the last 1 Myr). The right panel shows the distribution over the transition time (t 0 ) and transition time scale (τ) for the PT model with a sigmoid background threshold for the HAML data set (the last 2 Myr). 10 6 and 1.6 × 10 6 sample points sampled in a 200 × 200 grid were used to construct the left and right distributions respectively. For each panel, the most favored region is identified by applying a 25 × 25 grid to the distribution, and is denoted by a cross. Note that the scales saturate: likelihoods above or below the limits of the color bar are plotted using the extreme color.
(together) better than a model which combines obliquity with a trend in the background threshold. Thus obliquity is sufficient to explain at least the time of minor terminations before and after the MPT, without reparameterizing the model as done by H07 and Raymo et al. (1997); Paillard (1998); Ashkenazy and Tziperman (2004); Paillard and Parrenin (2004); Clark et al. (2006).
We observe that precession becomes important in pacing the ∼100 kyr glacial-interglacial cycles after the MPT. Through the comparison of models with a linear trend and models with a sigmoid trend in the background threshold, we find that the glacial terminations over the whole Pleistocene can be paced by a combination of precession, obliquity, and a sigmoid trend in the background threshold. Using marginalized likelihoods, we find that the MPT has a time scale (the time required for ice volume to grow from 25% to 75% of the maximum) of about 220 kyr and a mid-point at around 715 kyr before the present. This is rather late compared with other studies (Clark et al., 2006), although our data/analysis supports a broad range of values. Note that we do not assume the existence of a strict periodicity in the data, in contrast to some studies based on power spectrum analyses. Since there is no significant change in the power spectrum of the insolation before and after the MPT, the MPT must be caused by a rapid change of response of the climate to the insolation, rather than by the insolation itself. This is consistent with previous studies (Paillard, 1998;Parrenin and Paillard, 2003;Ashkenazy and Tziperman, 2004;Clark et al., 2006).
We also find that geomagnetic forcing and forcing by changes in the inclination of the Earth's orbital plane are unlikely to cause significant climate change over the last 2 Myr. This weakens the suggestion that the Earth's orbital inclination relative to the invariable plane influences the climate (Muller and MacDonald, 1997). Our results also suggest that the modulation of cosmic rays or solar activity by the Earth's magnetic field has at best a limited impact on climate change on timescales between 10 kyr and 1 Myr, challenging the hypothesis that connects the geomagnetic paleointensity with climate change (Channell et al., 2009).
The Bayesian modelling approach is well suited to multiple model comparison, because it evaluates all their evidences explicitly: a model is not selected just because some alternative "noise" model is rejected. Uncertainties in the data are also accommodated. Moreover, the approach automatically and consistently takes into account the model complexity, in contrast to most other methods (e.g. frequentist hypothesis testing, maximum likelihood ratio tests) which will favor more complex models unless they are penalized in some ad hoc way.
Our conclusions are reasonably robust to changes of parameters, priors, time scales, and data sets. The main uncertainty in our work comes from the identification of glacial terminations over the Pleistocene, although we have used different data sets of terminations to reduce this uncertainty. In future work, a more sophisticated Bayesian method (e.g. the method introduced by Bailer-Jones 2012) could be employed to model the full time series of climate proxies. Using this model inference approach, we may learn more about the mechanisms involved in the climate response to Milankovitch forcings.