Estimated Using a State Space Model
Item Type Article
Authors Boteler, Claire; Dowd, Michael; Oliver, Eric C. J.; Krainski, Elias Teixeira; Wallace, Douglas W. R.
Citation Boteler, C., Dowd, M., Oliver, E. C. J., Krainski, E. T., & Wallace, D.
W. R. (2023). Trends of Anthropogenic Dissolved Inorganic Carbon in the Northwest Atlantic Ocean Estimated Using a State Space Model. Journal of Geophysical Research: Oceans. Portico. https://
doi.org/10.1029/2022jc019483 Eprint version Post-print
DOI 10.1029/2022jc019483
Publisher American Geophysical Union (AGU)
Journal Journal of Geophysical Research: Oceans
Rights This is an accepted manuscript version of a paper before final publisher editing and formatting. Archived with thanks to American Geophysical Union (AGU). The version of record is available from Journal of Geophysical Research: Oceans.
Download date 21/06/2023 17:58:32
Link to Item http://hdl.handle.net/10754/692675
in the Northwest Atlantic Ocean Estimated Using a
2
State Space Model
3
Claire Boteler1(ORCID 0000-0003-1727-4510), Michael Dowd1(ORCID
4
0000-0003-2805-293X), Eric C. J. Oliver2(ORCID 0000-0002-4006-2826), Elias
5
T. Krainski3(ORCID 0000-0002-7063-2615), Douglas W. R. Wallace2(ORCID
6
0000-0001-7832-3887)
7
1Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.
8
2Department of Oceanography, Dalhousie University, Halifax, Nova Scotia, Canada.
9
3Department of Statistics, King Abdullah University of Science and Technology, Thuwal, Kingdom of
10
Saudi Arabia.
11
Key Points:
12
• A time series generalization of the eMLR method is developed to produce monthly
13
estimates with uncertainties of anthropogenic ocean carbon.
14
• The rate of anthropogenic carbon increase in northwest Atlantic is roughly the same
15
for all depth layers, at 0.57 micro-mol/kg/year.
16
• Our method produces estimates of anthropogenic carbon increase that are com-
17
parable to those from eMLR, but with smaller uncertainties.
18
Corresponding author: Claire Boteler,[email protected]
Accepted Article
Accepted Article
Abstract
19
The northwest Atlantic Ocean is an important sink for carbon dioxide produced by an-
20
thropogenic activities. However the strong seasonal variability in the surface waters paired
21
with the sparse and summer biased observations of ocean carbon makes it difficult to cap-
22
ture a full picture of its temporal variations throughout the water column. We aim to
23
improve the estimation of temporal trends of dissolved inorganic carbon (DIC) due to
24
anthropogenic sources using a new statistical approach: a time series generalization of
25
the extended multiple linear regression (eMLR) method. Anthropogenic increase of north-
26
west Atlantic DIC in the surface waters is hard to quantify due to the strong, natural
27
seasonal variations of DIC. We address this by separating DIC into its seasonal, natu-
28
ral and anthropogenic components. Ocean carbon data is often collected in the summer,
29
creating a summer bias, however using monthly averaged data made our results less sus-
30
ceptible to the strong summer bias in the available data. Variations in waters below 1000m
31
have usually been analysed on decadal time scales, but our monthly analysis showed the
32
anthropogenic carbon component had a sudden change in 2000 from stationary to an in-
33
creasing trend at the same rate as the waters above. All depths layers had similar rates
34
of anthropogenic increase of∼0.57µmol kg−1 year−1, and our uncertainty levels are smaller
35
than with eMLR results. Integration throughout the water column (0-3500m) gives an
36
anthropogenic carbon storage rate of 1.37±0.57 mol m−2 year−1, which is consistent
37
with other published estimates.
38
Plain Language Summary
39
We need to measure the ocean sink for the CO2emitted by industrialized societies,
40
and it is particularly important for the northwest Atlantic Ocean. The rate of carbon
41
increase is often overshadowed by natural and seasonal variability. We introduce new sta-
42
tistical approaches to better estimate the rate of anthropogenic carbon that has accu-
43
mulated due to human activities. Ocean carbon data is often collected in the summer,
44
creating a summer bias, however using monthly averaged data made our results less sus-
45
Accepted Article
ceptible to the strong summer bias in the available data. From 1993 to 2015 in the north-
46
west Atlantic Ocean, anthropogenic carbon increased at∼0.57µmol kg−1 year−1within
47
all depth-layers. Integration of results throughout the water column (0-3500m) gives an
48
anthropogenic carbon storage rate of 1.37±0.57 mol m−2 year−1.
49
1 Introduction
50
A key element for climate change projection and for carbon emission policy is how
51
effective the ocean is, and will be, in mitigating the effects of anthropogenic CO2emis-
52
sions. The northwest Atlantic is one of the largest oceanic carbon sinks on an areal ba-
53
sis, taking in and storing large amounts of anthropogenic CO2(Khatiwala et al., 2013).
54
There are several chemical, physical and biological factors that influence how much CO2
55
the ocean takes up throughout the year. In the winter, cold winds cool the surface wa-
56
ters, increasing the solubility of CO2(Emerson & Hedges, 2008), which enhances dissolved
57
inorganic carbon (DIC) concentrations in the surface layers as a result of air-sea gas ex-
58
change. Temperature also influences the physical uptake processes of the area, as cool-
59
ing reduces stratification and facilitates the transport of DIC rich waters from the sur-
60
face to the deeper waters. The shoaling of the mixed layer in the summer traps DIC rich
61
waters in the subsurface ocean where it is no longer in contact with the atmosphere. While
62
respiration will release CO2into the surface waters, phytoplankton will use CO2for pho-
63
tosynthesis. Strong, seasonal biological production also promotes seasonal uptake of CO2
64
from the atmosphere, which is transported deeper via the biological carbon and mixed-
65
layer pumps (Lacour et al., 2019).
66
The column inventory of anthropogenic DIC is the depth-integrated amount of ocean
67
carbon present as a result of human activities, reported on a per area basis. In the north-
68
west Atlantic this was estimated at 160 mol m−2in 2010, several times greater than the
69
global average (Khatiwala et al., 2013). The global ocean’s uptake of atmospheric CO2
70
estimated in the last 10 years, as diagnosed from observations, has been increasing at
71
Accepted Article
a faster rate than the uptake estimated from ocean models (see Figure 9 in Friedlingstein
72
et al. (2020)). Such inconsistencies motivate our goal of improving the estimation of up-
73
take and storage of anthropogenic carbon for the northwest Atlantic, as determined di-
74
rectly from DIC observations.
75
Observational data from two main sources are used to study the ocean carbon cy-
76
cle. The SOCAT dataset provides near-surface ocean pCO2measurements made on re-
77
search cruises and ships of opportunity (Bakker et al., 2016). This dataset has been used
78
to estimate air-sea fluxes on the global scale with methods ranging from neural networks
79
(Landsch¨utzer et al., 2014) to atmospheric inversion (R¨odenbeck et al., 2013, 2015) and
80
interpolation (Goddijn-Murphy et al., 2015). The GLODAP dataset provides discrete
81
water sample measurements for the ocean interior (Lauvset et al., 2021). This dataset
82
has been used to estimate the build up of anthropogenic CO2in the oceans (Lauvset et
83
al., 2016; Khatiwala et al., 2013; Sabine, 2004) and diagnose ocean transport of carbon
84
(Holfort et al., 1998; Macdonald et al., 2003). Analyses of temporal changes have tended
85
to use regression approaches (Friis et al., 2005; Bostock et al., 2013; Gruber et al., 2019),
86
but also use mechanistic approaches based on ocean biogeochemical models (Gerber &
87
Joos, 2010; Friedlingstein et al., 2020; Carroll et al., 2022). Our study focuses on the use
88
of interior ocean DIC observations from GLODAP.
89
An important and long-standing research question is how to separate DIC into its
90
natural and anthropogenic components (Wallace, 2001; Sabine & Tanhua, 2010), where
91
anthropogenic typically refers to the excess DIC present in the ocean as a result of hu-
92
man activities (i.e., due to atmospheric CO2increases since preindustrial times). Mul-
93
tiple linear regression (Wallace, 1995), and especially the extended Multiple Linear Re-
94
gression method (eMLR) (Friis et al., 2005), are now widely used to estimate the increase
95
of anthropogenic DIC. This is usually done between two time points that are a decade
96
or more apart and on repeat hydrographic sections (Thacker, 2012). This approach has
97
been extended recently to include eMLR in a depth-spatial moving window (Carter et
98
Accepted Article
al., 2017), eMLR for column inventories (Plancherel et al., 2013), and eMLR(C*) (Clement
99
& Gruber, 2018). eMLR approaches are based on exploiting the changing relationship
100
between observed DIC and other correlated ocean variables that are not directly impacted
101
by anthropogenic CO2uptake, such as temperature, salinity and oxygen. The eMLR(C*)
102
approach, for example, incorporates more ocean predictor variables and centres the anal-
103
ysis on a derived variable, called C*, that more closely relates to anthropogenic carbon
104
than DIC. The eMLR based methods have proven to be useful for estimating trends over
105
long time intervals (decades) in the sub-surface waters, but few studies have attempted
106
an observation-centric analysis of DIC variations on shorter time scales. In addition, they
107
have also not provided for reliable estimates in the surface ocean. This study aims to
108
rectify those shortcomings and improve our understanding of ocean DIC.
109
The main challenges to quantifying the temporal variability of the carbon sink and
110
the carbon inventory include: the seasonal variability that is particularly strong in the
111
surface waters, observation sparsity, and a strong bias towards summer sampling (McKinley
112
et al., 2017; Fassbender et al., 2018). There is potential for use of improved time series
113
approaches to optimally extract information from existing DIC observational databases
114
to improve our quantification of DIC dynamics over multiple time scales. Gruber (2002)
115
has, for example, used time series methods that incorporate deseasonalization via har-
116
monic analysis, fitting of linear trends, and analysing correlations between variables. In
117
this study, we build upon and generalize the eMLR method into a time series framework
118
using dynamic linear regression. This allows us to analyze seasonally biased observations
119
of DIC and isolate its sources of variability, while improving temporal resolution of the
120
analysis and producing objective observation-driven error estimates. Our method pro-
121
vides estimation of anthropogenic changes in DIC, making it capable of assessing non-
122
linear trends in anthropogenic CO2uptake, as well as its seasonal and inter-annual vari-
123
ability. Here, we use this approach to study DIC in the northwest Atlantic within three
124
depth-layers ranging from the surface to the deep waters, and estimate on a monthly time
125
scale the seasonal, natural variability and anthropogenic DIC. We also investigate the
126
Accepted Article
influence of seasonally biased observations on the analysis and results. This provides a
127
North Atlantic perspective to complement the work performed with seasonally biased
128
data in the Southern Ocean (Fassbender et al., 2018; Mackay et al., 2022).
129
2 Data
130
The principal dataset used in this study is the GLODAPv2.2019 data product (Olsen
131
et al., 2016, 2019), hereafter referred to as GLODAP. It is a global collection of quality
132
controlledin situmeasurements from scientific cruises conducted from 1972 to 2017. It
133
contains 12 core biogeochemical variables, including our target variable of Dissolved In-
134
organic Carbon (DIC), which were measured from bottle samples and include their time
135
and location coordinates (i.e. date, latitude, longitude and depth). Physical variables
136
also reported include potential temperature and practical salinity; we make use of these
137
two variables in our analysis because they have a strong mechanistic and statistical con-
138
nection with DIC, are widely available, and are commonly used in eMLR analyses (Friis
139
et al., 2005). The remaining biogeochemical variables have less frequentin situmeasure-
140
ments than temperature and salinity, and were chosen not to be include in this analy-
141
sis. Hence, we considered only DIC and covariates of temperature and salinity in our anal-
142
ysis, although we note that in the future measurements of biogeochemical variables, such
143
as oxygen from profiling floats, could be used with our approach to improve results.
144
Our region of interest is the northwest Atlantic, which we define as the region south
145
of 65◦ N within the Atlantic Arctic (ARCT) Longhurst biogeochemical province (Longhurst,
146
2007) (Figure 1a). It includes areas of deep convection that are associated with intense
147
anthropogenic carbon storage (Raimondi et al., 2021). These areas of deep convection
148
have been shown to shift their location and intensity in space and time (R¨uhs et al., 2021).
149
The relatively broad study area was chosen to ensure we captured all key regions of deep
150
convection, as well as the more pragmatic concern of having a large enough data set for
151
our analysis.
152
Accepted Article
We separated the data into three depth-layers: layer 1: 0-200m, layer 2: 200m-1000m,
153
and layer 3: 1000-4500m. These correspond to the euphotic zone (sunlit), the mesopelagic
154
zone (twilight) and the bathypelagic zone (midnight), respectively (Berger & Shor, 2009).
155
Biogeochemical ocean regions transition from surface to intermediate and deep waters
156
at different depths due to the mixed layer depth which varies seasonally, interannually
157
and spatially, and can be challenging to define quantitatively (R¨uhs et al., 2021). For
158
this analysis the depths of layer boundaries were chosen to be constant through the year,
159
and correspond to how the mean and standard deviation of DIC changed with depth (Fig-
160
ure 1b). The near surface, layer 1, has the largest spread of DIC due to direct gas ex-
161
change with the atmosphere and the seasonal biological activity that occurs in the eu-
162
photic zone (mean 2124µmol kg−1 and standard deviation 31µmol kg−1). The sepa-
163
ration between layer 1 and layer 2 at 200m is similar to that used by others (Brewer et
164
al., 1995; Ullman et al., 2009; Turk et al., 2017), as is the separation between layer 2 and
165
layer 3 around 1000m (Emery, 2001; Bostock et al., 2013; Keppler et al., 2020). In layer
166
2 the mean DIC value becomes more constant and with a large reduction in spread (mean
167
2158µmol kg−1and standard deviation 8.6µmol kg−1). The deep layer 3 has an even
168
smaller spread with a near-constant DIC (mean 2159µmol kg−1 and standard deviation
169
5.6 µmol kg−1).
170
The DIC observations from January 1993 to December 2015 have a distinct sea-
171
sonal sampling bias, which is clearly seen in the distribution of observations by month
172
(Figure 2a). Of the individual observations, 37% were recorded in the summer (July to
173
Oct) while only 2% were recorded in the winter (Jan to Apr). The seasons are defined
174
as the four months with the warmest and coldest temperatures in the ocean surface layer.
175
Figure 2b shows DIC observations over time in the northwest Atlantic and highlights the
176
difference between winter and summer, with summer having lower DIC values due to higher
177
sea surface temperatures causing some outgassing but mainly due to phytoplankton pho-
178
tosynthesis using CO2(Montes-Hugo et al., 2010) in the surface waters. Fitting a lin-
179
ear temporal trend using least squares regression to the winter observations shows it is
180
Accepted Article
Figure 1. Spatial distributions of GLODAPv2.2019 data. (a) Geographical location of obser- vations (semi-transparent gray dots). Repeated observations at the same location or at multiple depths or times are plotted on top of each other (appearing as solid black dots). The Atlantic Arctic Longhurst Province is indicated (gray polygon). (b) Observations of DIC vs depth are given. Horizontal dashed lines identify the 200 and 1000m depths that separate our data into three depth-layers.increasing at 0.77±0.07µmol kg−1 year−1, a faster rate than the 0.23±0.41µmol kg−1 year−1
181
for the summer observations. Figure 2b motivated the question of whether winter and
182
summer really have different DIC trends, or if this an artifact of sampling bias. This will
183
be explored in Section 4.1. We pre-processed the data by separating it into the three depth-
184
layers, after which each subset was turned into monthly averaged time series (which still
185
have observation gaps). The monthly data was better balanced in terms of its seasonal
186
distribution, i.e. of the months occurring in winter between 1993-2015, 8% had at least
187
1 observation and likewise 12% in summer had at least one observation. By averaging
188
all observations in a month, this implicitly assumed that the observations in that month
189
had homogeneous oceanographic properties, which is a viable assumption for our data
190
contained within the Atlantic Arctic Longhurst Province.
191
Accepted Article
GLODAP has many data gaps through time (Figure 2a). Our analysis has DIC as
192
its target variable but, like the eMLR method, makes use of temperature and salinity
193
as covariates to better estimate it. To improve the spatial and temporal coverage of tem-
194
perature and salinity, we used the GLORYS12v1 reanalysis product to provide a com-
195
plete time series for these physical oceanographic variables (GLORYS12V1 - Global Ocean
196
Physical Reanalysis Product, 2018). GLORYS12v1 provides potential temperature and
197
practical salinity values over a regular 1/12◦ spatial grid, at 50 vertical depth-levels and
198
on daily time intervals from 1993-2018. As with the GLODAP data, this daily reanal-
199
ysis product was separated into our three depth-layers and converted into times series
200
of monthly and spatially averaged values within our region of interest in the northwest
201
Atlantic. Figure S2 in Supplemental Information shows that deseasonalized values for
202
temperature and salinity from both GLODAP and GLORYS12v1 agree, with GLODAP
203
having a slightly larger scatter, but are centered around the GLORYS12v1 values.
204
Figure 2. (a) Proportion by month of GLODAPv2.2019 DIC observations for depths 0-200m.
Numbers at the top of each bar gives the proportion of observations occurring in that month.
(b) Dissolved inorganic carbon (DIC) over time in the Northwest Atlantic domain for depths 0-200m. Dots are colored by the season they were recorded: winter i.e. Jan to Apr (blue dots), and summer i.e. July to Oct (red dots). Note that spring and fall are not shown. To each of the different subsets, linear temporal trends were fitted using ordinary least squares, and their slopes are reported in units ofµmol kg−1 year−1.
Accepted Article
3 Methods
205
The eMLR method (Friis et al., 2005) has been widely used to estimate the anthro-
206
pogenic increase of DIC between two time points, typically a decade or more apart. For
207
a repeat-sampled ocean transect and assuming the relationship between the natural vari-
208
ability of DIC and oceanographic variables remains constant over time, any changes in
209
the regression coefficients of the relationship can be interpreted as a non-natural change,
210
i.e. excess DIC due to anthropogenic sources (Thacker, 2012). When this is evaluated
211
for the later time point, it yields an estimate of the anthropogenic increase of DIC over
212
the time interval of study.
213
Here we develop and apply a time series generalization of the eMLR method. We
214
build upon the eMLR methodology by using a time-invariant linear regression relation-
215
ship of DIC against temperature and salinity to represent the natural variability of DIC,
216
and we also include a time-varying term to represent the excess DIC. As with the eMLR
217
method, we seek to quantify the change in DIC over time, but ours is a time series method
218
that better resolves the changes through time. While the eMLR looks at differences over
219
a fixed time interval, our time series approach constrains variability over a range of time
220
scales, and allows us to look at inter-annual to decadal variability of anthropogenic DIC.
221
It can also account for common observational issues such as varying sampling intervals
222
and data gaps. Importantly, it can produce statistically objective error bars that change
223
through time and reflect the data properties and distribution. Our approach, as detailed
224
below, is based on a decomposition of the DIC observations into three components: a
225
seasonal cycle, natural variability, and excess carbon due to anthropogenic sources. It
226
produces monthly time series of these carbon components (with confidence intervals).
227
Accepted Article
3.1 Decomposition of DIC
228
We designate DIC observations asCt. The first step was to deseasonalize them, such that.
Ct0=Ct−Cts (1)
whereCt0 are the deseasonalized DIC anomalies andCtsrepresents the mean seasonal cy- cle over the period of interest. This seasonal cycle,Cts was estimated empirically by tak- ing the mean of each month (Jan, Feb, etc.). The equivalent deseasonalization procedure was also performed for our other variables of temperatureTt and salinitySt. The DIC anomalies were then further decomposed as follows:
Ct0=Cte+Ctn+vt (2)
whereCte represents the excess carbon (e.g., assumed due to anthropogenic sources) and
229
Ctnrepresents the natural variability (e.g., from variations in oceanic water properties).
230
Thevtis called the observation error, and contains the remaining variability not described
231
by the excess and natural variability components, i.e. within-month variations, and mea-
232
surement error. Separating the natural and anthropogenic changes of DIC in (2) is the
233
same concept proposed in Clement and Gruber (2018), but the methods used to estimate
234
the natural and anthropogenic components target different parts of the carbon system,
235
i.e. circulation and mixing, biological pump, solubility pump, and air-sea flux, and will
236
be discussed further in Section 5.
237
In order to separate and estimate the excess carbon and natural variability com- ponents, we build upon foundational assumptions of the eMLR method and re-cast (2) as the regression equation
Ct0 =β0,t+β1Tt0+β2St0+vt. (3)
The excess DIC,Cte, is now represented by the time varying intercept termβ0,t. The nat-
238
ural variability,Ctn, is equated with the termsβ1Tt0+β2St0, representing a linear rela-
239
Accepted Article
tionship between anomalies of DIC and anomalies of temperature,Tt0, and salinity,S0t.
240
Note, the linear regression could be expanded to include biogeochemical ocean variables
241
such as oxygen, etc. We assumed observation errorvt∼N(0, σv2). For this time-dependent
242
regression, we estimated the four parameters: β0,t,β1,β2, andσv using the monthly av-
243
eraged and deseasonalized GLODAP DIC data.
244
To illustrate how our method relates to the equations used in the eMLR method, consider the eMLR prediction equation for anthropogenic carbon (CeM LR) that has been expanded to show its two regressions:
CeM LR= (at2−at1) + (bt2−bt1)Tt2+ (ct2−ct1)St2
= (at2+bt2Tt2+ct2St2)−(at1+bt1Tt2+ct1St2).
The regression parametersat1,bt1 andct1 would be estimated from DIC, temperature
245
and salinity data at timet1, and similarly for parametersat2,bt2 andct2 estimated at
246
timet2. Both regressions are predicted atTt2 andSt2, temperature and salinity at time
247
t2. When the eMLR is expanded to show its two regressions, the second bracket (at1+
248
bt1Tt2+ct1St2) is equivalent to our natural component but evaluated at each timet in
249
our monthly time series. The anthropogenic carbon (CeM LR) is equivalent to our time
250
varying intercept term (β0,t).
251
3.2 Dynamic Linear Regression
252
To estimate excess carbon,Cte viaβ0,t, we used a state space modelling framework that corresponds to dynamic linear regression (Zivot & Wang, 2003; Laine, 2020). This uses two equations: an observation equation, as described by (3) above, and a predic- tion equation (4) as described below. The prediction equation predicts the time evolu- tion of excess carbon, as embodied in the regression interceptβ0,t. It takes the form
β0,t=β0,t−1+φ(β0,t−1−β0,t−2) +wt. (4)
Accepted Article
This is a correlated random walk process. It predicts the excess carbon at timet(β0,t)
253
based on its previous two values at timest−1 andt−2. Specifically, the update is based
254
onβ0,t−1 plus a proportion of the change occurring between the previous two time steps
255
(β0,t−1−β0,t−2). Note that the unit time increments used here refer to the time inter-
256
val of the analysis, in this instance monthly. The parameterφis a measure of the strength
257
of the excess carbon time tendency; we term this theproportion of trend persistence. Note
258
that we used a random walk instead of an auto-regressive process for the prediction equa-
259
tion (4) because such a stationary autoregressive processes have a mean reversion prop-
260
erty (i.e. when predicting forward in time, they are drawn back to the mean), which is
261
inconsistent with the increasing anthropogenic DIC over time that we wish to quantify.
262
The purpose of the correlated random walk is to introduce temporal memory in excess
263
carbon, with the DIC observations guiding its time evolution through the state space model
264
(3)-(4). The prediction error for the new intercept is given bywt ∼N(0, σw2), and its
265
varianceσ2w will be estimated as part of the implementation (see section 3.3).
266
The Kalman filter/smoother algorithm (see, e.g. (Anderson & Moore, 1979)) was
267
used to solve this dynamic linear regression (3)-(4). This fixed-interval smoother pro-
268
vides for estimation of the system state, which in this instance is excess carbon,β0,t. Specif-
269
ically, it provides for the monthly estimates of the mean values of excess carbon and its
270
time-varying variance. The variance can be used to produce errors bars, i.e. confidence
271
intervals for excess carbon. The algorithm uses sequential, or recursive, estimation. This
272
relies on the Kalman filter, which operates as a forward in time recursion such that at
273
each time stept, an estimate of the new excess carbon value is available via a one-step
274
ahead prediction using (4). This prediction is then updated to be closer to the DIC ob-
275
servation using a smoother step that further refines these estimates through a backwards
276
in time recursion. The ratio of the variances for the observation error (σv) to the pre-
277
diction error (σw) is a key quantity that dictates how closely the results follow the ob-
278
servations. To facilitate their specification, the Kalman filter also allows for parameter
279
estimation through likelihood-based approaches. For this study, we estimated the pa-
280
Accepted Article
rametersφandσ2w using maximum likelihood, whileσv was estimated from analysis of
281
DIC observations . For details of the Kalman filter/smoother and parameter estimation,
282
the reader is referred to the Supplementary Information.
283
3.3 Implementation
284
To implement our proposed methodology, the following steps were taken to per-
285
form the analysis on each depth-layer. Following the pre-processing of the DIC,T and
286
S data into monthly average time series, we: (i) determine the seasonal cycle, and de-
287
seasonalize DIC,T andS; (ii) find the regression coefficients relating DIC toT andS,
288
i.e. β0,t=1,β1 andβ2; (iii) specify initial conditions for mean and variance of excess car-
289
bon; (iv) specify or estimate the state space model parameters: σv,σw andφ; (v) carry
290
out state estimation of excess carbon component with Kalman Smoother; (vi) reconstruct
291
the total carbon time series by combining its components (excess, natural, seasonal). These
292
steps are outlined in detail below.
293
(i) Seasonal component and deseasonalize DIC, T and S
294
After pre-processing the data into a monthly average time series, the seasonal cy-
295
cle of DIC, ˆCts, was estimated from the GLODAP monthly observations as the long-term
296
mean for each month (e.g. the mean of all Januaries across all years of data). The sea-
297
sonal cycle of DIC was removed to produce observations of deseasonalized DIC anoma-
298
lies,Ct0. The temperature and salinity observations were also deseasonalized to yieldTt0
299
andSt0.
300
(ii) Estimate regression coefficients that relateCt0 to Tt0 and St0: β0,t=1, β1 and
301
β2
302
The regression coefficients ˆβ0,t=1, ˆβ1, and ˆβ2 from (3) were estimated with ordi-
303
nary least-squares regression using deseasonalized GLODAP data from the first five years
304
(1993-1997). When estimating the regression parameters for the covariates, it was im-
305
portant to check the overall regression significance (F-test), and the significance for each
306
Accepted Article
parameter (t-test). If the regression was not significant, then the natural component be-
307
come a constant value set to 0 and we set ˆβ1 = ˆβ2 = 0, as is seen in Table 1. In the
308
case that one parameter was not significant then that parameter was set to 0 (e.g. ˆβ2=
309
0), effectively simplifying the regression to have only one covariate.
310
The five year time span was chosen to provide enough data to estimate a statis-
311
tically significant regression, but small enough that we could assume the anthropogenic
312
changes of carbon within the time span were negligible. The relationship between car-
313
bon and its covariates change over time, such that if regressions on the first five years
314
and the last five years were performed, their estimated parameters would be different,
315
which is the foundation of the eMLR method (Friis et al., 2005). We took the eMLR as-
316
sumption thatthe natural relationship of carbon with covariates does not change over
317
time literally by using data at only the beginning of the analysis period in order to es-
318
tablish the baseline natural oceanographic relationship between DIC and temperature
319
and salinity, that would not take into account anthropogenic changes over time.
320
(iii) Initial conditions for mean and variance of excess carbon
321
To run the sequential Kalman smoother algorithm, we require an initial condition
322
for the mean and variance of the state. The initial condition for its mean, ˆβ0,t=1, was
323
estimated in the previous step with the other regression coefficients, using ordinary least-
324
squares regression with the deseasonalized GLODAP data from the first two years (1993-
325
1997). For the initial condition of its variance ˆσ2e,t=1 a reasonable value is selected by
326
iterating through steps (iii) - (v) a couple of times for an initial variance that did not
327
artificially inflate or deflate the confidence intervals of excess carbon.
328
(iv) Estimate state space model parameters: σv,σw and φ
329
The observation error standard deviation,σv, was derived from deviations of the
330
original DIC observations,C, about the estimated seasonal cycle. Specifically, standard
331
errors for each of the 12 months were calculated from the original GLODAP observa-
332
Accepted Article
tions for the duration of the analysis period. The median of these values was used as ˆσv
333
and are reported in Table 1. (Note that this standard error is for monthly mean values
334
and hence corresponds to what is here termed the observation standard deviation of DIC
335
anomaliesCt0).
336
Parameter estimates for the prediction standard deviation,σw, and the proportion
337
of trend persistence,φ, were determined using the method of maximum likelihood. The
338
likelihood is determined by using the one-step ahead predictions of the Kalman filter,
339
and its detailed computation is outlined in the Supplemental Information. The maxi-
340
mum likelihood parameter values forφandσw are given in Table 1.
341
(v) State estimation of excess carbon with Kalman Smoother
342
We execute the dynamic linear regression (3) - (4) with the Kalman smoother al-
343
gorithm, producing a monthly estimate of the excess carbon component ˆβ0,t and its vari-
344
ance ˆσe,t2 , which was used to construct 95% confidence intervals. Details of the Kalman
345
smoother are provided in the Supplemental Information.
346
(vi) Reconstructing total carbon
347
The total carbon time series was reconstructed by combining estimates for its sea-
348
sonal, natural and excess components. The natural variability component within (3) is
349
estimated using the regression parameters ˆβ1and ˆβ2 along with deseasonalized temper-
350
ature and salinity from a prediction dataset ( ˆCtn= ˆβ1Tp,t0 + ˆβ2Sp,t0 ). The GLORYS12v1
351
reanalysis product was used as the prediction dataset. The errors of the natural com-
352
ponent were calculated using the 95% prediction intervals from the linear regression, which
353
is equivalent to the amount of variability that remains to be described by the excess car-
354
bon component.
355
As a final remark on error bars, by adding together the carbon components (sea-
356
sonal, natural variability and excess carbon) we get a complete monthly time series es-
357
timate of total carbon ( ˆCt = ˆCts+ ˆCtn+ ˆCte). As the components are additive, so are
358
Accepted Article
the variances for natural variability and excess carbon, which produces a confidence in-
359
terval for carbon anomalies. This uncertainty was also used for total carbon because we
360
considered the deseasonalization done in (1) as a linear transformation prior to our main
361
analysis. For the seasonal component, confidence intervals (1.96×the standard error
362
of the seasonal cycle of DIC) show the within month variability that was considered for
363
the estimation of the observation standard deviation (ˆσv). Table 1 gives estimates for
364
all the parameters needed to initialize and execute the dynamic linear regression: the
365
estimated initial regression coefficients ( ˆβ0,t=1, ˆβ1, and ˆβ2), initial standard error (ˆσe,t=1),
366
observation standard deviation (ˆσv), prediction standard deviation (ˆσw) and the propor-
367
tion of trend persistence ( ˆφ).
368
Table 1. Dynamic linear regression parameters estimated for each depth-layer
βˆ0,t=1 βˆ1 βˆ2 σˆe,t=1 σˆv σˆw φˆ Layer 1: 0-200m -5.40 -6.97 47.72 2.0 10.1 0.10 0.64
Layer 2: 200-1000m -4.17 0 0 1.7 2.2 0.95 0
Layer 3: 1000-4500m -2.73 7.40 0 1.5 6.5 0.10 0.46
4 Results
369
4.1 Carbon Components: Seasonal, Natural & Excess
370
The seasonal evolution of DIC ( ˆCts) for our three depth-layers is shown in Figure 3.
371
The surface water in layer 1 shows a prominent annual cycle peaking at 2142µmol kg−1
372
in the spring followed by the lowering of DIC corresponding to the spring bloom of phy-
373
toplankton. In the fall, DIC begins to rise again, aligning with the deepening of the mixed
374
layer, incorporating waters with higher DIC values due to subsurface respiration. Layer
375
1 seasonal DIC showed a secondary peak in the fall at 2133µmol kg−1, however note
376
there is large uncertainty associated with the dip in December. The deeper waters main-
377
tain a roughly constant value throughout the year. The annual average of the seasonal
378
cycle in layer 2 is larger than in layer 1 (by 32µmol kg−1) and for layer 3 it is slightly
379
Accepted Article
larger again (by 12µmol kg−1). Layer 3 shows a subtle seasonal cycle that could pos-
380
sibly be related to changes through the year within the dominant water mass that flows
381
into the region via boundary currents, similar to the seasonal signal discovered for oxy-
382
gen concentrations in the deep Labrador Sea boundary current (Koelling et al., 2022).
383
However, based on the width of the confidence interval, the seasonal cycle in layer 3 may
384
not be statistically significant.
385
Figure 3. Seasonal cycle of DIC, ˆCts, for each depth-layer: (a) layer 1: 0-200m, (b) layer 2:
200-1000m, and (c) layer 3: 1000-4500m. Each plot shows the estimated mean (black line) and 95% confidence intervals (gray area).
The natural variability component of carbon ( ˆCtn) shows variations of DIC that are
386
related to oceanic temperature and salinity variations (Figure 4), and captures the inter-
387
and intra-annual variations of carbon. For the natural component in layer 3, the vari-
388
ations are very small with an amplitude of 1µmol kg−1and a significant trend of 0.07±
389
0.04µmol kg−1 year−1as estimated via generalized least squares regression. Layer 1 has
390
no significant increasing trend, meanwhile its fluctuations have a larger amplitude of 7.9µmol kg−1.
391
Upon further inspection of the layer 1 natural component using spectral analysis, its sta-
392
tistical character changed at∼ 2006 from high frequency (i.e. periods shorter than a
393
year) to low frequency (i.e. periods longer than a year) (Figure 4a). The natural com-
394
ponent in layer 2 is shown as a constant line because a statistically significant regression
395
relationship could not be estimated for DIC anomalies against temperature and salin-
396
Accepted Article
ity covariates (Table 1 shows ˆβ1= ˆβ2= 0). A non-significant regression relationship oc-
397
curred due to the small amount of data used for fitting, and also the natural variabil-
398
ity being small. The natural regression relationship for layer 1 used both temperature
399
and salinity, while that for layer 3 used only temperature. The 95% prediction intervals
400
show the amount of variability that remains to be described by the excess carbon com-
401
ponent.
402
Figure 4. The natural variability component of carbon, ˆCtn, for each depth-layer: (a) layer 1: 0-200m, (b) layer 2: 200-1000m, and (c) layer 3: 1000-4500m. Each plot shows the estimated mean (black line) and 95% prediction intervals (gray area).
The excess components of carbon ( ˆCte) represent the increase of anthropogenic DIC
403
over time, with the largest annual increase occurring in the surface waters. The anthro-
404
pogenic increases are presented as a linear trend for convenience, and fit using general-
405
ized least squares. Considering their confidence intervals overlap, all three layers have
406
approximately the same rate of increase of∼0.57µmol kg−1 year−1(Figure 5). Specif-
407
ically, the trend in layer 1 is 0.63±0.59µmol kg−1year−1, layer 2 is 0.57±0.11µmol kg−1 year−1,
408
and layer 3 is 0.47±0.11µmol kg−1 year−1. However, the layer 3 trend was only es-
409
timated using 2000-2015 data. Prior to 2000, the excess carbon component had a slight
410
declining trend, though it was not statistically significant. The apparent difference in trend
411
before and after 2000 is notable, and is suggestive of low frequency variability in the ven-
412
tilation of the deep ocean.
413
Accepted Article
The time series characteristics of excess carbon in layer 2 (Figure 5b) are influenced
414
by the observations and the input parameters used for the dynamic linear regression (i.e.,
415
σv, σw, andφ). The maximum likelihood procedure estimated the trend persistence to
416
be zero (φ= 0), which suggests that excess carbon in this layer follows a random walk,
417
instead of a correlated random walk (whenφ 6= 0). This leads to a more blocky and
418
erratic (rather than smoothly varying) appearance. Excess carbon in layer 1 and 3 have
419
smoother inter-annual variations due to following a correlated random walk through the
420
observations (φ= 0.64 and 0.46, respectively). In layer 3, the observations are less fre-
421
quent, commonly with 3-5 month gaps, resulting in sections with a subtle staircase ap-
422
pearance with the Kalman smoother predicting the same value over the observation gaps
423
(e.g. years 2000-2005).
424
Figure 5. The excess carbon component, ˆCte, for each depth-layer: (a) layer 1: 0-200m, (b) layer 2: 200-1000m, and (c) layer 3: 1000-4500m. Each plot shows the estimated mean (black line) and 95% confidence intervals (gray area). Monthly averaged excess DIC (i.e. Ct00are DIC anomalies after removing the seasonal and natural components) are also shown (purple dots).
Annual trends are shown for excess DIC (orange dashed line), and their slopes are reported in the legend in units ofµmol kg−1 year−1.
As a naive test for sampling bias, a linear trend was estimated for the reconstructed
425
total carbon over time. Similar trends were obtained regardless of the subset of months
426
used of our carbon estimate, thus supporting that we have accounted for the data’s sam-
427
pling bias. In contrast, trend analysis of individual GLODAP observations (not monthly
428
Accepted Article
averages) produced very different results for different subsets (i.e. only winter or only
429
summer data). The largest influence in accounting for seasonal sampling bias was the
430
pre-processing step of monthly averaging the data.
431
4.2 Comparison to eMLR
432
We now compare the results from our statistical approach with those obtained us-
433
ing the eMLR approach. We ran an eMLR analysis with potential temperature and prac-
434
tical salinity to estimate the increase of anthropogenic carbon in the study region using
435
GLODAP data from the first two years (1993-1994) and the last two years (2015-2016);
436
spatial plots of the data used are shown in Figure S3 of the Supplemental Information.
437
The eMLR anthropogenic increase over 24 years was estimated for layer 1, layer 2 and
438
layer 3 to be 0.52, 0.49, and 0.47 µmol kg−1 year−1, respectively. These eMLR results
439
agreed generally with other eMLR results from Friis et al. (2005), who reported results
440
between 0.4 and 1.6 µmol kg−1year−1 for transects in the same overall region of the
441
North Atlantic.
442
We then compared these eMLR results with our excess carbon results from dynamic
443
linear regression; the rates of increase for layer 1, layer 2 and layer 3 were 0.63±0.59, 0.57±0.11,
444
and 0.47±0.11 µmol kg−1year−1, respectively (from Figure 5). For layer 3, both the
445
eMLR and dynamic linear regression report the same rate of increase of 0.47µmol kg−1 year−1.
446
For layers 1 and 2, our dynamic linear regression results gave faster increases than than
447
the eMLR results, but the eMLR results are within our confidence intervals. Our method
448
produces roughly equivalent rates of overall anthropogenic increase as eMLR, but are
449
resolved on a monthly time scale providing information on climate-related changes in DIC
450
over time.
451
4.3 Column Inventories of Carbon
452
We converted our estimate of excess carbon concentration ( ˆCtein units ofµmol kg−1)
453
into column inventories of anthropogenic carbon (It), i.e. depth-integrated carbon per
454
Accepted Article
unit area (in units of mol m−2). The purpose was to facilitate comparison with other
455
values in the literature, which are often reported as column inventories. The conversion
456
It =ρ H Cˆte used the average density of seawaterρ=f(T, S, P) and the layer thick-
457
ness (H). For layer 3, which has observations from 1000-4500m, its layer thickness was
458
calculated with an an ocean bottom depth of 3500m, which was the median bottom depth
459
associated with GLODAP observations. The conversion toIt produced a time series of
460
mean column inventory in our study region that looks the same as Figure 5. The dif-
461
ference between the maximum and mininimum values gave estimates for the excess car-
462
bon storage per square meter for each layer: 5.55mol m−2 for layer 1 (0-200m), 13.86mol m−2
463
for layer 2 (200-1000m), and 20.63mol m−2for layer 3 (1000-3500m). The time series
464
for the three depth-layers were then added together and the difference between its max-
465
imum and minimum values gave an overall excess carbon storage of 35.41 mol m−2through
466
the whole water column over the period 1993 to 2015, corresponding to an average stor-
467
age rate of 1.54 mol m−2year−1. A conservative storage rate of 1.37±0.57 mol m−2 year−1
468
was estimated as the slope (±95% confidence interval) of a linear generalized least squares
469
regression through the time series for the whole water column (i.e. the combined layers).
470
Note, this water column estimate is strongly influenced by the excess carbon time se-
471
ries in layer 3 due to its very large layer thickness.
472
Our estimated storage rate of excess carbon of 1.37±0.57 mol m−2year−1, when
473
considering the width of its confidence interval, is consistent with the average storage
474
rate of 1 mol m−2 year−1 approximated from Figure 3A in Gruber et al. (2019) for our
475
study region. Our confidence interval also overlaps with the reported estimate and un-
476
certainty of 0.97±0.34 mol m−2year−1 for DIC storage rate in the subpolar northwest
477
Atlantic above 2000m (Tanhua & Keeling, 2012), as well as those for the Irminger Sea,
478
where Fr¨ob et al. (2018) reports the increase of anthropogenic carbon at 1.84 ±0.16 mol m−2 year−1.
479
The time spans of study were similar: 1994-2007 (Gruber et al., 2019), 1980-2010 (Tanhua
480
& Keeling, 2012), 1991-2015 (Fr¨ob et al., 2018) and 1993-2015 for our study. Differences
481
might be accounted by our data pre-processing step of spatially and monthly averaging
482
Accepted Article
data across a large region spanning both the Labrador Sea and Irminger Sea, while Gruber
483
et al. (2019) and Fr¨ob et al. (2018) analysed column inventories on a smaller spatial grid.
484
While spatial sampling bias across the large study region may affect our estimates, it nonethe-
485
less serves to demonstrate the analysis method while highlighting and mitigating data
486
availability complications, such as seasonal sampling bias.
487
5 Discussion and Conclusions
488
Our time series generalization of the eMLR method is based on a dynamic linear
489
regression (a state space model) that incorporates time dependence, accounts for irreg-
490
ular temporal sampling of data, and provides for an assessment of estimation errors based
491
on observation properties. We then analyzed the temporal trends and variability of DIC
492
on a monthly basis and estimated the anthropogenic increase of DIC in a manner that
493
accounts for the strong seasonal sampling bias of the input data. We improved the tem-
494
poral resolution of anthropogenic carbon estimates from being a simple difference be-
495
tween two widely separated time periods with eMLR (Carter et al., 2017; Friis et al., 2005)
496
to providing DIC estimates on a monthly time scale. This allows the opportunity to look
497
at inter-annual variability and elucidate connections between excess carbon uptake and
498
climate forcing or circulation changes. We were also able to present improved estimates
499
of the anthropogenic component of DIC in the surface layer by accounting for the strong
500
seasonal variations, which usually interfere.
501
For the northwest Atlantic we produced monthly time series estimates of DIC, with
502
uncertainties, including estimates for the seasonal cycle, natural variability and excess
503
anthropogenic carbon components. The annual increase of excess carbon due to anthro-
504
pogenic sources since 2000 was estimated at∼0.57µmol kg−1year−1 for all depth-layers.
505
Our estimated storage rate, through the full water column (0-3500m), of anthropogenic
506
carbon was 1.37±0.57 mol m−2year−1, which seems consistent with the 1 mol m−2year−1
507
shown in Figure 3A of Gruber et al. (2019) for our study region.
508
Accepted Article
While we summarize our results with linear trends it brings into question whether
509
a linear trend could be used directly in (3) to model anthropogenic carbon instead of a
510
correlated random walk (i.e. the time-varying intercept termβ0,t). Though a linear trend
511
in the model would provide for similar estimates of the rate of anthropogenic carbon in-
512
crease, the correlated random walk provides a framework that allows the state variable
513
to follow the data through its inter-annual variations while not constraining the data to
514
follow a straight line. As seen in Figure 5c, the trend is not linear for the whole time frame
515
of analysis, with an increase in excess carbon after 2000 but not before. The flexibility
516
of the correlated random walk framework also can show excess carbon variations at sub-
517
seasonal time scales.
518
Our excess carbon component describes most of the anthropogenic change of DIC
519
in the northwest Atlantic, but potentially not all of it. Global ocean temperatures are
520
warming and salinity in the subpolar North Atlantic is freshening (Sathyanarayanan et
521
al., 2021). These anthropogenic changes in temperature and salinity can then influence
522
what is here defined as the natural variability component of carbon. Our natural vari-
523
ability component for layer 3 (Figure 4c) shows an increasing trend starting at the year
524
2000 that may reflect this anthropogenic influence, and could even be associated with
525
the ‘warming hole’ of ocean temperature and changes in deep convection in the North
526
Atlantic (Drijfhout et al., 2012). Hence, a benefit of our method is that it allows for the
527
distinction between the anthropogenic change in ocean properties (i.e. temperature and
528
salinity) and anthropogenic changes in DIC due to increasing CO2in the atmosphere.
529
In contrast, Gruber et al. (2019) subtracted the non-steady state net flux of natural CO2
530
associated with anthropogenic climate change such as ocean warming. Meanwhile, Fr¨ob
531
et al. (2018) estimated the natural DIC component in the Irminger Sea to have had a
532
declining trend until 2015. This distinction of different sources of anthropogenic change
533
may be important in the future because they might even drive opposing changes. For
534
example, in the near-surface, the anthropogenic increase of atmospheric pCO2(partial
535
pressure of CO2) will increase ocean carbon, while warming ocean temperatures due to
536