Trends of Anthropogenic Dissolved Inorganic Carbon in the Northwest Atlantic Ocean Estimated Using a State Space Model


Item Type Article

Authors Boteler, Claire; Dowd, Michael; Oliver, Eric C. J.; Krainski, Elias Teixeira; Wallace, Douglas W. R.

Citation Boteler, C., Dowd, M., Oliver, E. C. J., Krainski, E. T., & Wallace, D. W. R. (2023). Trends of Anthropogenic Dissolved Inorganic Carbon in the Northwest Atlantic Ocean Estimated Using a State Space Model. Journal of Geophysical Research: Oceans. Portico. https://doi.org/10.1029/2022jc019483

Eprint version Post-print

DOI 10.1029/2022jc019483

Publisher American Geophysical Union (AGU)

Journal Journal of Geophysical Research: Oceans

Rights This is an accepted manuscript version of a paper before final publisher editing and formatting. Archived with thanks to American Geophysical Union (AGU). The version of record is available from Journal of Geophysical Research: Oceans.

Download date 21/06/2023 17:58:32

Link to Item http://hdl.handle.net/10754/692675


Claire Boteler¹ (ORCID 0000-0003-1727-4510), Michael Dowd¹ (ORCID 0000-0003-2805-293X), Eric C. J. Oliver² (ORCID 0000-0002-4006-2826), Elias T. Krainski³ (ORCID 0000-0002-7063-2615), Douglas W. R. Wallace² (ORCID 0000-0001-7832-3887)

¹Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada.

²Department of Oceanography, Dalhousie University, Halifax, Nova Scotia, Canada.

³Department of Statistics, King Abdullah University of Science and Technology, Thuwal, Kingdom of Saudi Arabia.

Key Points:

• A time series generalization of the eMLR method is developed to produce monthly estimates with uncertainties of anthropogenic ocean carbon.

• The rate of anthropogenic carbon increase in the northwest Atlantic is roughly the same for all depth layers, at 0.57 µmol kg⁻¹ year⁻¹.

• Our method produces estimates of anthropogenic carbon increase that are comparable to those from eMLR, but with smaller uncertainties.

Corresponding author: Claire Boteler,[email protected]

Accepted Article


Abstract

The northwest Atlantic Ocean is an important sink for carbon dioxide produced by anthropogenic activities. However, the strong seasonal variability in the surface waters, paired with the sparse and summer-biased observations of ocean carbon, makes it difficult to capture a full picture of its temporal variations throughout the water column. We aim to improve the estimation of temporal trends of dissolved inorganic carbon (DIC) due to anthropogenic sources using a new statistical approach: a time series generalization of the extended multiple linear regression (eMLR) method. The anthropogenic increase of northwest Atlantic DIC in the surface waters is hard to quantify due to the strong, natural seasonal variations of DIC. We address this by separating DIC into its seasonal, natural and anthropogenic components. Ocean carbon data is often collected in the summer, creating a summer bias; using monthly averaged data made our results less susceptible to this bias. Variations in waters below 1000 m have usually been analysed on decadal time scales, but our monthly analysis showed that the anthropogenic carbon component changed suddenly in 2000 from stationary to an increasing trend at the same rate as the waters above. All depth layers had similar rates of anthropogenic increase of ∼0.57 µmol kg⁻¹ year⁻¹, and our uncertainty levels are smaller than those of eMLR results. Integration throughout the water column (0-3500 m) gives an anthropogenic carbon storage rate of 1.37 ± 0.57 mol m⁻² year⁻¹, which is consistent with other published estimates.

Plain Language Summary

We need to measure the ocean sink for the CO₂ emitted by industrialized societies, and it is particularly important for the northwest Atlantic Ocean. The rate of carbon increase is often overshadowed by natural and seasonal variability. We introduce new statistical approaches to better estimate the rate at which carbon has accumulated due to human activities. Ocean carbon data is often collected in the summer, creating a summer bias; using monthly averaged data made our results less susceptible to this bias. From 1993 to 2015 in the northwest Atlantic Ocean, anthropogenic carbon increased at ∼0.57 µmol kg⁻¹ year⁻¹ within all depth layers. Integration of results throughout the water column (0-3500 m) gives an anthropogenic carbon storage rate of 1.37 ± 0.57 mol m⁻² year⁻¹.

1 Introduction

A key element for climate change projection and for carbon emission policy is how effective the ocean is, and will be, in mitigating the effects of anthropogenic CO₂ emissions. The northwest Atlantic is one of the largest oceanic carbon sinks on an areal basis, taking in and storing large amounts of anthropogenic CO₂ (Khatiwala et al., 2013). There are several chemical, physical and biological factors that influence how much CO₂ the ocean takes up throughout the year. In the winter, cold winds cool the surface waters, increasing the solubility of CO₂ (Emerson & Hedges, 2008), which enhances dissolved inorganic carbon (DIC) concentrations in the surface layers as a result of air-sea gas exchange. Temperature also influences the physical uptake processes of the area, as cooling reduces stratification and facilitates the transport of DIC-rich waters from the surface to the deeper waters. The shoaling of the mixed layer in the summer traps DIC-rich waters in the subsurface ocean, where they are no longer in contact with the atmosphere. While respiration releases CO₂ into the surface waters, phytoplankton use CO₂ for photosynthesis. Strong, seasonal biological production also promotes seasonal uptake of CO₂ from the atmosphere, which is transported deeper via the biological carbon and mixed-layer pumps (Lacour et al., 2019).

The column inventory of anthropogenic DIC is the depth-integrated amount of ocean carbon present as a result of human activities, reported on a per area basis. In the northwest Atlantic this was estimated at 160 mol m⁻² in 2010, several times greater than the global average (Khatiwala et al., 2013). The global ocean's uptake of atmospheric CO₂ estimated in the last 10 years, as diagnosed from observations, has been increasing at a faster rate than the uptake estimated from ocean models (see Figure 9 in Friedlingstein et al. (2020)). Such inconsistencies motivate our goal of improving the estimation of uptake and storage of anthropogenic carbon for the northwest Atlantic, as determined directly from DIC observations.

Observational data from two main sources are used to study the ocean carbon cycle. The SOCAT dataset provides near-surface ocean pCO₂ measurements made on research cruises and ships of opportunity (Bakker et al., 2016). This dataset has been used to estimate air-sea fluxes on the global scale with methods ranging from neural networks (Landschützer et al., 2014) to atmospheric inversion (Rödenbeck et al., 2013, 2015) and interpolation (Goddijn-Murphy et al., 2015). The GLODAP dataset provides discrete water sample measurements for the ocean interior (Lauvset et al., 2021). This dataset has been used to estimate the build-up of anthropogenic CO₂ in the oceans (Lauvset et al., 2016; Khatiwala et al., 2013; Sabine, 2004) and to diagnose ocean transport of carbon (Holfort et al., 1998; Macdonald et al., 2003). Analyses of temporal changes have tended to use regression approaches (Friis et al., 2005; Bostock et al., 2013; Gruber et al., 2019), but also use mechanistic approaches based on ocean biogeochemical models (Gerber & Joos, 2010; Friedlingstein et al., 2020; Carroll et al., 2022). Our study focuses on the use of interior ocean DIC observations from GLODAP.

An important and long-standing research question is how to separate DIC into its natural and anthropogenic components (Wallace, 2001; Sabine & Tanhua, 2010), where anthropogenic typically refers to the excess DIC present in the ocean as a result of human activities (i.e., due to atmospheric CO₂ increases since preindustrial times). Multiple linear regression (Wallace, 1995), and especially the extended Multiple Linear Regression method (eMLR) (Friis et al., 2005), are now widely used to estimate the increase of anthropogenic DIC. This is usually done between two time points that are a decade or more apart and on repeat hydrographic sections (Thacker, 2012). This approach has been extended recently to include eMLR in a depth-spatial moving window (Carter et al., 2017), eMLR for column inventories (Plancherel et al., 2013), and eMLR(C*) (Clement & Gruber, 2018). eMLR approaches are based on exploiting the changing relationship between observed DIC and other correlated ocean variables that are not directly impacted by anthropogenic CO₂ uptake, such as temperature, salinity and oxygen. The eMLR(C*) approach, for example, incorporates more ocean predictor variables and centres the analysis on a derived variable, called C*, that relates more closely to anthropogenic carbon than DIC does. The eMLR-based methods have proven to be useful for estimating trends over long time intervals (decades) in the sub-surface waters, but few studies have attempted an observation-centric analysis of DIC variations on shorter time scales. In addition, they have not provided reliable estimates in the surface ocean. This study aims to rectify those shortcomings and improve our understanding of ocean DIC.

The main challenges to quantifying the temporal variability of the carbon sink and the carbon inventory include: the seasonal variability that is particularly strong in the surface waters, observation sparsity, and a strong bias towards summer sampling (McKinley et al., 2017; Fassbender et al., 2018). There is potential for use of improved time series approaches to optimally extract information from existing DIC observational databases to improve our quantification of DIC dynamics over multiple time scales. Gruber (2002) has, for example, used time series methods that incorporate deseasonalization via harmonic analysis, fitting of linear trends, and analysing correlations between variables. In this study, we build upon and generalize the eMLR method into a time series framework using dynamic linear regression. This allows us to analyze seasonally biased observations of DIC and isolate its sources of variability, while improving temporal resolution of the analysis and producing objective observation-driven error estimates. Our method provides estimation of anthropogenic changes in DIC, making it capable of assessing non-linear trends in anthropogenic CO₂ uptake, as well as its seasonal and inter-annual variability. Here, we use this approach to study DIC in the northwest Atlantic within three depth-layers ranging from the surface to the deep waters, and estimate on a monthly time scale the seasonal, natural variability and anthropogenic DIC. We also investigate the influence of seasonally biased observations on the analysis and results. This provides a North Atlantic perspective to complement the work performed with seasonally biased data in the Southern Ocean (Fassbender et al., 2018; Mackay et al., 2022).

2 Data

The principal dataset used in this study is the GLODAPv2.2019 data product (Olsen et al., 2016, 2019), hereafter referred to as GLODAP. It is a global collection of quality controlled in situ measurements from scientific cruises conducted from 1972 to 2017. It contains 12 core biogeochemical variables, including our target variable of Dissolved Inorganic Carbon (DIC), which were measured from bottle samples and include their time and location coordinates (i.e. date, latitude, longitude and depth). Physical variables also reported include potential temperature and practical salinity; we make use of these two variables in our analysis because they have a strong mechanistic and statistical connection with DIC, are widely available, and are commonly used in eMLR analyses (Friis et al., 2005). The remaining biogeochemical variables have less frequent in situ measurements than temperature and salinity, so we chose not to include them in this analysis. Hence, we considered only DIC and the covariates temperature and salinity, although we note that in the future measurements of biogeochemical variables, such as oxygen from profiling floats, could be used with our approach to improve results.

Our region of interest is the northwest Atlantic, which we define as the region south of 65°N within the Atlantic Arctic (ARCT) Longhurst biogeochemical province (Longhurst, 2007) (Figure 1a). It includes areas of deep convection that are associated with intense anthropogenic carbon storage (Raimondi et al., 2021). These areas of deep convection have been shown to shift their location and intensity in space and time (Rühs et al., 2021). The relatively broad study area was chosen to ensure we captured all key regions of deep convection, as well as for the more pragmatic concern of having a large enough data set for our analysis.

We separated the data into three depth-layers: layer 1: 0-200 m, layer 2: 200-1000 m, and layer 3: 1000-4500 m. These correspond to the euphotic zone (sunlit), the mesopelagic zone (twilight) and the bathypelagic zone (midnight), respectively (Berger & Shor, 2009). Biogeochemical ocean regions transition from surface to intermediate and deep waters at different depths due to the mixed layer depth, which varies seasonally, interannually and spatially, and can be challenging to define quantitatively (Rühs et al., 2021). For this analysis the depths of layer boundaries were chosen to be constant through the year, and correspond to how the mean and standard deviation of DIC changed with depth (Figure 1b). The near surface, layer 1, has the largest spread of DIC due to direct gas exchange with the atmosphere and the seasonal biological activity that occurs in the euphotic zone (mean 2124 µmol kg⁻¹ and standard deviation 31 µmol kg⁻¹). The separation between layer 1 and layer 2 at 200 m is similar to that used by others (Brewer et al., 1995; Ullman et al., 2009; Turk et al., 2017), as is the separation between layer 2 and layer 3 around 1000 m (Emery, 2001; Bostock et al., 2013; Keppler et al., 2020). In layer 2 the mean DIC value becomes more constant, with a large reduction in spread (mean 2158 µmol kg⁻¹ and standard deviation 8.6 µmol kg⁻¹). The deep layer 3 has an even smaller spread with a near-constant DIC (mean 2159 µmol kg⁻¹ and standard deviation 5.6 µmol kg⁻¹).

Figure 1. Spatial distributions of GLODAPv2.2019 data. (a) Geographical location of observations (semi-transparent gray dots). Repeated observations at the same location or at multiple depths or times are plotted on top of each other (appearing as solid black dots). The Atlantic Arctic Longhurst Province is indicated (gray polygon). (b) Observations of DIC vs depth are given. Horizontal dashed lines identify the 200 and 1000 m depths that separate our data into three depth-layers.

The DIC observations from January 1993 to December 2015 have a distinct seasonal sampling bias, which is clearly seen in the distribution of observations by month (Figure 2a). Of the individual observations, 37% were recorded in the summer (July to Oct) while only 2% were recorded in the winter (Jan to Apr). The seasons are defined as the four months with the warmest and coldest temperatures in the ocean surface layer. Figure 2b shows DIC observations over time in the northwest Atlantic and highlights the difference between winter and summer, with summer having lower DIC values, partly because higher sea surface temperatures cause some outgassing but mainly because phytoplankton photosynthesis uses CO₂ (Montes-Hugo et al., 2010) in the surface waters. Fitting a linear temporal trend using least squares regression to the winter observations shows DIC increasing at 0.77 ± 0.07 µmol kg⁻¹ year⁻¹, a faster rate than the 0.23 ± 0.41 µmol kg⁻¹ year⁻¹ for the summer observations. Figure 2b motivated the question of whether winter and summer really have different DIC trends, or whether this is an artifact of sampling bias. This will be explored in Section 4.1. We pre-processed the data by separating it into the three depth-layers, after which each subset was turned into a monthly averaged time series (which still has observation gaps). The monthly data was better balanced in terms of its seasonal distribution: of the winter months between 1993 and 2015, 8% had at least one observation, and of the summer months 12% had at least one observation. Averaging all observations in a month implicitly assumes that the observations in that month had homogeneous oceanographic properties, which is a viable assumption for our data contained within the Atlantic Arctic Longhurst Province.

GLODAP has many data gaps through time (Figure 2a). Our analysis has DIC as its target variable but, like the eMLR method, makes use of temperature and salinity as covariates to better estimate it. To improve the spatial and temporal coverage of temperature and salinity, we used the GLORYS12v1 reanalysis product to provide a complete time series for these physical oceanographic variables (GLORYS12V1 - Global Ocean Physical Reanalysis Product, 2018). GLORYS12v1 provides potential temperature and practical salinity values over a regular 1/12° spatial grid, at 50 vertical depth-levels and on daily time intervals from 1993 to 2018. As with the GLODAP data, this daily reanalysis product was separated into our three depth-layers and converted into time series of monthly and spatially averaged values within our region of interest in the northwest Atlantic. Figure S2 in Supplemental Information shows that deseasonalized values for temperature and salinity from GLODAP and GLORYS12v1 agree: the GLODAP values have a slightly larger scatter but are centered around the GLORYS12v1 values.

Figure 2. (a) Proportion by month of GLODAPv2.2019 DIC observations for depths 0-200 m. Numbers at the top of each bar give the proportion of observations occurring in that month. (b) Dissolved inorganic carbon (DIC) over time in the Northwest Atlantic domain for depths 0-200 m. Dots are colored by the season in which they were recorded: winter, i.e. Jan to Apr (blue dots), and summer, i.e. July to Oct (red dots). Note that spring and fall are not shown. To each of the different subsets, linear temporal trends were fitted using ordinary least squares, and their slopes are reported in units of µmol kg⁻¹ year⁻¹.

3 Methods

The eMLR method (Friis et al., 2005) has been widely used to estimate the anthropogenic increase of DIC between two time points, typically a decade or more apart. For a repeat-sampled ocean transect, and assuming the relationship between the natural variability of DIC and oceanographic variables remains constant over time, any changes in the regression coefficients of the relationship can be interpreted as a non-natural change, i.e. excess DIC due to anthropogenic sources (Thacker, 2012). When this is evaluated for the later time point, it yields an estimate of the anthropogenic increase of DIC over the time interval of study.

Here we develop and apply a time series generalization of the eMLR method. We build upon the eMLR methodology by using a time-invariant linear regression relationship of DIC against temperature and salinity to represent the natural variability of DIC, and we also include a time-varying term to represent the excess DIC. As with the eMLR method, we seek to quantify the change in DIC over time, but ours is a time series method that better resolves the changes through time. While the eMLR looks at differences over a fixed time interval, our time series approach constrains variability over a range of time scales, and allows us to look at inter-annual to decadal variability of anthropogenic DIC. It can also account for common observational issues such as varying sampling intervals and data gaps. Importantly, it can produce statistically objective error bars that change through time and reflect the data properties and distribution. Our approach, as detailed below, is based on a decomposition of the DIC observations into three components: a seasonal cycle, natural variability, and excess carbon due to anthropogenic sources. It produces monthly time series of these carbon components (with confidence intervals).

3.1 Decomposition of DIC

We designate DIC observations as $C_t$. The first step was to deseasonalize them, such that

$$C_t^0 = C_t - C_t^s \tag{1}$$

where $C_t^0$ are the deseasonalized DIC anomalies and $C_t^s$ represents the mean seasonal cycle over the period of interest. This seasonal cycle, $C_t^s$, was estimated empirically by taking the mean of each month (Jan, Feb, etc.). The equivalent deseasonalization procedure was also performed for our other variables of temperature $T_t$ and salinity $S_t$. The DIC anomalies were then further decomposed as follows:

$$C_t^0 = C_t^e + C_t^n + v_t \tag{2}$$

where $C_t^e$ represents the excess carbon (e.g., assumed due to anthropogenic sources) and $C_t^n$ represents the natural variability (e.g., from variations in oceanic water properties). The term $v_t$ is called the observation error, and contains the remaining variability not described by the excess and natural variability components, i.e. within-month variations and measurement error. Separating the natural and anthropogenic changes of DIC in (2) is the same concept proposed in Clement and Gruber (2018), but the methods used to estimate the natural and anthropogenic components target different parts of the carbon system, i.e. circulation and mixing, biological pump, solubility pump, and air-sea flux, and will be discussed further in Section 5.

In order to separate and estimate the excess carbon and natural variability components, we build upon foundational assumptions of the eMLR method and re-cast (2) as the regression equation

$$C_t^0 = \beta_{0,t} + \beta_1 T_t^0 + \beta_2 S_t^0 + v_t. \tag{3}$$

The excess DIC, $C_t^e$, is now represented by the time-varying intercept term $\beta_{0,t}$. The natural variability, $C_t^n$, is equated with the terms $\beta_1 T_t^0 + \beta_2 S_t^0$, representing a linear relationship between anomalies of DIC and anomalies of temperature, $T_t^0$, and salinity, $S_t^0$. Note that the linear regression could be expanded to include biogeochemical ocean variables such as oxygen. We assumed observation error $v_t \sim N(0, \sigma_v^2)$. For this time-dependent regression, we estimated the four parameters $\beta_{0,t}$, $\beta_1$, $\beta_2$ and $\sigma_v$ using the monthly averaged and deseasonalized GLODAP DIC data.

To illustrate how our method relates to the equations used in the eMLR method, consider the eMLR prediction equation for anthropogenic carbon ($C^{eMLR}$), expanded to show its two regressions:

$$\begin{aligned} C^{eMLR} &= (a_{t_2} - a_{t_1}) + (b_{t_2} - b_{t_1}) T_{t_2} + (c_{t_2} - c_{t_1}) S_{t_2} \\ &= (a_{t_2} + b_{t_2} T_{t_2} + c_{t_2} S_{t_2}) - (a_{t_1} + b_{t_1} T_{t_2} + c_{t_1} S_{t_2}). \end{aligned}$$

The regression parameters $a_{t_1}$, $b_{t_1}$ and $c_{t_1}$ would be estimated from DIC, temperature and salinity data at time $t_1$, and similarly the parameters $a_{t_2}$, $b_{t_2}$ and $c_{t_2}$ would be estimated at time $t_2$. Both regressions are evaluated at $T_{t_2}$ and $S_{t_2}$, the temperature and salinity at time $t_2$. In this expanded form, the second bracket $(a_{t_1} + b_{t_1} T_{t_2} + c_{t_1} S_{t_2})$ is equivalent to our natural component, but evaluated at each time $t$ in our monthly time series. The anthropogenic carbon ($C^{eMLR}$) is equivalent to our time-varying intercept term ($\beta_{0,t}$).
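The two-regression form of the eMLR estimate can be sketched in a few lines of Python. This is an illustration on synthetic, noise-free data, not the processing used in the study; the function names `fit_mlr` and `emlr_anthropogenic` and all numerical values are assumptions made for the example.

```python
import numpy as np

def fit_mlr(dic, temp, sal):
    """Ordinary least squares fit of DIC = a + b*T + c*S; returns (a, b, c)."""
    X = np.column_stack([np.ones_like(temp), temp, sal])
    coef, *_ = np.linalg.lstsq(X, dic, rcond=None)
    return coef

def emlr_anthropogenic(coef_t1, coef_t2, temp_t2, sal_t2):
    """Difference of the two regressions, both evaluated at the time-t2 T and S."""
    a, b, c = coef_t2 - coef_t1
    return a + b * temp_t2 + c * sal_t2

# Synthetic example (hypothetical values): the later survey has the same
# natural T/S relationship plus a uniform 10 umol/kg offset.
rng = np.random.default_rng(0)
temp = rng.normal(5.0, 1.0, 200)
sal = rng.normal(34.9, 0.1, 200)
dic_t1 = 2150.0 - 8.0 * temp + 20.0 * (sal - 35.0)
dic_t2 = dic_t1 + 10.0

coef_t1 = fit_mlr(dic_t1, temp, sal)
coef_t2 = fit_mlr(dic_t2, temp, sal)
c_emlr = emlr_anthropogenic(coef_t1, coef_t2, temp, sal)
```

In this idealized case the coefficient difference recovers the imposed offset at every sample, which is the quantity that the time-varying intercept tracks month by month in the time series approach.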

3.2 Dynamic Linear Regression

To estimate excess carbon, $C_t^e$, via $\beta_{0,t}$, we used a state space modelling framework that corresponds to dynamic linear regression (Zivot & Wang, 2003; Laine, 2020). This uses two equations: an observation equation, as described by (3) above, and a prediction equation (4), described below. The prediction equation describes the time evolution of excess carbon, as embodied in the regression intercept $\beta_{0,t}$. It takes the form

$$\beta_{0,t} = \beta_{0,t-1} + \phi(\beta_{0,t-1} - \beta_{0,t-2}) + w_t. \tag{4}$$

This is a correlated random walk process. It predicts the excess carbon at time $t$ ($\beta_{0,t}$) based on its previous two values at times $t-1$ and $t-2$. Specifically, the update is based on $\beta_{0,t-1}$ plus a proportion of the change occurring between the previous two time steps ($\beta_{0,t-1} - \beta_{0,t-2}$). Note that the unit time increments used here refer to the time interval of the analysis, in this instance monthly. The parameter $\phi$ is a measure of the strength of the excess carbon time tendency; we term this the proportion of trend persistence. Note that we used a random walk instead of an autoregressive process for the prediction equation (4) because stationary autoregressive processes have a mean reversion property (i.e. when predicting forward in time, they are drawn back to the mean), which is inconsistent with the increasing anthropogenic DIC over time that we wish to quantify. The purpose of the correlated random walk is to introduce temporal memory in excess carbon, with the DIC observations guiding its time evolution through the state space model (3)-(4). The prediction error for the new intercept is given by $w_t \sim N(0, \sigma_w^2)$, and its variance $\sigma_w^2$ will be estimated as part of the implementation (see Section 3.3).
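One common way to put equations (3) and (4) into state space form, offered here as a sketch rather than as the formulation given in the Supplementary Information, is to stack the two most recent intercepts into a state vector. The symbols $x_t$, $F$, $H$ and $y_t$ are introduced only for this illustration.

```latex
x_t = \begin{pmatrix} \beta_{0,t} \\ \beta_{0,t-1} \end{pmatrix}, \qquad
x_t = \underbrace{\begin{pmatrix} 1+\phi & -\phi \\ 1 & 0 \end{pmatrix}}_{F} x_{t-1}
      + \begin{pmatrix} w_t \\ 0 \end{pmatrix}, \qquad
y_t = \underbrace{\begin{pmatrix} 1 & 0 \end{pmatrix}}_{H} x_t + v_t,
\qquad y_t = C_t^0 - \beta_1 T_t^0 - \beta_2 S_t^0 .
```

The first row of $F$ reproduces (4), since $(1+\phi)\beta_{0,t-1} - \phi\beta_{0,t-2} = \beta_{0,t-1} + \phi(\beta_{0,t-1} - \beta_{0,t-2})$. Setting $\phi = 0$ recovers an ordinary random walk, while $\phi = 1$ corresponds to a random walk in the month-to-month change.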

The Kalman filter/smoother algorithm (see, e.g., Anderson & Moore, 1979) was used to solve this dynamic linear regression (3)-(4). This fixed-interval smoother provides for estimation of the system state, which in this instance is excess carbon, $\beta_{0,t}$. Specifically, it provides monthly estimates of the mean values of excess carbon and its time-varying variance. The variance can be used to produce error bars, i.e. confidence intervals for excess carbon. The algorithm uses sequential, or recursive, estimation. This relies on the Kalman filter, which operates as a forward-in-time recursion such that at each time step $t$, an estimate of the new excess carbon value is available via a one-step ahead prediction using (4). This prediction is then updated to be closer to the DIC observation, and a subsequent smoother step further refines these estimates through a backward-in-time recursion. The ratio of the variances for the observation error ($\sigma_v^2$) to the prediction error ($\sigma_w^2$) is a key quantity that dictates how closely the results follow the observations. To facilitate their specification, the Kalman filter also allows for parameter estimation through likelihood-based approaches. For this study, we estimated the parameters $\phi$ and $\sigma_w^2$ using maximum likelihood, while $\sigma_v$ was estimated from analysis of DIC observations. For details of the Kalman filter/smoother and parameter estimation, the reader is referred to the Supplementary Information.
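The forward filter and backward smoother described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is taken to be the pair of the two most recent intercepts, the input `y` is assumed to be the DIC anomaly with the fixed temperature and salinity terms already subtracted, months without data are marked as NaN and skipped in the update, and the backward pass is the standard Rauch-Tung-Striebel recursion. The function name `kalman_smooth` and all numerical values are hypothetical.

```python
import numpy as np

def kalman_smooth(y, phi, sigma_v, sigma_w, x0, P0):
    """Filter and smooth y_t = beta_t + v_t, with beta_t following equation (4).

    y      : 1-D array of monthly values, NaN where a month has no observation
    x0, P0 : mean (length 2) and covariance (2x2) of the state one step before y[0]
    Returns the smoothed mean and variance of beta_t, and the log-likelihood.
    """
    n = len(y)
    F = np.array([[1.0 + phi, -phi], [1.0, 0.0]])   # transition for (beta_t, beta_{t-1})
    Q = np.array([[sigma_w**2, 0.0], [0.0, 0.0]])   # prediction error enters beta_t only
    H = np.array([1.0, 0.0])                        # the observation sees beta_t
    R = sigma_v**2
    xp = np.zeros((n, 2))       # one-step-ahead predicted means
    Pp = np.zeros((n, 2, 2))    # one-step-ahead predicted covariances
    xf = np.zeros((n, 2))       # filtered means
    Pf = np.zeros((n, 2, 2))    # filtered covariances
    x = np.asarray(x0, dtype=float)
    P = np.asarray(P0, dtype=float)
    loglik = 0.0
    for t in range(n):
        # Prediction step using equation (4)
        x = F @ x
        P = F @ P @ F.T + Q
        xp[t], Pp[t] = x, P
        if not np.isnan(y[t]):
            # Update step: pull the prediction toward the observation
            innov = y[t] - H @ x
            S = H @ P @ H + R
            K = P @ H / S
            x = x + K * innov
            P = P - np.outer(K, H @ P)
            loglik += -0.5 * (np.log(2.0 * np.pi * S) + innov**2 / S)
        xf[t], Pf[t] = x, P
    # Backward (Rauch-Tung-Striebel) smoothing pass
    xs, Ps = xf.copy(), Pf.copy()
    for t in range(n - 2, -1, -1):
        J = Pf[t] @ F.T @ np.linalg.pinv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
        Ps[t] = Pf[t] + J @ (Ps[t + 1] - Pp[t + 1]) @ J.T
    return xs[:, 0], Ps[:, 0, 0], loglik

# Synthetic check: a steady increase observed with noise and a 20-month gap
rng = np.random.default_rng(1)
n = 120
truth = 0.05 * np.arange(n)
y = truth + rng.normal(0.0, 0.5, n)
y[40:60] = np.nan
mean, var, ll = kalman_smooth(y, phi=0.9, sigma_v=0.5, sigma_w=0.05,
                              x0=[0.0, 0.0], P0=np.eye(2))
```

Approximate 95% intervals follow as `mean ± 1.96 * np.sqrt(var)`, and they widen inside the data gap. Because the function also returns the log-likelihood built from the one-step ahead predictions, it could be evaluated over candidate values of `phi` and `sigma_w` to mimic a maximum likelihood search.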

3.3 Implementation

To implement our proposed methodology, the following steps were taken to perform the analysis on each depth-layer. Following the pre-processing of the DIC, $T$ and $S$ data into monthly average time series, we: (i) determine the seasonal cycle, and deseasonalize DIC, $T$ and $S$; (ii) find the regression coefficients relating DIC to $T$ and $S$, i.e. $\beta_{0,t=1}$, $\beta_1$ and $\beta_2$; (iii) specify initial conditions for the mean and variance of excess carbon; (iv) specify or estimate the state space model parameters $\sigma_v$, $\sigma_w$ and $\phi$; (v) carry out state estimation of the excess carbon component with the Kalman smoother; (vi) reconstruct the total carbon time series by combining its components (excess, natural, seasonal). These steps are outlined in detail below.

(i) Seasonal component and deseasonalize DIC, T and S

After pre-processing the data into a monthly average time series, the seasonal cycle of DIC, $\hat{C}_t^s$, was estimated from the GLODAP monthly observations as the long-term mean for each month (e.g. the mean of all Januaries across all years of data). The seasonal cycle of DIC was removed to produce observations of deseasonalized DIC anomalies, $C_t^0$. The temperature and salinity observations were also deseasonalized to yield $T_t^0$ and $S_t^0$.
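The deseasonalization in step (i) amounts to subtracting a 12-value monthly climatology. The sketch below assumes a monthly series stored as a NumPy array with NaN for months without data, together with an integer month index from 1 to 12; the function name `deseasonalize` and the synthetic values are illustrative only.

```python
import numpy as np

def deseasonalize(values, months):
    """Subtract the long-term mean of each calendar month.

    values : monthly series, NaN where a month has no observation
    months : integer calendar month (1..12) for each entry of values
    Returns (anomalies, cycle), where cycle holds the 12 monthly means.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    cycle = np.full(12, np.nan)
    for m in range(1, 13):
        sel = values[months == m]
        if np.any(~np.isnan(sel)):
            cycle[m - 1] = np.nanmean(sel)   # mean of all Januaries, Februaries, ...
    anomalies = values - cycle[months - 1]
    return anomalies, cycle

# Two synthetic years: a fixed seasonal cycle plus a 2 umol/kg step in year 2,
# with one missing month (April of year 1)
months = np.tile(np.arange(1, 13), 2)
seasonal = 2100.0 + 10.0 * np.cos(2.0 * np.pi * np.arange(12) / 12.0)
values = seasonal[months - 1] + np.repeat([0.0, 2.0], 12)
values[3] = np.nan
anomalies, cycle = deseasonalize(values, months)
```

The same function can be applied unchanged to the temperature and salinity series. Note that a month observed in only one year, like April here, gets an anomaly of zero in that year, which is one way sparse sampling can flatten the anomaly series.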

(ii) Estimate regression coefficients that relate $C_t^0$ to $T_t^0$ and $S_t^0$: $\beta_{0,t=1}$, $\beta_1$ and $\beta_2$

The regression coefficients $\hat{\beta}_{0,t=1}$, $\hat{\beta}_1$ and $\hat{\beta}_2$ from (3) were estimated with ordinary least-squares regression using deseasonalized GLODAP data from the first five years (1993-1997). When estimating the regression parameters for the covariates, it was important to check the overall regression significance (F-test) and the significance of each parameter (t-test). If the regression was not significant, then the natural component became a constant value set to 0 and we set $\hat{\beta}_1 = \hat{\beta}_2 = 0$, as is seen in Table 1. If only one parameter was not significant, then that parameter was set to 0 (e.g. $\hat{\beta}_2 = 0$), effectively simplifying the regression to have only one covariate.
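A rough sketch of the per-parameter significance check is given below. It computes ordinary least-squares coefficients and their t-statistics with NumPy, then zeroes covariate coefficients whose t-statistic falls below a threshold. The threshold `t_crit=2.0` is an assumed stand-in for a two-sided 5% test, the overall F-test is omitted, and the text does not say whether the remaining coefficients were refitted after one was dropped, so no refit is done here. The function names are hypothetical.

```python
import numpy as np

def ols_with_tstats(c_anom, t_anom, s_anom):
    """OLS fit of C0 = b0 + b1*T0 + b2*S0; returns coefficients and t-statistics."""
    X = np.column_stack([np.ones_like(t_anom), t_anom, s_anom])
    coef, *_ = np.linalg.lstsq(X, c_anom, rcond=None)
    resid = c_anom - X @ coef
    dof = len(c_anom) - X.shape[1]
    sigma2 = resid @ resid / dof               # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # coefficient covariance matrix
    return coef, coef / np.sqrt(np.diag(cov))

def zero_insignificant(coef, tstats, t_crit=2.0):
    """Set covariate coefficients with |t| < t_crit to zero; keep the intercept."""
    pruned = np.array(coef, dtype=float)
    weak = np.abs(np.asarray(tstats)[1:]) < t_crit
    pruned[1:] = np.where(weak, 0.0, pruned[1:])
    return pruned

# Synthetic anomalies (hypothetical): DIC depends on temperature but not salinity
rng = np.random.default_rng(0)
n = 60
t_anom = rng.normal(0.0, 1.0, n)
s_anom = rng.normal(0.0, 0.1, n)
c_anom = 1.0 - 8.0 * t_anom + rng.normal(0.0, 1.0, n)
coef, tstats = ols_with_tstats(c_anom, t_anom, s_anom)
pruned = zero_insignificant(coef, tstats)
```

With only five years of gappy monthly data the degrees of freedom are small, so in practice the critical value would be taken from the t distribution for the actual sample size rather than fixed at 2.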

The five-year time span was chosen to provide enough data to estimate a statistically significant regression, while being short enough that we could assume the anthropogenic changes of carbon within the time span were negligible. The relationship between carbon and its covariates changes over time, such that if regressions on the first five years and the last five years were performed, their estimated parameters would be different, which is the foundation of the eMLR method (Friis et al., 2005). We took literally the eMLR assumption that the natural relationship of carbon with covariates does not change over time, by using data from only the beginning of the analysis period to establish the baseline natural oceanographic relationship between DIC and temperature and salinity, one that does not take into account anthropogenic changes over time.

(iii) Initial conditions for mean and variance of excess carbon

To run the sequential Kalman smoother algorithm, we require an initial condition for the mean and variance of the state. The initial condition for its mean, ˆβ_0,t=1, was estimated in the previous step with the other regression coefficients, using ordinary least-squares regression with the deseasonalized GLODAP data from the first five years (1993-1997). For the initial condition of its variance, ˆσ²_e,t=1, a reasonable value was selected by iterating through steps (iii) - (v) a few times until the initial variance did not artificially inflate or deflate the confidence intervals of excess carbon.

(iv) Estimate state space model parameters: σ_v, σ_w and φ

The observation error standard deviation, σ_v, was derived from deviations of the original DIC observations, C, about the estimated seasonal cycle. Specifically, standard errors for each of the 12 months were calculated from the original GLODAP observations for the duration of the analysis period. The median of these values was used as ˆσ_v and is reported in Table 1. (Note that this standard error is for monthly mean values and hence corresponds to what is here termed the observation standard deviation of DIC anomalies C_t⁰.)
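
As an illustration of this step, the sketch below computes one standard error per calendar month and takes their median. It assumes the standard error is the sample standard deviation divided by the square root of the number of observations in that month. The function name and the toy deviations are hypothetical and are not GLODAP values.

```python
import math
import statistics


def observation_sd(deviations_by_month):
    """Median of the monthly standard errors of DIC deviations.

    deviations_by_month maps a month number (1-12) to the deviations of the
    DIC observations about the seasonal cycle for that month.
    """
    standard_errors = []
    for month in sorted(deviations_by_month):
        values = deviations_by_month[month]
        if len(values) < 2:
            continue  # a standard error is undefined for fewer than 2 values
        # Assumed definition: sample standard deviation / sqrt(n).
        standard_errors.append(statistics.stdev(values) / math.sqrt(len(values)))
    return statistics.median(standard_errors)


# Toy deviations for three months only, to keep the example short.
toy = {
    1: [1.0, -1.0, 1.0, -1.0],
    2: [2.0, -2.0, 2.0, -2.0],
    3: [3.0, -3.0, 3.0, -3.0],
}
sigma_v_hat = observation_sd(toy)
```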

Parameter estimates for the prediction standard deviation, σ_w, and the proportion of trend persistence, φ, were determined using the method of maximum likelihood. The likelihood is computed from the one-step-ahead predictions of the Kalman filter, and its detailed computation is outlined in the Supplemental Information. The maximum likelihood parameter values for φ and σ_w are given in Table 1.
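
To make the likelihood computation concrete, the following sketch evaluates a Gaussian log-likelihood from Kalman filter one-step-ahead predictions and maximizes it over a coarse grid. The state equation used here, b[t] = b[t-1] + φ (b[t-1] - b[t-2]) + w[t], is an assumed form of a correlated random walk that reduces to a random walk when φ = 0; it is not copied from equations (3) - (4). The diagonal initial covariance, the grid search and all numbers in the usage lines are also assumptions for illustration, and the actual procedure is the one in the Supplemental Information.

```python
import math


def crw_log_likelihood(y, phi, sigma_w, sigma_v, mean0, var0):
    """Log-likelihood via the prediction error decomposition.

    The state is (b[t], b[t-1]); only b[t] is observed, with noise sigma_v.
    Entries of y equal to None are treated as months without observations.
    """
    a, c = 1.0 + phi, -phi              # first row of the transition matrix
    b, b_prev = mean0, mean0            # state mean
    p11, p12, p22 = var0, 0.0, var0     # state covariance (symmetric 2x2)
    loglik = 0.0
    for t, obs in enumerate(y):
        if t > 0:
            # Prediction step: propagate mean and covariance one month ahead.
            b, b_prev = a * b + c * b_prev, b
            p11, p12, p22 = (
                a * a * p11 + 2 * a * c * p12 + c * c * p22 + sigma_w ** 2,
                a * p11 + c * p12,
                p11,
            )
        if obs is None:
            continue  # no update and no likelihood contribution
        innovation = obs - b
        s = p11 + sigma_v ** 2          # variance of the one-step prediction
        loglik += -0.5 * (math.log(2 * math.pi * s) + innovation ** 2 / s)
        k1, k2 = p11 / s, p12 / s       # Kalman gain
        b, b_prev = b + k1 * innovation, b_prev + k2 * innovation
        p11, p12, p22 = p11 - k1 * p11, p12 - k1 * p12, p22 - k2 * p12
    return loglik


def fit_by_grid(y, sigma_v, mean0, var0, phi_grid, sigma_w_grid):
    """Return (log-likelihood, phi, sigma_w) at the grid maximum."""
    candidates = [
        (crw_log_likelihood(y, phi, sw, sigma_v, mean0, var0), phi, sw)
        for phi in phi_grid
        for sw in sigma_w_grid
    ]
    return max(candidates)


# Toy monthly anomalies with gaps (hypothetical values).
y_toy = [-5.1, None, -4.6, -4.9, None, None, -3.8, -3.5]
phi_grid = [0.0, 0.25, 0.5, 0.75]
sigma_w_grid = [0.1, 0.5, 1.0]
loglik, phi_hat, sigma_w_hat = fit_by_grid(y_toy, 2.0, -5.0, 4.0, phi_grid, sigma_w_grid)
```

A numerical optimizer could replace the grid; the grid is used only to keep the example dependency-free.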

(v) State estimation of excess carbon with Kalman Smoother

We execute the dynamic linear regression (3) - (4) with the Kalman smoother algorithm, producing a monthly estimate of the excess carbon component ˆβ_0,t and its variance ˆσ²_e,t, which was used to construct 95% confidence intervals. Details of the Kalman smoother are provided in the Supplemental Information.
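
For intuition, a minimal sketch of a Kalman filter followed by a fixed-interval (Rauch-Tung-Striebel) smoother is given below for the special case φ = 0, i.e. a plain random walk state. This simplification, the function name and the toy inputs are assumptions; the smoother actually used is the one documented in the Supplemental Information.

```python
import math


def smooth_random_walk(y, sigma_w, sigma_v, mean0, var0):
    """Return (mean, lower, upper) per month with 95% confidence intervals.

    State: b[t] = b[t-1] + w[t]; observation: y[t] = b[t] + v[t].
    Entries of y equal to None are months without observations.
    """
    pred_m, pred_p, filt_m, filt_p = [], [], [], []
    m, p = mean0, var0
    for t, obs in enumerate(y):
        if t > 0:
            p = p + sigma_w ** 2        # random walk: mean carries over
        pred_m.append(m)
        pred_p.append(p)
        if obs is not None:
            k = p / (p + sigma_v ** 2)  # Kalman gain
            m = m + k * (obs - m)
            p = (1.0 - k) * p
        filt_m.append(m)
        filt_p.append(p)

    # Backward pass: refine each filtered estimate using later months.
    sm_m, sm_p = filt_m[:], filt_p[:]
    for t in range(len(y) - 2, -1, -1):
        g = filt_p[t] / pred_p[t + 1]
        sm_m[t] = filt_m[t] + g * (sm_m[t + 1] - pred_m[t + 1])
        sm_p[t] = filt_p[t] + g * g * (sm_p[t + 1] - pred_p[t + 1])

    return [
        (mu, mu - 1.96 * math.sqrt(v), mu + 1.96 * math.sqrt(v))
        for mu, v in zip(sm_m, sm_p)
    ]


# Toy series with a two-month gap (hypothetical values).
smoothed = smooth_random_walk([1.0, None, None, 4.0], 1.0, 1.0, 0.0, 4.0)
means = [row[0] for row in smoothed]
widths = [row[2] - row[1] for row in smoothed]
```

In this toy case the confidence interval widens across the gap and narrows again at the observed months, which is the behaviour the smoother is expected to show for sparsely sampled periods.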

(vi) Reconstructing total carbon

The total carbon time series was reconstructed by combining estimates for its seasonal, natural and excess components. The natural variability component within (3) is estimated using the regression parameters ˆβ₁ and ˆβ₂ along with deseasonalized temperature and salinity from a prediction dataset (ˆC_tⁿ = ˆβ₁ T_p,t⁰ + ˆβ₂ S_p,t⁰). The GLORYS12v1 reanalysis product was used as the prediction dataset. The errors of the natural component were calculated using the 95% prediction intervals from the linear regression, which are equivalent to the amount of variability that remains to be described by the excess carbon component.

As a final remark on error bars, by adding together the carbon components (seasonal, natural variability and excess carbon) we get a complete monthly time series estimate of total carbon (ˆC_t = ˆC_t^s + ˆC_tⁿ + ˆC_t^e). As the components are additive, so are the variances for natural variability and excess carbon, which produces a confidence interval for carbon anomalies. This uncertainty was also used for total carbon because we considered the deseasonalization done in (1) as a linear transformation prior to our main analysis. For the seasonal component, confidence intervals (1.96 × the standard error of the seasonal cycle of DIC) show the within-month variability that was considered for the estimation of the observation standard deviation (ˆσ_v). Table 1 gives estimates for all the parameters needed to initialize and execute the dynamic linear regression: the estimated initial regression coefficients (ˆβ_0,t=1, ˆβ₁, and ˆβ₂), initial standard error (ˆσ_e,t=1), observation standard deviation (ˆσ_v), prediction standard deviation (ˆσ_w) and the proportion of trend persistence (ˆφ).
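
The way the uncertainties combine can be illustrated with a short sketch. It assumes the natural and excess components are treated as independent, so that their variances add, and that a 95% interval uses the factor 1.96. The function name and the numbers are hypothetical.

```python
import math


def total_carbon_interval(seasonal, natural, excess, natural_var, excess_var, z=1.96):
    """Total carbon estimate with a 95% interval from the summed variances.

    seasonal, natural, excess : component estimates for one month
    natural_var, excess_var   : variances of the natural and excess components
    """
    total = seasonal + natural + excess
    # Variances of the two anomaly components add under independence.
    half_width = z * math.sqrt(natural_var + excess_var)
    return total, total - half_width, total + half_width


# Hypothetical monthly values in µmol kg⁻¹ (variances in squared units).
estimate, lower, upper = total_carbon_interval(2140.0, 1.0, 5.0, 9.0, 16.0)
```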

Table 1. Dynamic linear regression parameters estimated for each depth-layer

Layer                 ˆβ_0,t=1   ˆβ₁     ˆβ₂     ˆσ_e,t=1   ˆσ_v   ˆσ_w   ˆφ
Layer 1: 0-200m       -5.40      -6.97   47.72   2.0        10.1   0.10   0.64
Layer 2: 200-1000m    -4.17      0       0       1.7        2.2    0.95   0
Layer 3: 1000-4500m   -2.73      7.40    0       1.5        6.5    0.10   0.46

4 Results

4.1 Carbon Components: Seasonal, Natural & Excess

The seasonal evolution of DIC (ˆC_t^s) for our three depth-layers is shown in Figure 3. The surface water in layer 1 shows a prominent annual cycle peaking at 2142 µmol kg⁻¹ in the spring, followed by a lowering of DIC corresponding to the spring bloom of phytoplankton. In the fall, DIC begins to rise again, aligning with the deepening of the mixed layer, which incorporates waters with higher DIC values due to subsurface respiration. Layer 1 seasonal DIC showed a secondary peak in the fall at 2133 µmol kg⁻¹; however, note that there is large uncertainty associated with the dip in December. The deeper waters maintain a roughly constant value throughout the year. The annual average of the seasonal cycle in layer 2 is larger than in layer 1 (by 32 µmol kg⁻¹) and for layer 3 it is slightly larger again (by 12 µmol kg⁻¹). Layer 3 shows a subtle seasonal cycle that could possibly be related to changes through the year within the dominant water mass that flows into the region via boundary currents, similar to the seasonal signal discovered for oxygen concentrations in the deep Labrador Sea boundary current (Koelling et al., 2022). However, based on the width of the confidence interval, the seasonal cycle in layer 3 may not be statistically significant.

Figure 3. Seasonal cycle of DIC, ˆC_t^s, for each depth-layer: (a) layer 1: 0-200m, (b) layer 2: 200-1000m, and (c) layer 3: 1000-4500m. Each plot shows the estimated mean (black line) and 95% confidence intervals (gray area).

The natural variability component of carbon (ˆC_tⁿ) shows variations of DIC that are related to oceanic temperature and salinity variations (Figure 4), and captures the inter- and intra-annual variations of carbon. For the natural component in layer 3, the variations are very small, with an amplitude of 1 µmol kg⁻¹ and a significant trend of 0.07 ± 0.04 µmol kg⁻¹ year⁻¹ as estimated via generalized least squares regression. Layer 1 has no significant increasing trend, while its fluctuations have a larger amplitude of 7.9 µmol kg⁻¹. Upon further inspection of the layer 1 natural component using spectral analysis, its statistical character changed at ∼2006 from high frequency (i.e. periods shorter than a year) to low frequency (i.e. periods longer than a year) (Figure 4a). The natural component in layer 2 is shown as a constant line because a statistically significant regression relationship could not be estimated for DIC anomalies against temperature and salinity covariates (Table 1 shows ˆβ₁ = ˆβ₂ = 0). The non-significant regression relationship occurred due to the small amount of data used for fitting, and also because the natural variability is small. The natural regression relationship for layer 1 used both temperature and salinity, while that for layer 3 used only temperature. The 95% prediction intervals show the amount of variability that remains to be described by the excess carbon component.

Figure 4. The natural variability component of carbon, ˆC_tⁿ, for each depth-layer: (a) layer 1: 0-200m, (b) layer 2: 200-1000m, and (c) layer 3: 1000-4500m. Each plot shows the estimated mean (black line) and 95% prediction intervals (gray area).

The excess components of carbon (ˆC_t^e) represent the increase of anthropogenic DIC over time, with the largest annual increase occurring in the surface waters. The anthropogenic increases are presented as a linear trend for convenience, fit using generalized least squares. Considering that their confidence intervals overlap, all three layers have approximately the same rate of increase of ∼0.57 µmol kg⁻¹ year⁻¹ (Figure 5). Specifically, the trend in layer 1 is 0.63 ± 0.59 µmol kg⁻¹ year⁻¹, layer 2 is 0.57 ± 0.11 µmol kg⁻¹ year⁻¹, and layer 3 is 0.47 ± 0.11 µmol kg⁻¹ year⁻¹. However, the layer 3 trend was only estimated using 2000-2015 data. Prior to 2000, the excess carbon component had a slight declining trend, though it was not statistically significant. The apparent difference in trend before and after 2000 is notable, and is suggestive of low frequency variability in the ventilation of the deep ocean.

The time series characteristics of excess carbon in layer 2 (Figure 5b) are influenced by the observations and the input parameters used for the dynamic linear regression (i.e., σ_v, σ_w, and φ). The maximum likelihood procedure estimated the trend persistence to be zero (φ = 0), which suggests that excess carbon in this layer follows a random walk, instead of a correlated random walk (when φ ≠ 0). This leads to a more blocky and erratic (rather than smoothly varying) appearance. Excess carbon in layers 1 and 3 has smoother inter-annual variations due to following a correlated random walk through the observations (φ = 0.64 and 0.46, respectively). In layer 3, the observations are less frequent, commonly with 3-5 month gaps, resulting in sections with a subtle staircase appearance, with the Kalman smoother predicting the same value over the observation gaps (e.g. years 2000-2005).

Figure 5. The excess carbon component, ˆC_t^e, for each depth-layer: (a) layer 1: 0-200m, (b) layer 2: 200-1000m, and (c) layer 3: 1000-4500m. Each plot shows the estimated mean (black line) and 95% confidence intervals (gray area). Monthly averaged excess DIC (i.e. C_t⁰⁰, the DIC anomalies after removing the seasonal and natural components) is also shown (purple dots). Annual trends are shown for excess DIC (orange dashed line), and their slopes are reported in the legend in units of µmol kg⁻¹ year⁻¹.

As a naive test for sampling bias, a linear trend was estimated for the reconstructed total carbon over time. Similar trends were obtained regardless of which subset of months of our carbon estimate was used, supporting the claim that we have accounted for the data's sampling bias. In contrast, trend analysis of individual GLODAP observations (not monthly averages) produced very different results for different subsets (i.e. only winter or only summer data). The largest influence in accounting for seasonal sampling bias was the pre-processing step of monthly averaging the data.

4.2 Comparison to eMLR

We now compare the results from our statistical approach with those obtained using the eMLR approach. We ran an eMLR analysis with potential temperature and practical salinity to estimate the increase of anthropogenic carbon in the study region using GLODAP data from the first two years (1993-1994) and the last two years (2015-2016); spatial plots of the data used are shown in Figure S3 of the Supplemental Information. The eMLR anthropogenic increase over 24 years was estimated for layer 1, layer 2 and layer 3 to be 0.52, 0.49, and 0.47 µmol kg⁻¹ year⁻¹, respectively. These eMLR results agreed generally with other eMLR results from Friis et al. (2005), who reported results between 0.4 and 1.6 µmol kg⁻¹ year⁻¹ for transects in the same overall region of the North Atlantic.
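
The general idea of eMLR, differencing the coefficients of two multiple linear regressions fitted to an early and a late period, can be sketched as below. This is an illustration under stated assumptions rather than the analysis run here: the data are synthetic, the helper names are hypothetical, and the choice of covariate values at which the difference is evaluated is left to the caller.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mlr(temps, sals, dic):
    """Least-squares fit of dic = a + b*T + c*S via the normal equations."""
    rows = [(1.0, t, s) for t, s in zip(temps, sals)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * c for r, c in zip(rows, dic)) for i in range(3)]
    return solve_linear(xtx, xty)


def emlr_change(coef_early, coef_late, temp, sal):
    """Anthropogenic change implied by the coefficient differences."""
    da, db, dc = (late - early for early, late in zip(coef_early, coef_late))
    return da + db * temp + dc * sal


# Synthetic data: the late period is the early relationship shifted by +10.
temps = [2.0, 3.0, 4.0, 5.0, 6.0]
sals = [34.5, 34.9, 34.6, 35.0, 34.7]
dic_early = [100.0 + 2.0 * t + 3.0 * s for t, s in zip(temps, sals)]
dic_late = [value + 10.0 for value in dic_early]
change = emlr_change(fit_mlr(temps, sals, dic_early),
                     fit_mlr(temps, sals, dic_late), 4.0, 34.8)
```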

We then compared these eMLR results with our excess carbon results from dynamic linear regression; the rates of increase for layer 1, layer 2 and layer 3 were 0.63 ± 0.59, 0.57 ± 0.11, and 0.47 ± 0.11 µmol kg⁻¹ year⁻¹, respectively (from Figure 5). For layer 3, both the eMLR and dynamic linear regression report the same rate of increase of 0.47 µmol kg⁻¹ year⁻¹. For layers 1 and 2, our dynamic linear regression results gave faster increases than the eMLR results, but the eMLR results are within our confidence intervals. Our method produces roughly the same rates of overall anthropogenic increase as eMLR, but the estimates are resolved on a monthly time scale, providing information on climate-related changes in DIC over time.

4.3 Column Inventories of Carbon

We converted our estimate of excess carbon concentration (ˆC_t^e in units of µmol kg⁻¹) into column inventories of anthropogenic carbon (I_t), i.e. depth-integrated carbon per unit area (in units of mol m⁻²). The purpose was to facilitate comparison with other values in the literature, which are often reported as column inventories. The conversion I_t = ρ H ˆC_t^e used the average density of seawater ρ = f(T, S, P) and the layer thickness (H). For layer 3, which has observations from 1000-4500m, the layer thickness was calculated with an ocean bottom depth of 3500m, which was the median bottom depth associated with GLODAP observations. The conversion to I_t produced a time series of mean column inventory in our study region that looks the same as Figure 5. The difference between the maximum and minimum values gave estimates for the excess carbon storage per square meter for each layer: 5.55 mol m⁻² for layer 1 (0-200m), 13.86 mol m⁻² for layer 2 (200-1000m), and 20.63 mol m⁻² for layer 3 (1000-3500m). The time series for the three depth-layers were then added together, and the difference between the maximum and minimum values of the sum gave an overall excess carbon storage of 35.41 mol m⁻² through the whole water column over the period 1993 to 2015, corresponding to an average storage rate of 1.54 mol m⁻² year⁻¹. A conservative storage rate of 1.37 ± 0.57 mol m⁻² year⁻¹ was estimated as the slope (± 95% confidence interval) of a linear generalized least squares regression through the time series for the whole water column (i.e. the combined layers). Note that this water column estimate is strongly influenced by the excess carbon time series in layer 3 due to its very large layer thickness.
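
The unit conversion can be checked with a short sketch. The density of 1027 kg m⁻³ and the 10 µmol kg⁻¹ concentration are illustrative assumptions, not values from this study, and dividing the 35.41 mol m⁻² total by 23 years is an assumption that happens to reproduce the reported average rate.

```python
def column_inventory(excess_umol_per_kg, density_kg_per_m3, thickness_m):
    """Column inventory I = rho * H * C in mol m⁻².

    The factor 1e-6 converts µmol to mol.
    """
    return excess_umol_per_kg * 1e-6 * density_kg_per_m3 * thickness_m


def average_storage_rate(storage_mol_per_m2, years):
    """Average storage rate in mol m⁻² year⁻¹."""
    return storage_mol_per_m2 / years


# Illustrative: 10 µmol kg⁻¹ of excess carbon in a 200 m thick layer.
inventory = column_inventory(10.0, 1027.0, 200.0)
# Reported whole-column storage divided by an assumed 23-year span.
rate = average_storage_rate(35.41, 23)
```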

Our estimated storage rate of excess carbon of 1.37 ± 0.57 mol m⁻² year⁻¹, when considering the width of its confidence interval, is consistent with the average storage rate of 1 mol m⁻² year⁻¹ approximated from Figure 3A in Gruber et al. (2019) for our study region. Our confidence interval also overlaps with the reported estimate and uncertainty of 0.97 ± 0.34 mol m⁻² year⁻¹ for DIC storage rate in the subpolar northwest Atlantic above 2000m (Tanhua & Keeling, 2012), as well as those for the Irminger Sea, where Fröb et al. (2018) report an increase of anthropogenic carbon of 1.84 ± 0.16 mol m⁻² year⁻¹. The time spans of study were similar: 1994-2007 (Gruber et al., 2019), 1980-2010 (Tanhua & Keeling, 2012), 1991-2015 (Fröb et al., 2018) and 1993-2015 for our study. Differences might be accounted for by our data pre-processing step of spatially and monthly averaging data across a large region spanning both the Labrador Sea and Irminger Sea, while Gruber et al. (2019) and Fröb et al. (2018) analysed column inventories on a smaller spatial grid. While spatial sampling bias across the large study region may affect our estimates, the analysis nonetheless serves to demonstrate the method while highlighting and mitigating data availability complications, such as seasonal sampling bias.
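
The interval comparisons quoted above can be reproduced with a few lines. The helper names are hypothetical. Note that overlapping intervals indicate consistency between estimates; they are not a formal test of equality.

```python
def interval(estimate, half_width):
    """Return (lower, upper) for an estimate given as value ± half_width."""
    return (estimate - half_width, estimate + half_width)


def intervals_overlap(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Storage rates in mol m⁻² year⁻¹ as quoted in the text.
ours = interval(1.37, 0.57)
tanhua_keeling = interval(0.97, 0.34)
frob = interval(1.84, 0.16)

overlaps_tanhua_keeling = intervals_overlap(ours, tanhua_keeling)
overlaps_frob = intervals_overlap(ours, frob)
contains_gruber = ours[0] <= 1.0 <= ours[1]
```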

5 Discussion and Conclusions

Our time series generalization of the eMLR method is based on a dynamic linear regression (a state space model) that incorporates time dependence, accounts for irregular temporal sampling of data, and provides for an assessment of estimation errors based on observation properties. We then analyzed the temporal trends and variability of DIC on a monthly basis and estimated the anthropogenic increase of DIC in a manner that accounts for the strong seasonal sampling bias of the input data. We improved the temporal resolution of anthropogenic carbon estimates from being a simple difference between two widely separated time periods with eMLR (Carter et al., 2017; Friis et al., 2005) to providing DIC estimates on a monthly time scale. This provides the opportunity to look at inter-annual variability and elucidate connections between excess carbon uptake and climate forcing or circulation changes. We were also able to present improved estimates of the anthropogenic component of DIC in the surface layer by accounting for the strong seasonal variations, which usually interfere with such estimates.

For the northwest Atlantic we produced monthly time series estimates of DIC, with uncertainties, including estimates for the seasonal cycle, natural variability and excess anthropogenic carbon components. The annual increase of excess carbon due to anthropogenic sources since 2000 was estimated at ∼0.57 µmol kg⁻¹ year⁻¹ for all depth-layers. Our estimated storage rate of anthropogenic carbon through the full water column (0-3500m) was 1.37 ± 0.57 mol m⁻² year⁻¹, which seems consistent with the 1 mol m⁻² year⁻¹ shown in Figure 3A of Gruber et al. (2019) for our study region.

Because we summarize our results with linear trends, the question arises of whether a linear trend could be used directly in (3) to model anthropogenic carbon instead of a correlated random walk (i.e. the time-varying intercept term β_0,t). Though a linear trend in the model would provide similar estimates of the rate of anthropogenic carbon increase, the correlated random walk provides a framework that allows the state variable to follow the data through its inter-annual variations, without constraining the estimate to follow a straight line. As seen in Figure 5c, the trend is not linear for the whole time frame of analysis, with an increase in excess carbon after 2000 but not before. The flexibility of the correlated random walk framework can also show excess carbon variations at sub-seasonal time scales.

Our excess carbon component describes most of the anthropogenic change of DIC in the northwest Atlantic, but potentially not all of it. The global ocean is warming and the subpolar North Atlantic is freshening (Sathyanarayanan et al., 2021). These anthropogenic changes in temperature and salinity can then influence what is here defined as the natural variability component of carbon. Our natural variability component for layer 3 (Figure 4c) shows an increasing trend starting at the year 2000 that may reflect this anthropogenic influence, and could even be associated with the ‘warming hole’ of ocean temperature and changes in deep convection in the North Atlantic (Drijfhout et al., 2012). Hence, a benefit of our method is that it allows for the distinction between the anthropogenic change in ocean properties (i.e. temperature and salinity) and anthropogenic changes in DIC due to increasing CO₂ in the atmosphere. In contrast, Gruber et al. (2019) subtracted the non-steady state net flux of natural CO₂ associated with anthropogenic climate change such as ocean warming. Meanwhile, Fröb et al. (2018) estimated the natural DIC component in the Irminger Sea to have had a declining trend until 2015. This distinction between different sources of anthropogenic change may be important in the future because they might even drive opposing changes. For example, in the near-surface, the anthropogenic increase of atmospheric pCO₂ (partial pressure of CO₂) will increase ocean carbon, while warming ocean temperatures due to