Article in Electrochimica Acta · June 2015
DOI: 10.1016/j.electacta.2015.03.123


Analysis of Electrochemical Impedance Spectroscopy Data Using the Distribution of Relaxation Times: A Bayesian and Hierarchical Bayesian Approach

Francesco Ciucci a,b,∗, Chi Chen a

a Department of Mechanical and Aerospace Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
b Department of Chemical and Biomolecular Engineering, The Hong Kong University of Science and Technology, Hong Kong, China

Article history: Received 14 January 2015; Received in revised form 16 March 2015; Accepted 17 March 2015; Available online 20 March 2015

Keywords: Electrochemical Impedance Spectroscopy; Distribution of Relaxation Times; Ridge/Tikhonov Regularization; Bayesian Statistics; Lithium-ion Batteries

Abstract

Electrochemical impedance spectroscopy (EIS) is one of the most important experimental techniques employed in electrochemistry because it can be used to deconvolve physico-chemical phenomena occurring at disparate timescales. Unfortunately, the analysis of EIS data is frequently challenging because it can require the selection of ad hoc equivalent circuits. The distribution of relaxation times (DRT) method is complementary to the approach of fitting equivalent circuits because the DRT maps the EIS data onto a function containing the timescale characteristics of the system under study. While conceptually simple, the DRT cannot be obtained by simple least-squares minimization because the corresponding optimization problem is ill posed. Regularization methods, such as ridge/Tikhonov or Lasso regression, add a penalty term to the least-squares minimization problem, enabling the DRT deconvolution. In this work, we show that such regularization methods may be understood in a Bayesian context. For example, ridge/Tikhonov regression implicitly encapsulates the prior insight that the derivatives of the DRT are regular. We use this Bayesian approach as a starting point to extend the DRT regularization by considering frequency-dependent oscillation levels. This approach is shown to be more robust with respect to both discontinuities and oversmoothing than typical regularized DRT methods. Furthermore, the Bayesian approach is versatile and may be extended to include more informative priors.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Electrochemical impedance spectroscopy (EIS) is one of the key techniques used in electrochemistry [1–5] and has been utilized in many areas including fuel cells [6–10], batteries [11–14], sensors [15–17], capacitors [18], dielectrics [19,20], electrochemical coating [21,22], imaging [23,24], and biology [25,26], just to list a few applications. EIS is particularly useful in these fields because it is conducted over a broad range of frequencies, allowing the deconvolution of physico-chemical phenomena characterized by disparate timescales [27].

EIS data are acquired by applying a small voltage (or current) perturbation to an electrochemical system so as to measure the corresponding current (or voltage) [2]. This is repeated at various frequencies to obtain the EIS spectrum. The latter is typically understood as the ratio between the voltage and the current in frequency space [5]. Namely, it is a complex-valued function defined as the Fourier transform of the potential v(t) divided by the Fourier transform of the current i(t):

\[ Z(f) = \frac{V(f)}{I(f)} \tag{1} \]

where \( V(f) = \mathcal{F}[v(t)](f) \) and \( I(f) = \mathcal{F}[i(t)](f) \), and where we take the unitary Fourier transform definition \( \mathcal{F}[h(t)](f) = \int_{-\infty}^{+\infty} h(t) \exp(-2\pi i f t)\, dt \). The experimental impedance is then used to understand the physico-chemical properties of the system under study. For this purpose, having a reliable model is critical because it aids the interpretation of the experimental data. Typical EIS models consist of a collection of elementary circuits, e.g., resistors, capacitors, constant phase elements, and Warburg circuits, placed in series or in parallel. In spite of notable exceptions [27–38], the circuits are often selected ad hoc so as to follow physical intuition and the principle of parsimony [27,39,40]. Furthermore, such equivalent circuits may not be unique, in that several of them may fit the data equally well. This comes to the detriment of the physical insight that one can obtain from the EIS experiments.

∗ Corresponding author. Tel.: +85297232394; fax: +85223581543. E-mail address: [email protected] (F. Ciucci).
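Definition (1) is straightforward to illustrate numerically. The sketch below is our own illustration (not from the paper): it simulates the steady-state voltage of a hypothetical series R-C circuit under a single-frequency current perturbation and recovers Z(f) from the ratio of the discrete Fourier transforms. The unitary normalization prefactor cancels in the ratio, so NumPy's plain FFT suffices; all circuit values are arbitrary.

```python
import numpy as np

# Hypothetical series R-C circuit (values chosen for illustration only).
R, C = 10.0, 1e-3           # resistance [ohm], capacitance [F]
f0 = 5.0                    # excitation frequency [Hz]
fs, T = 1000.0, 2.0         # sampling rate [Hz], record length [s]
t = np.arange(0, T, 1 / fs)

Z_true = R + 1.0 / (1j * 2 * np.pi * f0 * C)
i_t = np.cos(2 * np.pi * f0 * t)                         # current perturbation
v_t = np.real(Z_true * np.exp(1j * 2 * np.pi * f0 * t))  # steady-state voltage

# Ratio of Fourier transforms, Eq. (1); the normalization cancels.
V = np.fft.rfft(v_t)
I = np.fft.rfft(i_t)
k = int(round(f0 * T))      # FFT bin corresponding to f0 (10 periods recorded)
Z_est = V[k] / I[k]
print(Z_est)                # ≈ Z_true = 10 - 31.83j
```

Because the record contains an integer number of excitation periods, the single-bin ratio recovers Z(f0) essentially to machine precision.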

One way to bypass the lack of uniqueness of problem-specific equivalent circuits (and therefore complement the analysis of the EIS) is to use the distribution of relaxation times (DRT) method [10,14,41–54], which models the impedance as Z_DRT(x, f), where x is an unknown and possibly large vector. In turn, x can be mapped onto the relaxation characteristics of the electrochemical system under study. The entries of x are obtained by minimizing the sum of the squared residuals computed between the experimental data Z_exp(f) and Z_DRT(x, f) at the experimental frequencies f_n:

\[ S(\mathbf{x}) = \sum_{n=1}^{N} w_n' \left( Z'_{\rm exp}(f_n) - Z'_{\rm DRT}(\mathbf{x}, f_n) \right)^2 + w_n'' \left( Z''_{\rm exp}(f_n) - Z''_{\rm DRT}(\mathbf{x}, f_n) \right)^2 \tag{2} \]

where Z' and Z'' indicate the real and imaginary parts of the impedance, respectively, and w_n' and w_n'' are suitable weights. If x has size close to N, the minimization of (2) is ill posed, yielding solutions highly dependent on the experimental error. There are a number of ways to circumvent this problem. Researchers have employed Fourier transformation and filtering [41,47], Monte Carlo techniques [43,44], maximum entropy methods [55–57], and advanced evolutionary programming [50–52,58]. One particularly popular method consists in minimizing the following expression [10,14,59–66]:

\[ S(\mathbf{x}) = \sum_{n=1}^{N} w_n' \left( Z'_{\rm exp}(f_n) - Z'_{\rm DRT}(\mathbf{x}, f_n) \right)^2 + w_n'' \left( Z''_{\rm exp}(f_n) - Z''_{\rm DRT}(\mathbf{x}, f_n) \right)^2 + \lambda P(\mathbf{x}) \tag{3} \]

where the last term on the right-hand side is the product of a function P(x), a penalty, and a positive parameter λ. The penalty can be, as in ridge (or Tikhonov) regression, the norm of the second derivatives of the DRT obtained from x.

In this article we aim at answering two open questions regarding DRT analysis:

1. Can the penalty term in the minimization problem (3) be understood using statistics?
2. Can we find a statistically motivated method to extend the ridge DRT so that the level of regularization (as expressed by λ in (3)) can vary across the timescales?

We show that the regularized DRT can be derived from Bayesian statistics arguments [67]. In other words, the term λP(x) in (3) encapsulates the prior physical information available on the DRT. For example, ridge regression provides the (prior) information that the qth-order derivative of the DRT is distributed as a Gaussian random variable with standard deviation 1/√λ. Therefore, the smaller λ is, the larger the oscillations are expected to be. Conversely, a large λ implies that one expects much smaller oscillations. This answers the first question.

The simple Bayesian approach outlined above assumes, however, that the level of regularization is uniform throughout the entire frequency spectrum. Equivalently, the prior implies that the same oscillation rates in the DRT are expected to occur across all timescales. In order to select a local λ and yet retain only a limited number of tunable parameters, a hierarchy of Bayesian priors needs to be used [68]. In this hierarchical approach (we have a prior of the prior itself), the DRT and the optimal penalty level are found simultaneously. This addresses the second question.

More broadly, the Bayesian framework proposed in this article serves as a starting point for extending DRT regression and for improving the interpretation of DRT spectra.

2. Theory

2.1. The DRT Method

As outlined in the introduction, the DRT method assumes that the response of the electrochemical system under study is obtained from a distribution of relaxations (see Appendix A for details). Thus, the impedance can be written as

\[ Z_{\rm DRT}(f) = R_\infty + \int_0^\infty \frac{g(\tau)}{1 + i 2\pi f \tau}\, d\tau \tag{4} \]

where R_∞ and g(τ) are both non-negative and where the DRT subscript is used to emphasize that \( \mathcal{F}^{-1}[Z_{\rm DRT}(f)](t) \) is a sum of decaying exponentials. Since many electrochemical experiments are conducted with a given number of points per decade, (4) can be more conveniently rewritten as

\[ Z_{\rm DRT}(f) = R_\infty + \int_{-\infty}^{+\infty} \frac{\gamma(\ln\tau)}{1 + i 2\pi f \tau}\, d\ln\tau \tag{5} \]

where γ(ln τ) = τ g(τ) ≥ 0. We will use (5) in the remainder of the article.

The main goal of the DRT analysis is to obtain an estimate of γ(ln τ). In order to do that, we first need to approximate γ(ln τ) and Z_DRT(f) using a suitable discretization. Subsequently, we estimate the discrete approximation using regression. The discretization of γ(ln τ) can be obtained by expanding the DRT over a given finite basis B = {φ_1(ln τ), φ_2(ln τ), ..., φ_M(ln τ)} as [69]

\[ \gamma(\ln\tau) = \sum_{m=1}^{M} x_m \phi_m(\ln\tau) + e_{\rm discr}(\ln\tau) \tag{6} \]

where the x_m's are scalars and where e_discr(ln τ) is the discretization error. The latter depends on the basis B chosen and on the particular function γ(ln τ). By plugging (6) into (5), we can write the following vector equation

\[ \mathbf{Z}_{\rm DRT} = R_\infty \mathbf{1} + \mathbf{A}'\mathbf{x} + i\mathbf{A}''\mathbf{x} + \mathbf{e}_{\rm approx} \tag{7} \]

where (Z_DRT)_n = Z_DRT(f_n) with 1 ≤ n ≤ N, 1 is a vector with N entries all equal to 1, x = (x_1, x_2, ..., x_M)^T, A' and A'' are real N × M matrices, and (e_approx)_n is the error made in approximating (at the frequency f_n) the DRT (5) using the first three terms on the right-hand side of (7). We emphasize that M, the dimension of the basis, and N, the total number of experimental points, need not be identical. Further, we will consider that our model is simply Z_DRT = R_∞ 1 + A'x + iA''x. The x of the expansion (6) can then be obtained via regularized regression by solving the following problem with respect to x [53]:

\[ \mathbf{x}^* = \arg\min_{\mathbf{x} \ge 0}\; \left\| R_\infty \mathbf{1} + \mathbf{A}'\mathbf{x} - \mathbf{Z}'_{\rm exp} \right\|_{\Sigma'^{-1}}^2 + \left\| \mathbf{A}''\mathbf{x} - \mathbf{Z}''_{\rm exp} \right\|_{\Sigma''^{-1}}^2 + \lambda P(\mathbf{x}) \tag{8} \]

The first two terms are the sums of squared residuals (the measure of the distance between the data and the model) weighted in accordance with the matrices Σ' and Σ''.^i If we set P(x) = ||L x||², where L is a suitable qth-order differentiation matrix, we obtain the ridge DRT. If instead P(x) = ||x||_1 = Σ_{m=1}^{M} |x_m|, we recover the Lasso DRT [53]. As will be discussed in the following section, the penalty term can be understood as a Bayesian prior on the DRT.

^i In reference to (2), Σ' = diag(1/w_1', 1/w_2', ..., 1/w_N') and Σ'' = diag(1/w_1'', 1/w_2'', ..., 1/w_N'').
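To make problem (8) concrete, here is a minimal sketch of a ridge DRT solver. This is our own illustration, not the authors' code; the assumptions we introduce are a rectangle-rule discretization of (5) on a fixed log-spaced timescale grid, identity weighting matrices, a known R_∞, a second-order differentiation matrix L, and a ZARC-type test spectrum. Since the penalty is quadratic and x is non-negative, the whole problem can be written as one stacked non-negative least-squares system.

```python
import numpy as np
from scipy.optimize import nnls

# Synthetic ZARC spectrum (parameters chosen for illustration); R_inf is
# assumed known so that it can be subtracted from the data.
f = np.logspace(-2, 6, 41)                 # 5 points per decade
R_inf, R_ct, tau0, phi = 10.0, 50.0, 0.01, 0.7
Z = R_inf + R_ct / (1 + (1j * 2 * np.pi * f * tau0) ** phi)

# Rectangle-rule discretization of Eq. (5): x_m ~ gamma(ln tau_m).
tau = np.logspace(-6, 2, 81)
dlntau = np.log(tau[1] / tau[0])
K = 1.0 / (1 + 1j * 2 * np.pi * np.outer(f, tau))
A_re, A_im = K.real * dlntau, K.imag * dlntau

M = len(tau)
L = np.diff(np.eye(M), n=2, axis=0)        # 2nd-order differences, (M-2) x M

# Ridge DRT (8) with identity weights: stack sqrt(lam)*L as penalty rows
# under the real and imaginary blocks and solve subject to x >= 0.
lam = 1e-2
A = np.vstack([A_re, A_im, np.sqrt(lam) * L])
b = np.concatenate([Z.real - R_inf, Z.imag, np.zeros(M - 2)])
x, _ = nnls(A, b)

print(x.sum() * dlntau)                    # total polarization, close to R_ct
```

The recovered vector x is non-negative by construction, and its integral over ln τ approximates the polarization resistance of the ZARC element.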

2.2. A Bayesian Perspective on DRT

We will suppose that a given EIS experiment is a realization of the following stochastic process:

\[ \mathbf{Z}_{\rm exp} = R_\infty \mathbf{1} + \mathbf{A}'\mathbf{x} + i\mathbf{A}''\mathbf{x} + \mathbf{E}' + i\mathbf{E}'' \tag{9} \]

where E' and E'' are independent random variables. The centerpiece of our analysis is Bayes' formula, which, in the context of the DRT, can be written as [67,70]

\[ p(\mathbf{x} \mid \mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp})\, p(\mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp}) = p(\mathbf{x})\, p(\mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp} \mid \mathbf{x}) \tag{10} \]

where p(·) is the probability density function (pdf) of the corresponding random variable in brackets and where the symbol | indicates "conditioned to" or "given". For example, p(Z'_exp, Z''_exp | x) (the likelihood) is the pdf of the experimental outcome given x, or, equivalently, given an approximation of γ(ln τ). The logic of this Bayesian analysis is illustrated in Fig. 1. In contrast to the frequentist approach, x is interpreted as a random variable rather than a fixed true value that needs to be estimated. To that end, we will aim at constructing the pdf of x given Z_exp = Z'_exp + iZ''_exp (or, equivalently, given Z'_exp and Z''_exp) and the prior information available to the experimenter as provided by the pdf p(x). We will assume that E' and E'' are normally distributed, namely E' ∼ N(0, Σ') and E'' ∼ N(0, Σ''). If we further assume that all entries of E' and E'' are independent, then Σ' and Σ'' are diagonal matrices, i.e., Σ' = diag(σ_1'^2, σ_2'^2, ..., σ_N'^2) and Σ'' = diag(σ_1''^2, σ_2''^2, ..., σ_N''^2). Furthermore, we can easily find the likelihood by noting that the residuals on the real and imaginary parts of the DRT model impedance, r_n' and r_n'' respectively, are normally distributed; specifically,

\[ r_n' = \left( R_\infty \mathbf{1} + \mathbf{A}'\mathbf{x} - \mathbf{Z}'_{\rm exp} \right)_n = E_n' \sim \mathcal{N}(0, \sigma_n'^2) \tag{11a} \]

\[ r_n'' = \left( \mathbf{A}''\mathbf{x} - \mathbf{Z}''_{\rm exp} \right)_n = E_n'' \sim \mathcal{N}(0, \sigma_n''^2) \tag{11b} \]

The last two equations give the likelihood, the pdf of the experimental outcome given the model parameters (or, equivalently, given x), which can be written as

\[ p(\mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp} \mid \mathbf{x}) = p(\mathbf{Z}'_{\rm exp} \mid \mathbf{x})\, p(\mathbf{Z}''_{\rm exp} \mid \mathbf{x}) \propto \exp\left( -\frac{1}{2} \sum_{n=1}^{N} \frac{r_n'^2}{\sigma_n'^2} - \frac{1}{2} \sum_{n=1}^{N} \frac{r_n''^2}{\sigma_n''^2} \right) \tag{12} \]

We note that (12) is the typical starting point of maximum likelihood estimation and nonlinear least-squares regression as applied in the analysis of EIS equivalent circuits, see Fig. 1.

This is because by maximizing (12) one recovers the least-squares problem (2), since Σ' = diag(1/w_1', 1/w_2', ..., 1/w_N') and Σ'' = diag(1/w_1'', 1/w_2'', ..., 1/w_N''). The prior p(x) requires some subjective input, which can be based on our physical insight. Therefore, we can postulate that all entries of x are non-negative, see Appendix A, and that adjacent values of x do not vary too much. For example, if we use a piecewise linear approximation, these requirements can be enforced by assuming that x_j ≈ x_{j−1}. This may be written as x_j − x_{j−1} = ε_j for 1 < j ≤ M, where ε_j is some random error. By noting that

\[ \frac{d\gamma}{d\ln\tau}\left( \frac{\ln\tau_j + \ln\tau_{j-1}}{2} \right) \approx \frac{x_j - x_{j-1}}{\Delta\ln\tau_j} = \frac{\varepsilon_j}{\Delta\ln\tau_j} \]

with Δln τ_j = ln τ_j − ln τ_{j−1}, we can also write that the first-order derivative approximation is distributed as a selected random variable. For example, we can assume that ε_j/Δln τ_j is a Gaussian random variable with mean zero and some fixed standard deviation. In general, we can take the approximation L x of the qth-order derivative of γ(ln τ) to be a random variable with a prescribed pdf, e.g., (L x)_j ∼ N(0, 1/λ). This assumption implies that we do not foresee

Fig. 1. Schematic depiction of the ridge and hierarchical DRT regression in relation to Bayesian statistics.

that the absolute value of the qth-order derivative will be greater than 3/√λ (this will occur with < 1% probability). If we further assume that the DRT response cannot be negative (this comes from physical arguments, as shown in Appendix A), then we can assign the following prior to x:

\[ p(\mathbf{x}) \propto \mathbb{1}(\mathbf{x} \ge 0) \exp\left( -\frac{\lambda}{2} \left\| \mathbf{L}\mathbf{x} \right\|^2 \right) \tag{13} \]

where 1(x ≥ 0) is the indicator function (1 if all x_j ≥ 0 and 0 otherwise) [70]. This insight gives the posterior density, i.e., the expected pdf of x given our (prior) physical insight and the experimental data Z_exp = Z'_exp + iZ''_exp. In short, we obtain (see Fig. 1)

\[ p(\mathbf{x} \mid \mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp}) \propto \mathbb{1}(\mathbf{x} \ge 0) \exp\left( -\frac{1}{2} \sum_{n=1}^{N} \frac{r_n'^2}{\sigma_n'^2} - \frac{1}{2} \sum_{n=1}^{N} \frac{r_n''^2}{\sigma_n''^2} - \frac{\lambda}{2} \left\| \mathbf{L}\mathbf{x} \right\|^2 \right) \tag{14} \]

We can conveniently rewrite (14) as

\[ p(\mathbf{x} \mid \mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp}) \propto \mathbb{1}(\mathbf{x} \ge 0) \exp\left( -\frac{1}{2} \mathcal{L}_{\rm ridge}(\mathbf{x}, \lambda) \right) \tag{15} \]

where ℒ_ridge(x, λ) is the posterior negative log-likelihood (we use the subscript "ridge" to emphasize that this is ridge regression), given by

\[ \mathcal{L}_{\rm ridge}(\mathbf{x}, \lambda) = \sum_{n=1}^{N} \frac{r_n'^2}{\sigma_n'^2} + \sum_{n=1}^{N} \frac{r_n''^2}{\sigma_n''^2} + \lambda \left\| \mathbf{L}\mathbf{x} \right\|^2 \tag{16} \]

If we maximize the posterior likelihood p(x | Z'_exp, Z''_exp) with the constraint that x ≥ 0, then we obtain x^MAP, the maximum a posteriori (MAP) estimate of x [71]. Naïvely, x^MAP can be interpreted as the "most likely" x, given the data Z_exp and the prior physical knowledge expressed in p(x). We note that the maximization of the posterior (15) is equivalent to the minimization of the negative log-likelihood (16) multiplied by two. In turn, this minimization corresponds to the typical ridge regression problem (8):

\[ \mathbf{x}^{\rm MAP}_{\rm ridge} = \arg\min_{\mathbf{x} \ge 0} \mathcal{L}_{\rm ridge}(\mathbf{x}, \lambda) \tag{17} \]

We note that λ is fixed as a consequence of having chosen the prior (13). We also note that the larger λ is, the more centered towards 0 the norm of L x will be. In other words, since we expect small qth derivatives, this prior information (or "belief") will be reflected in a flattened estimate x^MAP_ridge. This prior will also likely imply a large bias with respect to the ideal x. Conversely, a small λ implicitly allows the derivatives L x to vary quite broadly. Hence, we expect that x^MAP_ridge will be affected by significant noise. This is consistent with the bias-variance trade-off reported earlier [53].

Choosing a different prior can give substantially different results. For example, we can set that each element of x is non-negative and distributed according to a Laplace distribution, i.e., x_j ∼ 1(x_j ≥ 0) Laplace(0, 2/λ), whose pdf is p(x_j) = (λ/2) exp(−(λ/2)|x_j|) for x_j ≥ 0 and 0 otherwise. Such a distribution is characterized by a much "fatter" tail than the Gaussian, since exp(−|x|) decays to 0 more slowly than exp(−x²/2). This choice gives the following negative log-likelihood (multiplied by two) [70,72]:

\[ \mathcal{L}_{\rm Lasso}(\mathbf{x}, \lambda) = \sum_{n=1}^{N} \frac{r_n'^2}{\sigma_n'^2} + \sum_{n=1}^{N} \frac{r_n''^2}{\sigma_n''^2} + \lambda \left\| \mathbf{x} \right\|_1 \tag{18} \]

We have used the subscript "Lasso" to emphasize that this negative log-likelihood is connected to Lasso regression, since the corresponding MAP estimate is obtained from

\[ \mathbf{x}^{\rm MAP}_{\rm Lasso} = \arg\min_{\mathbf{x} \ge 0} \mathcal{L}_{\rm Lasso}(\mathbf{x}, \lambda) \tag{19} \]

We note that many other penalties may be found depending on the particular prior information one wishes to assign to the DRT.

2.3. A Hierarchical Bayesian DRT Analysis

In this section we will relax the hypothesis that the regularization level is uniform across all timescales, and concurrently we will seek to keep a minimal number of tunable parameters. First, the scalar λ will be replaced by a vector λ. Second, we will assume that λ is a random vector (endowed with its own prior). We note that this approach is hierarchical in nature, since the prior on x depends on λ, which in turn has its own prior. The goal will be to determine the pdf of both x and λ. As above, the experimental data Z_exp = Z'_exp + iZ''_exp is given, and the starting point of the analysis is Bayes' theorem [67,71,73]:

\[ p(\mathbf{x}, \boldsymbol{\lambda} \mid \mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp})\, p(\mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp}) = p(\mathbf{x}, \boldsymbol{\lambda})\, p(\mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp} \mid \mathbf{x}) \tag{20} \]

We stress again that the main conceptual difference between (10) and (20) is that we also consider the pdf of λ, see Fig. 1. Up to a multiplicative constant, (20) can be rewritten as [67]

\[ p(\mathbf{x}, \boldsymbol{\lambda} \mid \mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp}) \propto p(\mathbf{x}, \boldsymbol{\lambda})\, p(\mathbf{Z}'_{\rm exp}, \mathbf{Z}''_{\rm exp} \mid \mathbf{x}) \tag{21} \]

Furthermore, the following holds:

\[ p(\mathbf{x}, \boldsymbol{\lambda}) = p(\mathbf{x} \mid \boldsymbol{\lambda})\, p(\boldsymbol{\lambda}) \tag{22} \]

We take a Gaussian prior on x, that is, L x ∼ N(0, Λ^{-1}) with Λ = diag(λ),^ii to obtain [68]

\[ p(\mathbf{x} \mid \boldsymbol{\lambda}) \propto \mathbb{1}(\mathbf{x} \ge 0) \prod_{j=1}^{M'} \sqrt{\frac{\lambda_j}{2\pi}} \exp\left( -\frac{\lambda_j}{2} (\mathbf{L}\mathbf{x})_j^2 \right) \tag{23} \]

We note that, in contrast with (13), in (23) the expected level of noise on the qth derivative depends on the timescale. We choose all entries of λ to be independent and identically distributed with a set of parameters, generically identified by a vector Φ, so that

\[ p(\boldsymbol{\lambda}) = \prod_{j=1}^{M'} \mathbb{1}(\lambda_j \ge 0)\, p_{\rm HP}(\lambda_j, \Phi) \tag{24} \]

where p_HP(λ_j, Φ) is a given prior pdf. Using a procedure analogous to the one that leads to (16), we can obtain the hyperparametric negative log-likelihood (multiplied by two)

\[ \mathcal{L}^{\rm hyper}_{\rm ridge}(\mathbf{x}, \boldsymbol{\lambda}) = \sum_{n=1}^{N} \frac{r_n'^2}{\sigma_n'^2} + \sum_{n=1}^{N} \frac{r_n''^2}{\sigma_n''^2} + \left\| \mathbf{L}_{\boldsymbol{\lambda}} \mathbf{x} \right\|^2 + \sum_{j=1}^{M'} \left( -2 \ln p_{\rm HP}(\lambda_j, \Phi) - \ln \lambda_j \right) \tag{25} \]

where L_λ = Λ^{1/2} L and Λ = diag(λ_1, λ_2, ..., λ_{M'}). The MAP vector x^MAP may be determined along with the hyperparameters λ as the solution of the following problem [74]:

\[ \min_{\mathbf{x} \ge 0,\; \boldsymbol{\lambda} \ge 0} \mathcal{L}^{\rm hyper}_{\rm ridge}(\mathbf{x}, \boldsymbol{\lambda}) \tag{26} \]

The solution of (26) can be obtained by alternate minimization with respect to λ and x [68]. In other words, we use the iterative routine outlined in Algorithm 1, where the loop is interrupted when either a given maximum number of iterations is reached or the increment on the iterated x is less than a prescribed amount.

^ii If L is the derivative matrix of order q, then it has dimensions M' × M, and the matrix Λ = diag(λ) is diagonal such that (Λ)_ij = λ_i δ_ij, where δ_ij is the Kronecker delta. We note that if a finite difference approach is used then M' = M − q, and if radial basis functions are used then M' = M.

Algorithm 1. Alternate minimization used in the hierarchical DRT.

    k = 0; continue_loop = 1; x_0 = x_guess
    while continue_loop:
        λ* = argmin_{λ ≥ 0} ℒ^hyper_ridge(x_k, λ)
        x_{k+1} = argmin_{x ≥ 0} ℒ^hyper_ridge(x, λ*)
        if k > k_max or ||x_{k+1} − x_k|| < eps:
            continue_loop = 0
        k = k + 1
    end

It is important to note that Algorithm 1 is particularly convenient if the optimal λ* can be obtained analytically from

\[ \nabla_{\boldsymbol{\lambda}} \mathcal{L}^{\rm hyper}_{\rm ridge}(\mathbf{x}_k, \boldsymbol{\lambda}) = 0 \tag{27} \]

by choosing a suitable hyperprior [68,75]. As derived in Appendix B, the exponential, Gaussian, gamma, and inverse gamma hyperpriors give the analytical expressions reported in Table 1. Therefore, the computational effort necessary to solve one hyperparametric iteration is similar to that of solving a single ridge regression minimization (17), which is a standard quadratic programming problem.

Table 1
List of the hyperprior distributions used and the corresponding λ_j computed by solving (27). Note that for the exponential and Gaussian hyperpriors Φ = φ, while for the gamma and inverse gamma hyperpriors Φ = (φ, β)^T.

Hyperprior    | Distribution p_HP(λ_j, Φ)                                  | Optimal λ_j
Exponential   | ∝ 1(λ_j ≥ 0) exp(−φλ_j/2)                                  | 1 / ((L x)_j² + φ)
Gaussian      | ∝ 1(λ_j ≥ 0) exp(−φλ_j²/2)                                 | (√((L x)_j⁴ + 8φ) − (L x)_j²) / (4φ)
Gamma         | ∝ 1(λ_j ≥ 0) λ_j^{β/2−1} exp(−φλ_j/2)                      | (β − 1) / ((L x)_j² + φ)
Inverse Gamma | ∝ 1(λ_j ≥ 0) λ_j^{−β/2−1} exp(−φ/(2λ_j))                   | (√((β + 1)² + 4φ(L x)_j²) − (β + 1)) / (2(L x)_j²)
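Algorithm 1, together with the closed-form optimum of Table 1, can be sketched in a few lines. The implementation below is our own illustration, not the authors' code: it assumes a gamma hyperprior (so that λ_j = (β − 1)/((L x)_j² + φ)), identity error weights, data with R_∞ already subtracted, and hypothetical function and parameter names; the x-step is solved as a non-negative least-squares problem with the scaled penalty rows L_λ = Λ^{1/2} L.

```python
import numpy as np
from scipy.optimize import nnls

def hierarchical_drt(A_re, A_im, b_re, b_im, L, beta=2.0, phi=1e-2,
                     k_max=50, eps=1e-8):
    """Alternate minimization of Algorithm 1 with a gamma hyperprior.

    A_re, A_im : real/imaginary parts of the DRT matrix of Eq. (7)
    b_re, b_im : real/imaginary data (R_inf assumed already subtracted)
    L          : qth-order differentiation matrix
    """
    x = np.zeros(A_re.shape[1])
    for k in range(k_max):
        # lambda-step: closed-form optimum from Table 1 (gamma hyperprior).
        lam = (beta - 1.0) / ((L @ x) ** 2 + phi)
        # x-step: ridge problem with L_lambda = Lambda^{1/2} L, solved as NNLS.
        A = np.vstack([A_re, A_im, np.sqrt(lam)[:, None] * L])
        b = np.concatenate([b_re, b_im, np.zeros(L.shape[0])])
        x_new, _ = nnls(A, b)
        if np.linalg.norm(x_new - x) < eps:   # stop on a small increment
            return x_new
        x = x_new
    return x
```

On a single-relaxation (Debye) spectrum this recovers a non-negative DRT peaked near the true timescale; each iteration costs roughly as much as one ridge solve, consistent with the remark above.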

3. Results and Discussion

3.1. Synthetic Experiments

We tested the hyperparametric approach using carefully controlled stochastic experiments. In particular, we chose equivalent circuits with analytical DRTs and compared the outcome of the stochastic experiments as a function of the hyperprior employed, the regularization level, and the error structure. We used both

Fig. 2. Noiseless impedance of the ZARC circuit, panel (a), and corresponding DRT, panel (c). Piecewise linear DRT, panel (d), and corresponding noiseless impedance, panel (b).

Table 2
Parameters of the "exact" impedance models.

Parameter | Numerical Value
R_∞       | 10
R_ct      | 50
R_p       | 50
τ_0       | 0.01 s
τ_1       | 1 s
τ_2       | 10 s
φ         | 0.7

smooth and piecewise constant γ(ln τ) to illustrate the proposed method. In particular, we employed a ZARC model consisting of a CPE element placed in parallel with a resistor of resistance R_ct and in series with a resistor R_∞. Its impedance is given by [60]

\[ Z(f) = R_\infty + \frac{R_{\rm ct}}{1 + (i 2\pi f \tau_0)^{\phi}} \tag{28} \]

with R_∞, R_ct, τ_0 > 0 and 0 < φ ≤ 1. The γ(ln τ) of (28) has the following analytical form [60]:

\[ \gamma(\ln\tau) = \frac{R_{\rm ct}}{2\pi} \, \frac{\sin\left( (1 - \phi)\pi \right)}{\cosh\left( \phi \ln\left( \tau/\tau_0 \right) \right) - \cos\left( (1 - \phi)\pi \right)} \tag{29} \]

The corresponding Z(f) and γ(ln τ) are shown in Fig. 2, where the parameters used for plotting are listed in Table 2.
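The pair (28)-(29) can be checked numerically: inserting the analytical γ(ln τ) into the integral (5) must reproduce the ZARC impedance. The snippet below is our own sanity check (using the Table 2 parameters; the grid and tolerance choices are arbitrary):

```python
import numpy as np

R_inf, R_ct, tau0, phi = 10.0, 50.0, 0.01, 0.7   # Table 2 values

def gamma_zarc(ln_tau):
    # Analytical ZARC DRT, Eq. (29), as a function of ln(tau).
    a = (1.0 - phi) * np.pi
    return (R_ct / (2 * np.pi)) * np.sin(a) / (
        np.cosh(phi * (ln_tau - np.log(tau0))) - np.cos(a))

# Quadrature of Eq. (5) on a wide ln(tau) grid (the tails decay exponentially).
ln_tau = np.linspace(np.log(tau0) - 30, np.log(tau0) + 30, 20001)
d = ln_tau[1] - ln_tau[0]
f = 100.0
integrand = gamma_zarc(ln_tau) / (1 + 1j * 2 * np.pi * f * np.exp(ln_tau))
Z_from_drt = R_inf + np.sum(integrand) * d

Z_exact = R_inf + R_ct / (1 + (1j * 2 * np.pi * f * tau0) ** phi)   # Eq. (28)
print(abs(Z_from_drt - Z_exact))   # small: quadrature + truncation error only
```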

We also simulated a piecewise constant DRT characterized by the following impedance response:

\[ Z(f) = R_\infty + \frac{R_p}{\ln(\tau_2/\tau_1)} \left[ \ln\left( 1 - \frac{i}{2\pi f \tau_1} \right) - \ln\left( 1 - \frac{i}{2\pi f \tau_2} \right) \right] \tag{30} \]

with τ_1, τ_2 > 0 and R_p > 0. The corresponding γ(ln τ) has the following analytical form:

\[ \gamma(\ln\tau) = \frac{R_p}{\ln(\tau_2/\tau_1)} \left( H(\tau - \tau_1) - H(\tau - \tau_2) \right) \tag{31} \]

where H(·) is the Heaviside function.

3.1.1. The ZARC Circuit

We first studied the ZARC circuit simulated using the parameters listed in Table 2. We chose 5 points per decade from 10⁻² to 10⁶ Hz. Furthermore, we considered simulated experiments obtained using three different error models, namely

\[ Z^{\rm abs}_{\rm exp}(f) = Z(f) + \varepsilon_{\rm abs} \left| Z(f) \right| \left( E' + iE'' \right) \tag{32a} \]

\[ Z^{\rm prop}_{\rm exp}(f) = Z(f) + \varepsilon_{\rm prop} \left( Z'(f)\, E' + i Z''(f)\, E'' \right) \tag{32b} \]

\[ Z^{\rm unif}_{\rm exp}(f) = Z(f) + \varepsilon_{\rm unif} \left( E' + iE'' \right) \tag{32c} \]

where E', E'' ∼ N(0, 1). The subscript "exp" indicates that these are stochastic experiments, and the superscripts abs, prop, and unif reflect the different error structures. These error structures are also connected to distinct weights 1/σ_n'^2 and 1/σ_n''^2 in the likelihoods, as shown by (11) and (12). In order to roughly ensure the same error

Fig. 3. Outcome of the 1000 stochastic impedance experiments (ZARC circuit) shown at the select frequencies highlighted in Fig. 2, panel (a). The error models (32a), (32b), and (32c) correspond to panels (a), (b), and (c), respectively.

Fig. 4. Average distance between the exact and deconvolved DRT shown with its one-σ confidence band. Panels (a), (b), and (c) correspond to abs, Re-Im, and uniform errors, respectively.

Fig. 5. Average normalized error as a function of λ_0 (Table 3) and the corresponding one-σ band for error model (32a). Each panel corresponds to a different hyperprior pdf, namely exponential (a), Gaussian (b), gamma (c), and inverse gamma (d).

level, we set ε_abs⟨|Z|⟩ = ε_prop⟨|Z'| + |Z''|⟩ = ε_unif, where the brackets ⟨·⟩ indicate the average over the considered frequency span. The outcome of 1000 stochastic experiments with ε_abs = 1/100 is shown as a cloud of points in Fig. 3, where only the data at the frequencies highlighted in Fig. 2(a) are reported. One immediately notices that the different structures give qualitatively distinct simulated EIS experiments, which in turn give different stochastic realizations around the noiseless impedance Z(f). As reported in Fig. 3(a), the cloud of points for Z^abs_exp(f) is wider at lower frequencies because |Z(f)| is large there. Additionally, the simulated impedance realizations are scattered in a circle around the noiseless Z(f), since the errors on the real and imaginary parts have an identical distribution. Z^prop_exp(f), Fig. 3(b), gives a cloud of points that is preferentially elongated along the real axis. This is because Z'(f) is greater than Z''(f). Additionally, as the real part of Z(f) decreases with frequency, the size of the cloud diminishes. Z^unif_exp(f) is instead characterized by a constant error level. Hence, as shown in Fig. 3(c), the clouds of simulated experiments scattered around the exact Z(f) all have identical radius.
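A minimal generator for the three error structures (32) might look as follows (our illustration; the function and argument names are not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_spectra(Z, eps, kind):
    """One stochastic realization of Eq. (32) around a noiseless spectrum Z."""
    E1 = rng.standard_normal(Z.shape)      # E'  ~ N(0, 1)
    E2 = rng.standard_normal(Z.shape)      # E'' ~ N(0, 1)
    if kind == "abs":                      # (32a): scales with |Z(f)|
        return Z + eps * np.abs(Z) * (E1 + 1j * E2)
    if kind == "prop":                     # (32b): scales with Re/Im parts
        return Z + eps * (Z.real * E1 + 1j * Z.imag * E2)
    if kind == "unif":                     # (32c): frequency-independent
        return Z + eps * (E1 + 1j * E2)
    raise ValueError(kind)
```

Drawing many realizations of each variant reproduces the qualitative clouds of Fig. 3; with eps = 0 each variant returns the noiseless spectrum.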

In order to test the performance of the various algorithms for a sufficiently large number of simulated experiments, we first drew 1000 realizations of (32). Subsequently, we performed regression by solving (8) for each stochastic experiment and compared each obtained DRT with the exact DRT. We repeated the same procedure for regularization parameters λ in the range from 10⁻⁵ to 1. In order to make the presentation uniform, we chose a piecewise linear basis with N = M [53]. Furthermore, we set Σ' and Σ'' in (8) to be identity matrices and L to be the second-order differentiation matrix. We monitored the residual as a function of λ. The latter is defined as the distance between the exact DRT taken at discrete locations, γ = (γ(ln τ_1), γ(ln τ_2), ..., γ(ln τ_M))^T, and the regressed DRT x^MAP, normalized with respect to the norm of the exact γ:

\[ r(\lambda) = \frac{\left\| \boldsymbol{\gamma} - \mathbf{x}^{\rm MAP} \right\|}{\left\| \boldsymbol{\gamma} \right\|} \tag{33} \]

We note that studying this residual implies a blended frequentist-Bayesian approach, because we use x^MAP as a point estimate rather than a random variable. This is, however, motivated by the need to test the performance of the method rather than just finding the Bayesian posterior given one single simulated experiment.

Fig. 4 shows the mean r's (solid lines) along with their variability (grey regions) obtained using the three model experiments (32). It is clear that all r's are qualitatively similar. At small λ the errors are large and characterized by a large variance (large grey area). For large λ's the observed errors are equally large, but they are characterized by a lower variance (small grey area), indicating that the other error component, the bias, is large. For a suitable bias-variance trade-off obtained at intermediate λ, the relative error is minimized. The same conclusions can be drawn for the three error models (32), as shown in the panels of Fig. 4. These results are consistent with the conclusions obtained in previous work [53].

We also deconvolved the same synthetic data using the hyperparametric approach. In other words, we solved (26), where in (25) we set σ_n' = σ_n'' = 1, and where we used hyperpriors with exponential, Gaussian, gamma (β = 1.5), and inverse gamma (β = 1) pdfs, as described in Appendix B and listed in Table 1. These correspond to the λ_j's listed in Table 3. We varied the nominal regularization level λ_0 = lim_{(L x)_j² → 0} λ_j in the range from 10⁻⁵ to 1. We then plotted r(λ_0), the normalized distance between the regressed DRT and the exact γ(ln τ), in Fig. 5, Fig. 6, and Fig. 7, where each figure displays analogous quantities for the different error structures. Interestingly, we found that while the exponential, Gaussian, gamma, and inverse gamma hyperpriors behave in a similar manner to the regular ridge

Fig. 6. Plots analogous to Fig. 5 for synthetic experiments sampled with (32b).

Fig. 7. Plots analogous to Fig. 5 for synthetic experiments obtained using (32c).

Table 3
Asymptotes of λ_j for the hyperprior pdfs studied, with the corresponding conditions in square brackets.

Hyperprior    | λ_0 (limit for (L x)_j² → 0) | λ_j for large derivatives
Exponential   | 1/φ                          | 1/(L x)_j²  [(L x)_j² ≫ φ]
Gaussian      | 1/√(2φ)                      | 1/(L x)_j²  [(L x)_j⁴ ≫ 4φ]
Gamma         | (β − 1)/φ                    | (β − 1)/(L x)_j²  [(L x)_j² ≫ φ]
Inverse Gamma | φ/(β + 1)                    | √φ/|(L x)_j|  [(L x)_j² ≫ (β + 1)²/(4φ)]

DRT for low λ_0's, they all lead to an abatement of the total error at large λ_0's. In other words, the hyperparametric approach applied to the ZARC element does not lead to variance reduction (low regularization levels), but it effectively damps out the effect of bias (high regularization levels). Even for large λ_0's the bias decreases, and yet the variance is comparable to that of the optimal DRT; in fact, the size of the grey confidence band is not as small as in the regular ridge method, where the variance nears zero. In essence, the effect of the hyperparameters is to widen the λ_0 optimality window. As shown in Fig. 5, Fig. 6, and Fig. 7, this seems to apply to all the error models considered, and the hyperparametric approach appears to consistently lower the errors due to bias. In order to complete this discussion, we also illustrate a qualitative comparison of the average MAP estimates against the regular ridge DRT for large regularization parameters. As shown in Fig. 8(a) and (b), for λ = 10⁻¹ and λ = 1 respectively, the γ estimated using ridge regression has smaller derivatives and a more pronounced bias with respect to the underlying DRT. Under the same conditions the hyperparametric γ, Fig. 8(c) and (d), is closer to the original DRT, showing better bias control. It is important to note that, while the results shown in Fig. 8 were computed using the simulated experiment (32c), we expect to draw similar conclusions for all the synthetic experiments (32).

Fig. 8. Panels (a) and (b) show the averaged ridge DRT computed far from the optimum at λ = 10⁻¹ (a) and λ = 1 (b). Panels (c) and (d) show analogous plots for the hyperparametric case with λ_0 = 10⁻¹ (c) and λ_0 = 1 (d).

Among the various regularization hyperpriors, the ones with Gaussian, exponential, and gamma pdfs appear to perform better than the inverse gamma. Incidentally, we note that both the exponential and Gaussian λ_j's depend on a single parameter φ, while the λ_j's obtained using a gamma hyperprior depend on a set of two parameters Φ = (φ, β)^T. We also note that, as shown in Appendix B and reported in Table 1, the gamma hyperprior with β = 2 coincides with its exponential counterpart. Due to its increased tunability and overall good performance, in the remainder of the paper we will focus on the hyperprior with gamma pdf.

This choice will allow us to study the relative importance of the derivative term and of the fixed component of the optimal regularization level. In particular, we can write the optimal λⱼ (Table 1) as

λⱼ = 1 / ( β⁻¹ (L x)ⱼ² + λ₀⁻¹ )    (34)

where λ₀ is the nominal regularization level (Table 3). It is clear that λⱼ is obtained by weighing the qth derivative term against the nominal regularization level λ₀, and that in general λⱼ ≤ λ₀. Further, if the qth derivatives are small, i.e., (L x)ⱼ² ≪ β λ₀⁻¹ (see Table 2), then the expected regularization level is λⱼ ≈ λ₀, giving a regular ridge weight. Conversely, if the derivatives are large, i.e., (L x)ⱼ² ≫ β λ₀⁻¹, then λⱼ drops to λⱼ ≈ β (L x)ⱼ⁻², under-smoothing the solution. This implies that if we fix λ₀, we can switch between nominal and under-smoothing by simply modifying β. In particular, β ≈ 1 generally under-smoothes the DRT, giving large variance, while β ≫ 1 leads to results analogous to the ridge DRT. For small enough λ₀, the regularization will be such that the term β⁻¹ (L x)ⱼ² is small compared with λ₀⁻¹, again leading to a DRT analogous to the ridge DRT. This is exemplified in Figs. 5–7 (a) & (c) for λ₀ ≤ 10⁻².
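The two regimes of (34) can be checked numerically. The sketch below (variable names are ours, not the paper's) evaluates the local regularization levels and verifies that λⱼ reduces to λ₀ where the derivatives are small and to β/(L x)ⱼ² where they are large.

```python
import numpy as np

def local_reg_levels(Lx, beta, lam0):
    """Timescale-dependent regularization levels, eq. (34):
    lambda_j = 1 / ((L x)_j**2 / beta + 1 / lam0)."""
    return 1.0 / (Lx ** 2 / beta + 1.0 / lam0)

# Toy derivative vector: small derivatives everywhere except one "jump".
Lx = np.array([1e-4, 1e-4, 100.0, 1e-4])
beta, lam0 = 2.0, 0.1

lam = local_reg_levels(Lx, beta, lam0)

# Small derivatives -> lambda_j ~ lam0 (regular ridge weight).
assert np.allclose(lam[[0, 1, 3]], lam0, rtol=1e-4)
# Large derivative -> lambda_j ~ beta / (L x)_j**2 (under-smoothing).
assert np.isclose(lam[2], beta / Lx[2] ** 2, rtol=1e-2)
# In all cases lambda_j <= lam0.
assert np.all(lam <= lam0)
```

The assertions mirror the limiting cases discussed in the text: ridge-like weights away from sharp features and near-zero regularization where the derivative is large.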

3.1.2. Circuit with Piecewise Constant DRT

We also performed synthetic experiments with an exact impedance Z(f) given by (30) and shown in Nyquist form in Fig. 2(b). This corresponds to the piecewise constant γ(τ) reported in Fig. 2(d) (we recall that Table 2 contains the parameters used). We stress that we used this type of impedance because its DRT cannot be obtained by ridge regression. In fact, by imposing a penalty on the qth derivative, ridge regression requires that the corresponding DRT be differentiable at least up to order q. We used the model error (32a) with frequencies ranging from 10⁻³ to 10³ Hz with 20 points per decade. We deconvolved the impedance data using the regular ridge DRT and the hyperparametric method with gamma pdf. Furthermore, we took L to be the 2nd order differentiation matrix. We studied the impact of small, medium, and large relative weights attributed to the derivatives, see (34), by selecting β = 1.01, 2, and 30 respectively, with λ₀ = 0.1.
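Synthetic spectra of this type follow from the standard DRT superposition, Z(f) = R∞ + ∫ γ(ln τ)/(1 + i2πfτ) d ln τ. The sketch below builds such an impedance on the frequency grid quoted above; the piecewise-constant γ values are illustrative placeholders of our own, not the parameters of (30)/Table 2.

```python
import numpy as np

def impedance_from_drt(freqs, tau, gamma, R_inf=0.0):
    """Discretized DRT synthesis on a log-spaced tau grid:
    Z(f) = R_inf + sum_k gamma_k / (1 + i*2*pi*f*tau_k) * d(ln tau)."""
    dlntau = np.log(tau[1] / tau[0])                 # uniform log spacing
    # Broadcasting: rows = frequencies, columns = relaxation times.
    kernel = 1.0 / (1.0 + 2j * np.pi * np.outer(freqs, tau))
    return R_inf + kernel @ (gamma * dlntau)

# Frequency grid as in the synthetic experiment: 1e-3 to 1e3 Hz, 20 pts/decade.
freqs = np.logspace(-3, 3, 121)
tau = np.logspace(-4, 4, 161)
# Hypothetical piecewise-constant DRT (values are placeholders).
gamma = np.where((tau > 1e-2) & (tau < 1e0), 0.5, 0.0)

Z = impedance_from_drt(freqs, tau, gamma)
assert np.all(Z.imag <= 0)        # capacitive-like response
assert Z[0].real > Z[-1].real     # polarization resistance appears at low f
```

The resulting Z(f) can then be perturbed with an error model such as (32a) before attempting the deconvolution.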

As predicted, the regular ridge regression method does not faithfully capture the exact DRT. This is shown in Fig. 9 panel (a) and reproduced for reference in panels (b) and (c). The regular method tends to smooth out the entire DRT by penalizing the derivatives at every timescale with equal strength. Conversely, the hyperparametric approach leads to generally better results. As illustrated in Fig. 9 panels (a) and (b), corresponding to β = 1.01 and 2 respectively, the exact piecewise constant DRT is recovered quite faithfully. If instead β = 30 (Fig. 9 panel (c)), the discontinuity is not recovered. This behavior can be easily rationalized by looking at (34). If β ≈ 1, the denominator is large, since the derivatives' contribution overwhelms the nominal regularization level λ₀, i.e., β⁻¹ (L x)ⱼ² ≫ λ₀⁻¹. Since the piecewise constant DRT has large numerical derivatives at the discontinuity, the local regularization level λⱼ is close to zero there, making it possible to capture the corresponding jump. If β is instead in the intermediate range (here we set β = 2), then the terms β⁻¹ (L x)ⱼ² and λ₀⁻¹ are closer in order of magnitude. In turn, this makes the hyperparametric approach less sensitive to the contribution of the derivatives. As illustrated in Fig. 9 panel (b), while the general characteristics of the piecewise constant DRT are recovered, additional features emerge near the discontinuity. If β is even larger, the impact of the derivatives is further weakened and the hyperparametric method leads to results close to those obtained by applying the regular ridge DRT, as shown in Fig. 9 panel (c).

Fig. 9. Piecewise constant DRT recovered by the regular and hyperparametric DRT methods ((a), (b), and (c)); the evolution of the local regularization level λ as a function of the iteration number ((d), (e), and (f), see Algorithm 1); and the corresponding evolution of the DRT ((g), (h), and (i)). The first column corresponds to β = 1.01, the second to β = 2, and the third to β = 30.

Fig. 10. Plot of r, the normalized distance between the hyperparametric DRT and the optimal Re-Im cross-validated DRT, as a function of the parameter f_β.

Fig. 11. Normalized distance between the hyperparametric DRT computed at λ_CV and the hyperparametric DRT computed at various nominal regularization levels λ₀ with fixed f_β (or β). The case f_β → ∞ (ridge DRT) is shown in all plots as a red dashed line. Panels (a), (b), (c), and (d) correspond to f_β = 0.1, 1, 10, and 100 respectively.

The hyperparametric approach relies on the iterative refinement of the local regularization level, thereby modifying the regressed γ(ln τ) at each iteration. The evolution of the local regularization level is shown in Fig. 9 panels (d), (e), and (f) for β = 1.01, 2, and 30 respectively. As explained above, the regularization level declines sharply near the discontinuity for β = 1.01. This decrease is weaker for β = 2 and almost absent for β = 30. If β = 30, the λⱼ's are roughly uniform across the entire time spectrum. Concomitantly with the regularization level's progression, γ(τ) evolves as a function of the iteration number, as shown in Fig. 9 panels (g), (h), and (i). Starting from the DRT obtained from regular ridge regression, the hyperparametric approach converges to the final γ(τ) within a few iterations by emphasizing the relevant features.
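Algorithm 1 itself is not reproduced in this excerpt; the sketch below is our own minimal interpretation of the alternating scheme described above (a weighted-ridge solve followed by the λⱼ refinement of (34)), applied to a synthetic piecewise-constant denoising problem rather than actual EIS data.

```python
import numpy as np

def second_diff_matrix(m):
    """Second-order finite-difference operator L (interior rows only)."""
    L = np.zeros((m - 2, m))
    for j in range(m - 2):
        L[j, j:j + 3] = [1.0, -2.0, 1.0]
    return L

def hyperparametric_ridge(A, b, beta, lam0, n_iter=10):
    """Alternate a weighted-ridge solve with the lambda_j refinement of
    eq. (34): lambda_j = 1 / ((L x)_j**2 / beta + 1 / lam0)."""
    L = second_diff_matrix(A.shape[1])
    lam = np.full(L.shape[0], float(lam0))   # start from uniform ridge weights
    for _ in range(n_iter):
        # Solve min_x ||A x - b||^2 + sum_j lam_j (L x)_j^2.
        x = np.linalg.solve(A.T @ A + L.T @ np.diag(lam) @ L, A.T @ b)
        # Refine the local regularization levels from the current derivatives.
        lam = 1.0 / ((L @ x) ** 2 / beta + 1.0 / lam0)
    return x, lam

# Synthetic stand-in: recover a piecewise-constant profile from noisy
# direct observations (A = identity), mimicking the discontinuous-DRT test.
rng = np.random.default_rng(0)
x_true = np.concatenate([np.zeros(20), np.ones(20)])
b = x_true + 0.05 * rng.standard_normal(40)

x_hat, lam = hyperparametric_ridge(np.eye(40), b, beta=1.01, lam0=10.0)

# Far from the jump the weights stay near lam0 (ridge-like) ...
assert np.isclose(lam[0], 10.0, rtol=0.05)
# ... while the smallest weight sits near the discontinuity (index ~19),
# which is what lets the jump survive the smoothing.
assert abs(int(np.argmin(lam)) - 19) <= 5
assert x_hat[:10].mean() < 0.3 and x_hat[-10:].mean() > 0.7
```

With β ≈ 1 the weight collapses around the jump, reproducing the behavior seen in Fig. 9 panels (d) and (g); larger β keeps the weights near λ₀ and yields ridge-like smoothing.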

3.2. Analysis of Lithium-ion Battery Experiments

We finally applied the DRT regularization to the lithium-ion battery (LIB) spectra reported in our earlier publication [53]. We stress that this is relevant because, as is well known, the EIS responses of LIBs are so complicated that they require equivalent circuits with 10 or more parameters. In turn, this has adverse effects on the identifiability of the respective models [76,77]. We note that, in contrast with our previous work, we selected the basis elements in (6) to be radial basis functions [78].

As a first step, we screened the regular ridge DRT for all 12 available spectra and determined for each spectrum the regularization parameter λ̄_CV by minimizing the Re-Im cross validation. We found that λ = λ̄_CV ± σ_CV = 2.2·10⁵ ± 3.16·10⁶. We then employed the hyperparametric method with the gamma hyperprior.

We did this in order to study the relative impact of the parameters β and λ₀ in (34) and Table 1. We first rewrote (34) in a more convenient form by defining a new quantity f_β, which normalizes β as follows:

f_β = β / ( λ₀ max_{j=1,…,M} (L x)ⱼ² )    (35)

This in turn allows us to rewrite (34) as

λⱼ = λ₀ / ( f_β⁻¹ (L x)ⱼ² / max_{j=1,…,M} (L x)ⱼ² + 1 )    (36)

where it is clear that the factor f_β directly tunes the relative importance of the derivative term. In particular, if f_β is small, then the local derivatives will decrease the regularization level. Conversely, for large f_β the derivatives contribute little (λⱼ ≈ λ₀) and ultimately we recover ridge regression.
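As a quick consistency check, the rescaled form (36), with f_β defined as in (35), reproduces the λⱼ of (34) term by term. A short sketch (notation ours):

```python
import numpy as np

def lam_direct(Lx, beta, lam0):
    # Eq. (34): lambda_j = 1 / ((L x)_j^2 / beta + 1 / lam0).
    return 1.0 / (Lx ** 2 / beta + 1.0 / lam0)

def lam_rescaled(Lx, beta, lam0):
    # Eq. (35): f_beta = beta / (lam0 * max_j (L x)_j^2).
    f_beta = beta / (lam0 * np.max(Lx ** 2))
    # Eq. (36): lambda_j = lam0 / ((L x)_j^2 / (f_beta * max_j (L x)_j^2) + 1).
    return lam0 / ((Lx ** 2 / np.max(Lx ** 2)) / f_beta + 1.0)

Lx = np.array([0.01, 0.5, 2.0, 0.1])
# The two forms agree term by term.
assert np.allclose(lam_direct(Lx, 2.0, 0.1), lam_rescaled(Lx, 2.0, 0.1))
# Large f_beta (large beta): derivatives contribute little, lambda_j -> lam0.
assert np.allclose(lam_rescaled(Lx, 1e9, 0.1), 0.1)
```

The second assertion illustrates the ridge limit f_β → ∞ discussed in the text.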

As a first step, we analyzed the impact of β at fixed λ = λ̄_CV. We then performed the regularization as a function of f_β and calculated the normalized distance r(f_β) between the DRT computed using the regular ridge method and that obtained with the hyperparametric approach. This is defined as

r(f_β) = ||γ_hyper(f_β, λ_opt, ln τ) − γ_CV(ln τ)|| / ||γ_CV(ln τ)||

where γ_CV(ln τ) is the ridge DRT obtained for the cross-validated penalty. Its average is shown in Fig. 10. For small f_β (β ≈ 1) the derivatives have a large impact on the λⱼ's. In turn, the DRT obtained from the hyperparametric method deviates from the ridge DRT, as shown by a rather large r value. By increasing f_β, this difference decreases and the hyperparametric approach leads to DRTs closer to those obtained from simple regularization. In fact, r(f_β) tends to 0 as f_β goes to infinity.
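The normalized distance used here and in the following is simply a relative Euclidean distance between two discretized DRT vectors; a minimal sketch (function name ours):

```python
import numpy as np

def normalized_distance(gamma, gamma_ref):
    """Relative L2 distance ||gamma - gamma_ref|| / ||gamma_ref||, with
    gamma_ref playing the role of the reference DRT (e.g. the ridge DRT
    at the cross-validated penalty)."""
    return np.linalg.norm(gamma - gamma_ref) / np.linalg.norm(gamma_ref)

gamma_ref = np.array([1.0, 2.0, 3.0])
# Identical DRTs are at distance 0 ...
assert normalized_distance(gamma_ref, gamma_ref) == 0.0
# ... and doubling every value gives r = 1.
assert np.isclose(normalized_distance(2.0 * gamma_ref, gamma_ref), 1.0)
```

In practice γ and γ_ref would be the hyperparametric and cross-validated ridge DRTs evaluated on the same ln τ grid.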

As shown in Section 3.1.1, the hyperparametric method with a carefully chosen β works well for λ₀'s beyond the optimum λ̄_CV. Thus, as a second step, we explored the effect of λ₀ at given β. We stress that, while an analytical solution of the DRT is not available, we took each individual hyperparametric DRT calculated at λ̄_CV as the reference condition. We then fixed β and computed the normalized distance between the hyperparametric DRT computed at λ₀ and the DRT at λ̄_CV with the same β, namely

r(λ₀) = ||γ_hyper(f_β, λ₀, ln τ) − γ_hyper(f_β, λ̄_CV, ln τ)|| / ||γ_hyper(f_β, λ̄_CV, ln τ)||

We repeated this computation for f_β = 0.1, 1, 10, and 100, as shown in Fig. 11 panels (a), (b), (c), and (d) respectively. In Fig. 11 we plot the average of r(λ₀) (solid line) along with the square root of the variance computed across the 12 experiments. Consistently with the intuition gained from expression (36), and as shown in Fig. 11(d), for large f_β (or β) the hyperparametric approach leads to solutions very similar to those obtained using the regularized DRT method (dashed line). Conversely, for smaller f_β (β progressively closer to 1), r(λ₀) is generally characterized by a smaller slope, see Fig. 11 panels (a), (b), and (c). In particular, the slope decreases with decreasing f_β. This indicates that the closer β is to 1, the less sensitive the method is to variations of λ₀.

In order to further support these insights, we computed the DRTs obtained using the regular and hyperparametric approaches at λ̄_CV and at a λ₀ much greater than the optimum. Panels (a) and (b) of Fig. 12 show the DRTs computed at λ₀ = λ̄_CV and at λ_off-CV = 100·λ̄_CV respectively. As displayed in Fig. 12(a), if λ₀ increases, the typical ridge DRT is significantly flattened, such that the peaks are hardly visible. In contrast, the hyperparametric DRT with λ₀ = 100·λ̄_CV and f_β ≈ 10 is capable of detecting the relevant DRT peaks with only minimal bias with respect to the ridge DRT at λ₀ = λ̄_CV, as shown in Fig. 12(b).

Fig. 12. Panel (a) shows the ridge DRT obtained at the Re-Im cross-validated regularization level λ_CV (CV) and at 100·λ_CV (off-CV). Panel (b) shows the ridge (off-CV) and hyperparametric (hyper) DRTs, both computed at λ₀ = 100·λ_CV.

4. Conclusions

In this article we have shown that ridge regularization as applied to the DRT can be derived using Bayesian statistics. We used this insight to extend the classical ridge/Tikhonov methodology and to develop hierarchical DRT methods. In doing so, we developed ways to adjust the level of regularization in a timescale-dependent fashion.

We employed both synthetic and real experiments to evaluate the performance of the hierarchical DRT against the traditional ridge/Tikhonov DRT, and we illustrated that the hyperparametric DRT is able to mitigate the effect of the bias found at large nominal regularization levels. In this regard, the hyperparametric approach is more robust than traditional ridge regression, with reduced sensitivity to bias at high regularization levels. We were able to show this for real battery data, where even at high regularization levels the estimated DRTs are comparable to those obtained with regularization levels optimized by cross-validation. Furthermore, by means of a set of stochastic experiments, we showed that the hyperparametric DRT is capable of faithfully capturing discontinuities in the DRT spectrum.

Lastly, we stress that understanding the DRT analysis in a Bayesian context opens up further opportunities for the future development of this technique. By using informative priors and p
