Bayesian local false discovery rate for sparse count data with application to the discovery of hotspots in protein domains


Item Type: Article

Authors: Gauran, Iris Ivy M.; Park, Junyong; Rattsev, Ilia; Peterson, Thomas A.; Kann, Maricel G.; Park, DoHwan

Citation: Gauran, I. I. M., Park, J., Rattsev, I., Peterson, T. A., Kann, M. G., & Park, D. (2022). Bayesian local false discovery rate for sparse count data with application to the discovery of hotspots in protein domains. The Annals of Applied Statistics, 16(3). https://doi.org/10.1214/21-aoas1551

Eprint version: Publisher's Version/PDF

DOI: 10.1214/21-aoas1551

Publisher: Institute of Mathematical Statistics

Journal: The Annals of Applied Statistics

Rights: Archived with thanks to The Annals of Applied Statistics

Download date: 2024-01-09 21:59:12

Link to Item: http://hdl.handle.net/10754/679873


Supplementary Material to “Bayesian Local False Discovery Rate for sparse count data with application to the discovery of hotspots in protein domains”

APPENDIX A: POSTERIOR DISTRIBUTIONS IN SECTION 3.4

In this section, we derive the posterior distributions of φ₀ = (η, λ, θ), φ₁, π₀, C, τ and z_N, which are mentioned in Section 3.4. Estimating the local FDR by Gibbs sampling requires that every full conditional distribution be specified and sampled from. To implement the Gibbs sampler, an ordering of the (ℓ₀ + ℓ₁ + 2) parameters is necessary. We first specify the conditional posterior distributions of C and τ, and then discuss those of π₀, φ₀ and φ₁.

When f₀ is ZIGP, the parameters are φ = (φ₀, φ₁, π₀, C) and the full conditional distributions are specified only up to a constant of proportionality. The posterior distribution of φ can be expressed in terms of the full likelihood function and simplified as follows:

f(φ | x_N, z_N) ∝ L(φ | x_N, z_N) g(φ) (A.1)

where g(φ) = g(C | τ) g(τ) g(λ) g(η) g(θ) g(π₀) g(φ₁).

A.1. Conditional Posterior Distribution of C and τ. From the prior specification in (3.5) and (3.6), the conditional posterior density of C given all the other parameters is

\[
\begin{aligned}
P(C=\ell \mid x_N, z_N, \varphi_0, \varphi_1, \pi_0, \tau)
&\propto L(\varphi \mid x_N, z_N)\, g(C=\ell \mid \tau)\, g(\tau) \\
&= \frac{\left[\prod_{j\le \ell}\{\pi_0 f_0(j \mid \varphi_0)\}^{n_j} \prod_{j\ge \ell+1} f(j \mid \varphi_0, \varphi_1, \pi_0)^{n_j}\right] P(\ell \mid \tau)\, g(\tau)}
{\sum_{\ell=0}^{\infty}\left[\prod_{j\le \ell}\{\pi_0 f_0(j \mid \varphi_0)\}^{n_j} \prod_{j\ge \ell+1} f(j \mid \varphi_0, \varphi_1, \pi_0)^{n_j}\right] P(\ell \mid \tau)\, g(\tau)}
\end{aligned}
\]

where P(ℓ | τ) = e^{−τ} τ^ℓ / ℓ! and g(τ) ≡ G(τ | κ_τ, ϑ_τ) is the density function of the Gamma distribution with parameters κ_τ and ϑ_τ. In practice, we truncate the sum in the denominator, which gives an approximate distribution that is multinomial:

C|x_N,z_N,φ₀,φ₁, π₀, τ ≈ Multinomial(1,q) (A.2)


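where q = (q₀, q₁, . . . , q_K) and q_ℓ is defined as

\[
q_\ell = \frac{\left[\prod_{j\le \ell}\{\pi_0 f_0(j \mid \varphi_0)\}^{n_j} \prod_{j\ge \ell+1} f(j \mid \varphi_0, \varphi_1, \pi_0)^{n_j}\right] P(\ell \mid \tau)\, g(\tau)}
{\sum_{\ell \le K}\left[\prod_{j\le \ell}\{\pi_0 f_0(j \mid \varphi_0)\}^{n_j} \prod_{j\ge \ell+1} f(j \mid \varphi_0, \varphi_1, \pi_0)^{n_j}\right] P(\ell \mid \tau)\, g(\tau)}
\tag{A.3}
\]

for ℓ = 0, 1, . . . , K, and \(\sum_{\ell=0}^{K} q_\ell = 1\).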
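The truncated draw of C in (A.2) and (A.3) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `sample_C`, the list-based inputs and the toy numbers are assumptions, and all pmf values are assumed strictly positive. Because g(τ) does not depend on ℓ, it cancels between the numerator and the denominator, so only P(ℓ | τ) enters the weights, which are accumulated on the log scale to avoid underflow.

```python
import math
import random


def sample_C(counts, f0, f, pi0, tau, rng=random):
    """Draw C from the truncated conditional in (A.2)-(A.3).

    counts[j] = n_j, f0[j] = f0(j | phi0), f[j] = f(j | phi0, phi1, pi0)
    for j = 0..K. Hypothetical helper; pmf values must be > 0.
    """
    K = len(counts) - 1
    log_w = []
    for ell in range(K + 1):
        # log P(ell | tau) for the Poisson prior on C
        lw = -tau + ell * math.log(tau) - math.lgamma(ell + 1)
        for j in range(K + 1):
            if j <= ell:
                lw += counts[j] * (math.log(pi0) + math.log(f0[j]))
            else:
                lw += counts[j] * math.log(f[j])
        log_w.append(lw)
    # Normalize on the log scale (subtract the maximum before exponentiating)
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    q = [v / total for v in w]
    C = rng.choices(range(K + 1), weights=q, k=1)[0]
    return C, q


# Toy inputs (made up for illustration only)
rng = random.Random(0)
counts = [5, 3, 1, 1]
f0 = [0.60, 0.30, 0.08, 0.02]
f1 = [0.05, 0.15, 0.30, 0.50]
pi0 = 0.8
f = [pi0 * a + (1 - pi0) * b for a, b in zip(f0, f1)]
C, q = sample_C(counts, f0, f, pi0, tau=1.0, rng=rng)
```

The returned vector `q` plays the role of (q₀, . . . , q_K), and `C` is a single Multinomial(1, q) draw reported as an index.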

Moreover, the conditional posterior density of τ depends only on C, that is, f(τ | C) ∝ g(C | τ) g(τ), where g(τ) ≡ G(τ | κ_τ, ϑ_τ) is the conjugate prior.

After the necessary calculations, the conditional posterior distribution of τ given C is

f(τ | x_N, z_N, φ₀, φ₁, π₀, C) ∝ G(τ | C + κ_τ, ϑ_τ + 1) (A.4)

where G(· | a, b) is the density of the Gamma distribution with parameters a and b.
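As a small illustration of (A.4), the Gibbs update for τ is a single Gamma draw. The sketch below assumes that G(· | a, b) uses the shape and rate parametrization, which is what the Poisson-Gamma conjugate update C + κ_τ, ϑ_τ + 1 suggests; the function name and the hyperparameter values are hypothetical.

```python
import random


def sample_tau(C, kappa_tau, vartheta_tau, rng=random):
    """Draw tau | C ~ Gamma(shape = C + kappa_tau, rate = vartheta_tau + 1)."""
    shape = C + kappa_tau
    rate = vartheta_tau + 1.0
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
    return rng.gammavariate(shape, 1.0 / rate)


rng = random.Random(1)
draws = [sample_tau(C=4, kappa_tau=1.0, vartheta_tau=1.0, rng=rng)
         for _ in range(20000)]
mean_tau = sum(draws) / len(draws)  # theoretical mean is shape / rate = 2.5
```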

A.2. Conditional Posterior Distribution of z_N and π₀. Before we can specify the conditional posterior distribution of π₀, we need to consider the vector of latent variables, because the full likelihood includes the terms

\[
\pi_0^{\sum_{i=1}^{N} z_i}\,(1-\pi_0)^{N-\sum_{i=1}^{N} z_i}.
\]

To specify the posterior distribution of z_N, we take into account the zero assumption that x_i is generated from f₀ when x_i ≤ C for a given C. From the zero assumption, we have z_i = 1 with probability 1 when x_i ≤ C. Otherwise, we have z_i | φ₀, φ₁, π₀, C ∼ Bernoulli(p_i) where

\[
p_i \equiv P(z_i = 1 \mid \varphi_0, \varphi_1, \pi_0, C) = \frac{\pi_0 f_0(x_i \mid \varphi_0)}{f(x_i \mid \varphi_0, \varphi_1, \pi_0)}.
\]

From the key assumption that f(x_i) = π₀ f₀(x_i | φ₀) when x_i ≤ C, the value of p_i indeed reduces to 1 for x_i ≤ C. Hence, we specify the conditional posterior distribution of z_i as

z_i | x_N, φ₀, φ₁, π₀, C ∼ Bernoulli(p_i) (A.5)


where

\[
p_i = \max\left\{ I(x_i \le C),\ \frac{\pi_0 f_0(x_i \mid \varphi_0)}{f(x_i \mid \varphi_0, \varphi_1, \pi_0)} \right\}
\]

for any i = 1, 2, . . . , N, and I(·) is an indicator function. Given z_N = (z₁, z₂, . . . , z_N), we can compute N₀ and N₁, the numbers of samples from f₀ and f₁ respectively, as follows:

\[
N_0 = \sum_{j\le K} n_{0j} = \sum_{j\le K}\sum_{i\ge 1} z_i\, I(x_i = j), \tag{A.6}
\]

\[
N_1 = \sum_{j\le K} n_{1j} = \sum_{j\le K}\sum_{i\ge 1} (1-z_i)\, I(x_i = j). \tag{A.7}
\]

Using (A.6) and (A.7), we can specify the posterior distribution of π₀ given the rest of the parameters as

π₀ | x_N, z_N, φ₀, φ₁, C ∼ B(π₀ | N₀ + 1, N₁ + 1) (A.8)

where B(· | N₀ + 1, N₁ + 1) is the Beta distribution with shape parameters N₀ + 1 and N₁ + 1.
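The updates (A.5) to (A.8) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function name, the list-based pmfs `f0` and `f1`, and the toy data are hypothetical, and the mixture f is formed as π₀f₀ + (1 − π₀)f₁ with f₁ equal to zero at or below C.

```python
import random


def update_z_and_pi0(x, C, pi0, f0, f1, rng=random):
    """One pass of (A.5)-(A.8): draw z, count N0 and N1, then draw pi0."""
    z = []
    for xi in x:
        mix = pi0 * f0[xi] + (1.0 - pi0) * f1[xi]
        indicator = 1.0 if xi <= C else 0.0
        p_i = max(indicator, pi0 * f0[xi] / mix)  # p_i as defined after (A.5)
        z.append(1 if rng.random() < p_i else 0)
    N0 = sum(z)           # (A.6): observations attributed to f0
    N1 = len(z) - N0      # (A.7): observations attributed to f1
    pi0_new = rng.betavariate(N0 + 1, N1 + 1)  # (A.8)
    return z, N0, N1, pi0_new


# Toy data (made up): counts on 0..7 with cutoff C = 2
rng = random.Random(2)
x = [0, 0, 1, 2, 5, 6, 1, 0, 7, 3]
f0 = [0.50, 0.25, 0.12, 0.06, 0.03, 0.02, 0.01, 0.01]
f1 = [0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.2, 0.2]
z, N0, N1, pi0_new = update_z_and_pi0(x, C=2, pi0=0.8, f0=f0, f1=f1, rng=rng)
```

Observations with x_i ≤ C always receive z_i = 1, which is how the zero assumption enters the sampler.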

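A.3. Conditional Posterior Distribution of φ₀ = (η, λ, θ). When f₀ is ZIGP, the conditional posterior distribution of the null distribution parameters given the rest of the parameters is

\[
\begin{aligned}
f(\varphi_0 \mid x_N, z_N, \varphi_1, C, \pi_0)
&\propto f(x_N, z_N \mid \varphi_0, \varphi_1, C)\, g(\varphi_0) \\
&= g(\varphi_0) \prod_{i \in \{i:\, z_i = 1,\ 1\le i\le N\}} f_0(x_i) \\
&= g(\varphi_0) \prod_{0\le j\le K} f_0(j)^{n_{0j}}
\end{aligned}
\]

where g(φ₀) = g(η) g(λ) g(θ) = I_(0,1)(η) × I_(0,1)(θ) × λ^{−0.5} I_(0,∞)(λ) and

\[
\prod_{0\le j\le K} f_0(j)^{n_{0j}} \propto
\left[\eta + (1-\eta)e^{-\lambda}\right]^{n_{00}}
\left[(1-\eta)\lambda e^{-\lambda}\right]^{\sum_{j\ge 1} n_{0j}}
e^{-\theta \sum_{j\ge 1} j\, n_{0j}}
\times \prod_{j\ge 1}\left(\frac{(\lambda+\theta j)^{j-1}}{j!}\right)^{n_{0j}}.
\]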

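If we let f(A | Rest) denote the conditional distribution of a variable A given all other data and parameters, the above expression can be reduced to the following conditional posterior densities:

\[
\begin{aligned}
f(\lambda \mid \text{Rest}) &\propto \left[\eta + (1-\eta)e^{-\lambda}\right]^{n_{00}}
\lambda^{-0.5+\sum_{j\ge 1} n_{0j}}\,
e^{-\lambda \sum_{j\ge 1} n_{0j}}
\prod_{j\ge 1}\left(\frac{(\lambda+\theta j)^{j-1}}{j!}\right)^{n_{0j}} \\
f(\eta \mid \text{Rest}) &\propto \left[\eta + (1-\eta)e^{-\lambda}\right]^{n_{00}}
(1-\eta)^{\sum_{j\ge 1} n_{0j}} \\
f(\theta \mid \text{Rest}) &\propto e^{-\theta \sum_{j\ge 1} j\, n_{0j}}
\prod_{j\ge 1}\left(\frac{(\lambda+\theta j)^{j-1}}{j!}\right)^{n_{0j}}.
\end{aligned}
\]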

Sampling from the full conditionals f(λ | Rest), f(η | Rest) and f(θ | Rest) is non-trivial because they do not reduce analytically to any well-known distribution with an available random variate generator. Hence, we rely on the Metropolis-Hastings algorithm instead.
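Since the Metropolis-Hastings step only needs the target up to a constant, the joint conditional of φ₀ can be evaluated on the log scale. The sketch below is one possible way to code the kernel g(φ₀) ∏ f₀(j)^{n_0j}; the function names are hypothetical, and the ZIGP pmf used is the one implied by the factorization in Section A.3, namely f₀(0) = η + (1 − η)e^{−λ} and f₀(j) = (1 − η) λ (λ + θj)^{j−1} e^{−λ−θj} / j! for j ≥ 1.

```python
import math


def log_zigp_pmf(j, eta, lam, theta):
    """Log pmf of the zero-inflated generalized Poisson at count j."""
    if j == 0:
        return math.log(eta + (1.0 - eta) * math.exp(-lam))
    return (math.log(1.0 - eta) + math.log(lam)
            + (j - 1) * math.log(lam + theta * j)
            - lam - theta * j - math.lgamma(j + 1))


def log_cond_phi0(eta, lam, theta, n0):
    """Log of g(phi0) * prod_j f0(j)^{n0[j]}, up to an additive constant.

    n0[j] is the number of null observations equal to j (n_0j in the text).
    """
    if not (0.0 < eta < 1.0 and 0.0 < theta < 1.0 and lam > 0.0):
        return -math.inf  # outside the support of the prior g(phi0)
    log_prior = -0.5 * math.log(lam)  # g(lambda) is proportional to lambda^(-0.5)
    log_lik = sum(n * log_zigp_pmf(j, eta, lam, theta)
                  for j, n in enumerate(n0) if n > 0)
    return log_prior + log_lik


# Sanity checks on the pmf: it should sum to about 1 over a long range
total_zip = sum(math.exp(log_zigp_pmf(j, 0.4, 2.0, 0.0)) for j in range(100))
total_zigp = sum(math.exp(log_zigp_pmf(j, 0.4, 2.0, 0.3)) for j in range(200))
value = log_cond_phi0(0.4, 2.0, 0.3, n0=[30, 12, 6, 2])  # made-up counts
```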

Specifically, the MH algorithm is performed in vector form, that is, jumping in the three-dimensional space of φ₀ = (η, λ, θ), so that the null distribution parameters are updated simultaneously. [1] stressed that there is no natural advantage to altering one parameter at a time except for potential computational savings. Meanwhile, φ₀ belongs to the constrained parameter space [0,1] × (0,∞) × [0,1] and not R³. However, after an appropriate transformation, we can assume that the conditional posterior distribution of φ₀ given the rest of the parameters is multivariate normal with known variance matrix Σ.

Following the work of [2], in order to obtain draws of the constrained parameters (η, λ, θ), we draw unconstrained random variables from the sampler and transform them to the constrained space. Suppose φ₀ is the vector of the constrained parameters whose full conditional density is f(φ₀ | Rest).

Let g be the bijection from the space of φ₀ to the Euclidean space R³. The density of ϕ₀ = g(φ₀) is then f(g⁻¹(ϕ₀) | Rest) · |det J(ϕ₀)|, where J(ϕ₀) = ∂φ₀/∂ϕ₀. Given ϕ₀ = g(φ₀), a proposed ϕ₀* will be accepted with probability

\[
\min\left\{ 1,\ \frac{f\left(g^{-1}(\phi_0^{*}) \mid \text{Rest}\right)\, |\det J(\phi_0^{*})|}{f\left(g^{-1}(\phi_0) \mid \text{Rest}\right)\, |\det J(\phi_0)|} \right\} \tag{A.9}
\]
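A transformed random-walk update of this kind can be sketched as below. Several choices are assumptions made only for illustration: the bijection g is taken to be (logit η, log λ, logit θ), the proposal is an isotropic normal with a single `step` size instead of a full matrix Σ, and the toy target `log_f` is a product of Beta(2, 2), Gamma(2, 1) and Beta(2, 2) kernels rather than the ZIGP conditional.

```python
import math
import random


def sigmoid(u):
    # Numerically stable inverse logit
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def g(phi):
    """Map (eta, lam, theta) to R^3 (assumed bijection)."""
    eta, lam, theta = phi
    return (math.log(eta) - math.log1p(-eta), math.log(lam),
            math.log(theta) - math.log1p(-theta))


def g_inv(varphi):
    a, b, c = varphi
    return (sigmoid(a), math.exp(b), sigmoid(c))


def log_abs_det_J(phi):
    """log |det d(phi)/d(varphi)| for the bijection above."""
    eta, lam, theta = phi
    return (math.log(eta) + math.log1p(-eta) + math.log(lam)
            + math.log(theta) + math.log1p(-theta))


def mh_step(phi, log_f, step, rng):
    """One random-walk step in the unconstrained space."""
    proposal = tuple(v + step * rng.gauss(0.0, 1.0) for v in g(phi))
    phi_new = g_inv(proposal)
    eta, lam, theta = phi_new
    if not (0.0 < eta < 1.0 and 0.0 < theta < 1.0 and 0.0 < lam < math.inf):
        return phi, False  # guard against floating-point boundary values
    log_r = (log_f(phi_new) + log_abs_det_J(phi_new)
             - log_f(phi) - log_abs_det_J(phi))
    if rng.random() < math.exp(min(0.0, log_r)):
        return phi_new, True
    return phi, False


def log_f(phi):
    # Toy target (assumption): Beta(2,2) x Gamma(2,1) x Beta(2,2) kernels
    eta, lam, theta = phi
    return (math.log(eta) + math.log1p(-eta) + math.log(lam) - lam
            + math.log(theta) + math.log1p(-theta))


rng = random.Random(3)
phi = (0.5, 1.0, 0.5)
chain, n_accept = [], 0
for _ in range(2000):
    phi, accepted = mh_step(phi, log_f, step=0.8, rng=rng)
    n_accept += accepted
    chain.append(phi)
```

Every draw in `chain` stays inside the constrained space by construction, since proposals are made in R³ and mapped back through `g_inv`.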


APPENDIX B: ADAPTIVE METROPOLIS-HASTINGS WITHIN GIBBS SAMPLING

Bayesian methods using Markov chain Monte Carlo (MCMC) simulation have radically influenced current statistical research and opened countless new avenues for performing inference [3,4]. As an overview, MCMC simulation is a general method based on drawing samples from approximate distributions and then correcting those draws to better approximate the target posterior distribution [1]. The sampling is performed sequentially, wherein the distribution of the sampled draws depends on the last value drawn, thereby forming a Markov chain [1,3].

Among the MCMC simulation methods, we are interested in implementing the Gibbs sampler and the Metropolis-Hastings algorithm. Applied in the Bayesian context, Gibbs sampling is a technique for drawing dependent samples from a multidimensional posterior distribution of the model parameters [5, 6, 7]. Also referred to as “alternating conditional sampling”, each iteration of the Gibbs sampler cycles through the subvectors of the parameter vector, say θ, drawing each parameter or set of parameters conditional on the value of all the others. As discussed in [1], suppose θ = (θ₁, θ₂, . . . , θ_p) can be divided into p subvectors. An ordering of the p subvectors is chosen and, at each iteration t, each θ_j^(t) is sampled from the conditional distribution given all the other components of θ, that is,

\[
p\left(\theta_j \mid \theta_{-j}^{(t-1)}, y\right)
\quad\text{where}\quad
\theta_{-j}^{(t-1)} = \left(\theta_1^{(t)}, \ldots, \theta_{j-1}^{(t)}, \theta_{j+1}^{(t-1)}, \ldots, \theta_p^{(t-1)}\right).
\]

This indicates that each subvector θ_j is updated conditional on the most recent values of the other components of θ, which include the components already updated at iteration t and the values of the remaining components at iteration t − 1.
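The alternating scheme can be illustrated on a textbook target that is unrelated to the model of this paper: a standard bivariate normal with correlation ρ, whose full conditionals are x | y ∼ N(ρy, 1 − ρ²) and y | x ∼ N(ρx, 1 − ρ²). The function name and settings below are chosen for illustration only.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_iter, rng):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho ** 2)
    draws = []
    for _ in range(n_iter):
        x = rng.gauss(rho * y, sd)  # conditions on y from iteration t - 1
        y = rng.gauss(rho * x, sd)  # conditions on x already updated at t
        draws.append((x, y))
    return draws


def sample_corr(draws):
    n = len(draws)
    mx = sum(d[0] for d in draws) / n
    my = sum(d[1] for d in draws) / n
    sxy = sum((d[0] - mx) * (d[1] - my) for d in draws)
    sxx = sum((d[0] - mx) ** 2 for d in draws)
    syy = sum((d[1] - my) ** 2 for d in draws)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(4)
draws = gibbs_bivariate_normal(rho=0.8, n_iter=20000, rng=rng)
corr = sample_corr(draws)  # should be close to 0.8
```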

With the availability of inexpensive, high-speed computing, [8] mentioned that using the Gibbs sampler would allow researchers to avoid difficult analytical calculations and instead deal with a sequence of easier calculations. In line with this, [9] illustrated that by freeing the statistician from daunting calculations and numerical integration, the main focus can be shifted to the statistical aspects of the problem. Another advantage pointed out by [7] is that certain full conditionals reduce analytically to well-known distributions, for which special methods for efficient random variate generation are available.

Meanwhile, another MCMC technique useful for sampling from posterior distributions is the Metropolis-Hastings (MH) algorithm, which was developed by [10] and generalized by [11]. As described by [3], a chain (θ^(t)) is generated by drawing θ* values from a proposal distribution q_t(θ* | θ^(t−1)) and setting θ^(t) to

\[
\begin{cases}
\theta^{*}, & \text{with probability } \min\{r, 1\},\quad r = \dfrac{p(\theta^{*} \mid y)}{p(\theta^{(t-1)} \mid y)}, \\[1ex]
\theta^{(t-1)}, & \text{otherwise.}
\end{cases}
\]

[1] pointed out that the algorithm requires the calculation of the ratio r for all (θ^(t−1), θ*), t = 1, . . . , T, and that a step in which the jump is not accepted, that is, θ^(t) = θ^(t−1), still counts as an iteration of the algorithm.
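A minimal sketch of this accept/reject rule with a symmetric normal random-walk proposal is given below. The target (a standard normal) and the proposal scale are illustrative assumptions. Note that the current value is appended to the chain even when the jump is rejected, so rejected proposals still count as iterations.

```python
import math
import random


def metropolis(log_p, theta0, scale, n_iter, rng):
    """Random-walk Metropolis for a one-dimensional target density."""
    theta = theta0
    chain, n_accept = [], 0
    for _ in range(n_iter):
        proposal = theta + scale * rng.gauss(0.0, 1.0)
        log_r = log_p(proposal) - log_p(theta)  # log of the ratio r
        if rng.random() < math.exp(min(0.0, log_r)):
            theta = proposal
            n_accept += 1
        chain.append(theta)  # a rejected jump repeats the previous value
    return chain, n_accept / n_iter


def log_p(theta):
    return -0.5 * theta * theta  # standard normal, up to a constant


rng = random.Random(5)
chain, acc_rate = metropolis(log_p, theta0=0.0, scale=2.4, n_iter=20000, rng=rng)
chain_mean = sum(chain) / len(chain)
```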

A key component in implementing the MH algorithm is the choice of the proposal distribution. It is crucial to specify a proposal distribution that closely approximates the posterior distribution, because it affects the chain’s ability to move efficiently across the state space and the speed at which the chain converges [4]. When the acceptance rate is low and the correlation among the draws is high, the chain can be trapped indefinitely in a local mode, resulting in slow convergence [12,13].

[1] argued that it is difficult to provide general advice on efficient jumping rules. However, they provided some insights for the case of multivariate normal random walk proposal distributions with the normal jumping kernel centered on the current point and with the same shape as the target distribution, that is,

q(θ* | θ^(t−1)) = N_p(θ* | θ^(t−1), c²Σ).

Among this class of proposal distributions, [1] mentioned that the most efficient has scale c ≈ 2.4/√p, where efficiency is measured relative to independent sampling from the posterior distribution. Further, they described that the optimal jumping rule has an acceptance rate of around 44% in one dimension, reducing to roughly 23% in high dimensions (p > 5).
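As a quick numerical illustration of the scaling rule, the sketch below computes c for a given dimension p and draws one proposal from N_p(θ, c²Σ). For brevity Σ is assumed diagonal, which is a simplification made here and not part of the rule; the function names and inputs are hypothetical.

```python
import math
import random


def jump_scale(p):
    """Scale c = 2.4 / sqrt(p) for a p-dimensional random-walk proposal."""
    return 2.4 / math.sqrt(p)


def propose(theta, sigma_diag, rng):
    """Draw theta* ~ N_p(theta, c^2 * diag(sigma_diag))."""
    c = jump_scale(len(theta))
    return [t + c * math.sqrt(s) * rng.gauss(0.0, 1.0)
            for t, s in zip(theta, sigma_diag)]


rng = random.Random(6)
c3 = jump_scale(3)  # about 1.39 for the three-dimensional phi0
theta_star = propose([0.0, 1.0, -0.5], sigma_diag=[0.2, 0.5, 0.1], rng=rng)
```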

Given the pivotal role played by the proposal distribution, several extensions and adaptive methodologies were proposed to use the preliminary draws to “tune” the proposal to the target. As an active research area, the literature on adaptive MH methods is wide-ranging and can be categorized into several groups. Among these groups, we focus on diminishing adaptation schemes. According to [4], a diminishing adaptation MH sampler performs the standard accept/reject step but, in contrast to a traditional MH algorithm, it updates the proposal distribution using the history of the draws.

It is referred to as “diminishing” because the updating of the proposal distribution settles down asymptotically in terms of the number of iterations.
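One common way to obtain diminishing adaptation, in the spirit of the adaptive Metropolis algorithm of [15], is to set the proposal variance from a running estimate of the variance of the draws, updated with weight 1/(t + 1) so that later draws change it less and less. The one-dimensional sketch below is only an illustration under these assumptions; it is not the updating rule of Section 4 of the main text, and the target, starting values and the small constant `eps` are hypothetical.

```python
import math
import random


def adaptive_metropolis(log_p, theta0, n_iter, rng, eps=1e-6):
    """1-D random-walk Metropolis with a recursively updated proposal scale."""
    theta = theta0
    mean, var = theta0, 1.0  # running moments of the chain history
    chain = []
    for t in range(1, n_iter + 1):
        scale = 2.4 * math.sqrt(var + eps)  # c = 2.4 / sqrt(p) with p = 1
        proposal = theta + scale * rng.gauss(0.0, 1.0)
        log_r = log_p(proposal) - log_p(theta)
        if rng.random() < math.exp(min(0.0, log_r)):
            theta = proposal
        # Recursive mean/variance update; the weight w shrinks as t grows,
        # so the proposal settles down (diminishing adaptation).
        w = 1.0 / (t + 1)
        delta = theta - mean
        mean += w * delta
        var = (1.0 - w) * var + w * delta * (theta - mean)
        chain.append(theta)
    return chain, mean, var


def log_p(theta):
    return -0.5 * ((theta - 3.0) / 2.0) ** 2  # N(3, 2^2), up to a constant


rng = random.Random(7)
chain, run_mean, run_var = adaptive_metropolis(log_p, 0.0, 20000, rng)
```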


Theoretical work involving diminishing adaptation was developed by [14], [15] and [16], among others.

Consequently, [4] argued that although more theoretical work on adaptive sampling can be expected, the existing framework already provides sufficient justification and guidelines to build adaptive MH samplers for challenging problems.

For this problem, implementing a Gibbs sampler seemed to be a natural choice initially. However, the zero-inflated null distribution is not conditionally conjugate and it is not trivial to draw samples from the full conditional distribution. Hence, we embedded the Metropolis-Hastings algorithm within a Gibbs sampler structure. The details on how the algorithm is carried out are provided in Section 4 of the main text.


APPENDIX C: PROPOSED METHODS IN SECTION 4

C.1. Semiparametric Model for Bayesian False Discovery Rate in Section 4.2. In contrast to the scenario described in Section 4.1, we consider a nonparametric distribution for f₁ instead of f. The prior distribution of Υ is given by D(β) where β = (ε, ε, . . . , ε, γ, γ, . . . , γ) = (ε · 1_{C+1}, γ · 1_{P−C−1}). To reflect the zero assumption, we assign ε ≈ 0 so that Υ₀, . . . , Υ_C are generated to be almost 0 for a given C. Note that, from the zero assumption, z_i = 1 whenever x_i ≤ C for a given C. This leads to n_{1j} = 0 for j ≤ C. The posterior distribution of Υ is

Υ | (x_N, z_N, β, φ₀, C) ∼ D(β) (C.1)

where

\[
\beta_j =
\begin{cases}
\epsilon, & 0 \le j \le C \\
\gamma + n_{1j}, & C < j \le K \\
\gamma, & K < j \le P
\end{cases}
\]

for j = 0, 1, 2, . . . , P and \(\sum_{j\le P} \Upsilon_j = 1\). The implementation of the algorithm with the zero assumption is similar to the algorithm described in Section 4.1, except for the modification of the concentration parameter β in Step (8).

Algorithm for Semiparametric Model:

Step 8 Gibbs step for Υ: Generate Υ^(t) from (C.1) and compute ψ_j = π₀^(t)f₀^(t)(j|φ^(t)₀ ) + (1−π₀^(t))Υ^(t)_j for 0≤j≤P.
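The Gibbs step for Υ can be sketched with the standard construction of a Dirichlet draw from independent Gamma variates. Everything specific below is an assumption made for illustration: the function names, the value used for the small concentration on j ≤ C (written `eps`), γ = 1, and the toy counts and null pmf.

```python
import random


def posterior_concentration(n1, C, P, eps, gamma):
    """Concentration vector of (C.1); n1[j] = n_1j for j = 0..K."""
    K = len(n1) - 1
    beta = []
    for j in range(P + 1):
        if j <= C:
            beta.append(eps)            # near-zero mass at or below the cutoff
        elif j <= K:
            beta.append(gamma + n1[j])  # prior weight plus observed counts
        else:
            beta.append(gamma)          # unobserved counts above K
    return beta


def sample_dirichlet(beta, rng):
    """Dirichlet draw via normalized Gamma(beta_j, 1) variates."""
    gammas = [rng.gammavariate(b, 1.0) for b in beta]
    total = sum(gammas)
    return [v / total for v in gammas]


def mixture_pmf(pi0, f0, upsilon):
    """psi_j = pi0 * f0(j) + (1 - pi0) * Upsilon_j for 0 <= j <= P."""
    return [pi0 * a + (1.0 - pi0) * u for a, u in zip(f0, upsilon)]


rng = random.Random(8)
n1 = [0, 0, 0, 2, 1, 3]  # made-up n_1j with K = 5 and C = 2
beta = posterior_concentration(n1, C=2, P=7, eps=1e-3, gamma=1.0)
upsilon = sample_dirichlet(beta, rng)
f0 = [0.50, 0.25, 0.12, 0.06, 0.03, 0.02, 0.01, 0.01]
psi = mixture_pmf(0.8, f0, upsilon)
```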

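Once we obtain samples φ^(t) and z_N^(t), we compute the local false discovery rate using

\[
\mathrm{fdr}(j \mid x_N) = E_{z_N, \varphi \mid x_N}\left[\mathrm{fdr}(j \mid \varphi, x_N, z_N)\right]
\approx \frac{1}{T}\sum_{t=1}^{T} \mathrm{fdr}\left(j \mid \varphi^{(t)}, x_N, z_N^{(t)}\right) \tag{C.2}
\]

for

\[
\mathrm{fdr}(j \mid \varphi, x_N, z_N) = \frac{\pi_0 f_0(j \mid \varphi, x_N, z_N)}{f(j \mid \varphi, x_N, z_N)}
\]

and mutation counts j = 0, 1, . . . , K.

We reject the null hypothesis if

fdr(j | x_N) ≤ α = 0.05. (C.3)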
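The Monte Carlo average in (C.2) and the rule (C.3) amount to averaging π₀f₀(j)/f(j) over the retained draws and thresholding at α. The sketch below assumes that the sampler has stored, for each iteration t, the value of π₀ and the pmf values f₀(j) and f(j) on j = 0, . . . , K; the function name, the storage layout and the two toy draws are hypothetical.

```python
def local_fdr(pi0_draws, f0_draws, f_draws, alpha=0.05):
    """Average the per-draw local fdr over t and flag counts with fdr <= alpha."""
    T = len(pi0_draws)
    n_counts = len(f0_draws[0])
    fdr = []
    for j in range(n_counts):
        total = 0.0
        for t in range(T):
            total += pi0_draws[t] * f0_draws[t][j] / f_draws[t][j]
        fdr.append(total / T)
    rejected = [j for j, value in enumerate(fdr) if value <= alpha]
    return fdr, rejected


# Two made-up posterior draws on counts j = 0, 1
pi0_draws = [0.8, 0.8]
f0_draws = [[0.50, 0.01], [0.50, 0.01]]
f_draws = [[0.45, 0.20], [0.45, 0.20]]
fdr, rejected = local_fdr(pi0_draws, f0_draws, f_draws)
```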


C.2. Parametric Model for Bayesian False Discovery Rate in Section 4.3. Instead of a nonparametric distribution for f₁, we consider a parametric distribution in this framework. We use the shifted Generalized Poisson (GP) distribution to reflect the zero assumption such that, for a given C, x ∼ f₁(x) where X = W + C and W follows the given parametric distribution for W ≥ 0. The conditional posterior density is

\[
f(\varphi_1 \mid \text{Rest}) \propto f(x_N, z_N \mid \varphi_1)\, f(\varphi_1)
\quad\text{where}\quad
f(x_N, z_N \mid \varphi_1) \propto \prod_{i\ge 1} f_1(x_i \mid \varphi_1)^{1-z_i}
= \prod_{j > C} g\left(j-(C+1) \mid \varphi_1\right)^{n_{1j}}
\]

and g(x) is the density function of the GP distribution.
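The likelihood factor for φ₁ can be evaluated on the log scale as sketched below. The GP pmf is assumed to have the same form as the GP component of the null in Appendix A, with (δ, ν) in place of (λ, θ), that is g(w) = δ(δ + νw)^{w−1} e^{−δ−νw} / w!; this parametrization, the function names and the toy counts are assumptions. The shift j − (C + 1) follows the product over j > C in the display above.

```python
import math


def log_gp_pmf(w, delta, nu):
    """Log pmf of the generalized Poisson at w >= 0 (assumed parametrization)."""
    return (math.log(delta) + (w - 1) * math.log(delta + nu * w)
            - delta - nu * w - math.lgamma(w + 1))


def log_lik_phi1(delta, nu, n1, C):
    """Sum over j > C of n1[j] * log g(j - (C + 1) | delta, nu)."""
    return sum(n * log_gp_pmf(j - (C + 1), delta, nu)
               for j, n in enumerate(n1) if j > C and n > 0)


# The pmf should sum to about 1 over a long range when 0 <= nu < 1
total_gp = sum(math.exp(log_gp_pmf(w, 1.5, 0.2)) for w in range(200))

n1 = [0, 0, 0, 2, 1, 3]  # made-up n_1j, with cutoff C = 2
ll = log_lik_phi1(1.5, 0.2, n1, C=2)
manual = (2 * log_gp_pmf(0, 1.5, 0.2) + 1 * log_gp_pmf(1, 1.5, 0.2)
          + 3 * log_gp_pmf(2, 1.5, 0.2))
```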

The main drawback of using the Poisson distribution to model count data is its inability to account for overdispersion. Hence, we considered the case where the alternative distribution is GP. However, since GP is not conditionally conjugate, we employ extra Metropolis-Hastings (MH) steps to draw samples of φ₁ = (δ, ν). We modify Step (1b) in Section 4.1 by specifying φ₁^(0) and ∆^(0), where the latter is the initial value of the covariance matrix for the proposal distribution. Then, we follow Steps (2) to (7). We replace Step (8) with the following MH steps:

Algorithm for Parametric Model:

Step 8 Generate φ₁^(t) = (δ^(t), ν^(t)) from the following algorithm:

(a) Randomly generate u_t from the bivariate standard normal distribution and let ϕ₁^(t) = (∆^(t))^{1/2} u_t + φ₁^(t).

(b) Accept φ₁^(t+1) = h⁻¹(ϕ₁^(t)) with the probability defined in (A.9) in the Supplementary Material, where h(φ₁) = h(δ, ν) = (log δ, log(ν/(1−ν))). Otherwise, set φ₁^(t+1) = φ₁^(t).

Compute ψ_j^(t) = π₀^(t) f₀^(t)(j) + (1 − π₀^(t)) f₁^(t)(j).

This is followed by Step (9), which is an updating procedure similar to Step (7) in Section 4.1, except that we replace Σ^(t) by ∆^(t) and φ₀^(t) by φ₁^(t). Finally, we repeat Steps (2) to (9) for t = 1, 2, . . . , T.
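The proposal in Step 8(a) and the back-transformation in Step 8(b) can be sketched as follows. Two readings are assumed here and should be treated as such: the square root of ∆ is taken to be its lower Cholesky factor, and the proposal is centered at h(φ₁) in the unconstrained space. The function names and the numerical values are hypothetical, and the accept/reject decision of (A.9) is left out.

```python
import math
import random


def h(phi1):
    """Map (delta, nu) to R^2: (log delta, log(nu / (1 - nu)))."""
    delta, nu = phi1
    return (math.log(delta), math.log(nu / (1.0 - nu)))


def h_inv(varphi):
    a, b = varphi
    return (math.exp(a), 1.0 / (1.0 + math.exp(-b)))


def cholesky_2x2(cov):
    """Lower Cholesky factor of a 2 x 2 positive definite matrix."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return ((l11, 0.0), (l21, l22))


def propose_phi1(phi1, cov, rng):
    """Draw a candidate for (delta, nu) through the unconstrained space."""
    L = cholesky_2x2(cov)
    u = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))  # bivariate standard normal
    center = h(phi1)
    varphi = (center[0] + L[0][0] * u[0],
              center[1] + L[1][0] * u[0] + L[1][1] * u[1])
    return h_inv(varphi)


rng = random.Random(9)
Delta = ((0.10, 0.02), (0.02, 0.20))  # made-up proposal covariance
candidate = propose_phi1((1.5, 0.2), Delta, rng)
round_trip = h_inv(h((1.5, 0.2)))
```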


APPENDIX D: ADDITIONAL RESULTS

We performed additional simulation studies for different numbers of positions N in a protein domain to provide a clearer picture of the difference between the full and empirical Bayesian approaches.

When the true f₀ and f₁ are well-separated, as exemplified by the scenario in which the true null distribution f₀ is ZIP with θ = 0, the results for all methods, both fully Bayesian and empirical Bayes, coincide. In this case, all empirical and full Bayesian methods control the FDR, and the TPR of every method approaches 1 as N increases. In Figure 1, it can be observed that when N is at least 500, all methods “catch up”. Also, the variability in the FDR and TPR of the methods can be distinguished when N is small (0 ≤ N < 200) or moderately small (200 ≤ N < 500).

Meanwhile, Figure 2 displays the histograms for the case where f₀ and f₁ are heavily mixed. The corresponding numerical comparison of \(\widehat{\mathrm{FDR}}\) when f₀ is ZIGP(η = 0.4, λ, θ), f₁ is shifted Binomial and π₀ = 0.80, across varying values of N, λ and θ, is presented in Figure 3. Table 1 in the main manuscript is a specific case where λ = 2 and θ = 0.3.

According to our simulation studies, the fully Bayesian methods tend to control the FDR when N is less than 200, while the empirical Bayes (EB) approaches have inflated FDR. As discussed in the main text, as N increases, the EB methods show improvement in controlling the FDR and attain higher TPR than the full Bayesian methods. We recommend the use of full Bayesian methods when N < 200 to ensure control of the FDR. We propose the use of EB methods when the number of positions is large, i.e. when N is at least 800. On the other hand, when N is moderately small (200 ≤ N < 500) or moderately large (500 ≤ N < 800), we tend to see different results in terms of superiority of the methods. An EB method (One-Stage procedure) needs at least a moderately small number of positions to (marginally) control a given level of FDR. We reiterate our proposal to estimate the overdispersion parameter to inform the guidelines, rather than relying solely on the number of positions. There is no universally superior method among the proposed fully Bayesian models and the empirical Bayesian methods used as benchmarks.


Fig 1: Numerical Comparison when the true f₀ and f₁ are well-separated. [Figure not reproduced. Panels: (a) FDR, C = 10; (b) TPR, C = 10; (c) FDR, C = 5; (d) TPR, C = 5. Horizontal axis: N; vertical axis: FDR or TPR. Procedures: Bayesian-NP, Bayesian-P, Bayesian-SP, EB-One Stage, EB-Two Stage, Storey.]


Fig 2: Histograms when f₀ is ZIGP(η = 0.40, λ, θ = 0.30), f₁ is Binomial, π₀ = 0.80, and C = 5. [Figure not reproduced. Panels: (a) N = 50, λ = 2; (b) N = 100, λ = 2; (c) N = 200, λ = 2; (d) N = 50, λ = 3; (e) N = 100, λ = 3; (f) N = 200, λ = 3; (g) N = 50, λ = 4; (h) N = 100, λ = 4; (i) N = 200, λ = 4.]


Fig 3: Numerical Comparison of \(\widehat{\mathrm{FDR}}\) when f₀ is ZIGP(η = 0.4, λ, θ), f₁ is shifted Binomial, π₀ = 0.80, across varying values of N, λ and θ. Table 1 in the main manuscript is a specific case where λ = 2 and θ = 0.3. [Figure not reproduced.]


REFERENCES

[1] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2014). Bayesian Data Analysis.

[2] Raim, A. M., Neerchal, N. K. and Morel, J. G. (2017). An extension of generalized linear models to finite mixture outcome distributions. Journal of Computational and Graphical Statistics, (just-accepted).

[3] Robert, C. P. and Casella, G. (2005). Monte Carlo Statistical Methods. Springer Texts in Statistics.

[4] Giordani, P. and Kohn, R. (2010). Adaptive independent Metropolis-Hastings by fast estimation of mixtures of normals. Journal of Computational and Graphical Statistics, 19(2), 243-259.

[5] Geman, S. and Geman, D. (1993). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. Journal of Applied Statistics, 20(5-6), 25-62.

[6] Gelfand, A. E. and Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398-409.

[7] Gilks, W. R., Best, N. G. and Tan, K. K. C. (1995). Adaptive Rejection Metropolis sampling within Gibbs sampling. Applied Statistics, 455-472.

[8] Casella, G. and George, E. I. (1992). Explaining the Gibbs sampler. The American Statistician, 46(3), 167-174.

[9] Smith, A. F. and Gelfand, A. E. (1992). Bayesian statistics without tears: a sampling-resampling perspective. The American Statistician, 46(2), 84-88.

[10] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087-1092.

[11] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97-109.

[12] Liu, J. S. (2008). Monte Carlo Strategies in Scientific Computing. Springer Science & Business Media.

[13] Luengo, D. and Martino, L. (2013). Fully adaptive Gaussian mixture Metropolis-Hastings algorithm. In Proceedings: IEEE International Conference on Acoustics, Speech and Signal Processing.

[14] Holden, L., Hauge, R. and Holden, M. (2009). Adaptive independent Metropolis-Hastings. The Annals of Applied Probability, 19(1), 395-413.

[15] Haario, H., Saksman, E. and Tamminen, J. (2001). An adaptive Metropolis algorithm. Bernoulli, 7(2), 223-242.

[16] Atchadé, Y. F. and Rosenthal, J. S. (2005). On adaptive Markov Chain Monte Carlo algorithms. Bernoulli, 11(5), 815-828.