Statistics authors titles recent submissions

(1)

A Unified Method for Exact Inference in

Random-effects Meta-analysis via Monte Carlo

Conditioning

Shonosuke Sugasawa

Risk Analysis Research Center, The Institute of Statistical Mathematics Hisashi Noma

Department of Data Science, The Institute of Statistical Mathematics

Summary

Random-effects meta-analyses have been widely applied in evidence synthesis for

various types of medical studies to adequately address between-studies heterogeneity.

However, standard inference methods for average treatment effects (e.g., restricted

maximum likelihood estimation) usually underestimate statistical errors and possibly

provide highly overconfident results under realistic situations; for instance, coverage

probabilities of confidence intervals can be substantially below the nominal level. The

main reason is that these inference methods rely on large sample approximations even

though the number of synthesized studies is usually small or moderate in practice.

Also, random-effects models typically include nuisance parameters, and these

meth-ods ignore variability in the estimation of such parameters. In this article we solve

this problem using a unified inference method without large sample approximation

for broad application to random-effects meta-analysis. The developed method

pro-vides accurate confidence intervals with coverage probabilities that are exactly the

same as the nominal level. The exact confidence intervals are constructed based on

the likelihood ratio test for an average treatment effect, and the exact p-value can

be defined based on the conditional distribution given the maximum likelihood

es-timator of the nuisance parameters via the Monte Carlo conditioning method. As

(2)

specific applications, we provide exact inference procedures for three types of

meta-analysis: conventional univariate meta-analysis for pairwise treatment comparisons,

meta-analysis of diagnostic test accuracy, and multiple treatment comparisons via

network meta-analysis. We also illustrate the practical effectiveness of these methods

via real data applications.

Key words: Confidence interval; Likelihood ratio test; Meta-analysis; Random-effect

(3)

1 Introduction

In evidence-based medicine, meta-analysis has been an essential tool for

quantita-tively summarizing multiple studies and producing integrated evidence. In general,

the treatment effects from different sources of evidence are heterogeneous due to

var-ious factors such as study designs, participating subjects, and treatment

administra-tion. Therefore, such heterogeneity should be adequately addressed when combining

evidence sources, otherwise statistical errors may be seriously underestimated and

possibly result in misleading conclusions (Higgins and Green, 2011). The

random-effects model has been widely used to address these heterogeneities in most medical

meta-analyses. The applications cover various types of systematic reviews, for

exam-ple, conventional univariate meta-analysis (DerSimonian and Laird, 1986; Whitehead

and Whitehead, 1991), meta-analysis of diagnostic test accuracy (Reitsuma et al.,

2005), network meta-analysis for comparing the effectiveness of multiple treatments

(Salanti, 2012), and individual participant meta-analysis (Riley et al., 2010).

However, in random effect meta-analysis, most existing standard inference

meth-ods for average treatment effect parameters underestimate statistical errors under

realistic situations of medical meta-analysis, for example, the coverage probabilities

of standard inference methods are usually smaller than the nominal confidence levels,

even when the model is completely specified (Brockwell and Gordon, 2001; 2007).

This may lead to highly overconfident conclusions. The main reason for this

no-table problem is that random-effects models typically include heterogeneity

variance-covariance parameters other than the average treatment effect, and most inference

methods depend on large sample approximations for the number of trials to be

syn-thesized, which is usually small or moderate in medical meta-analysis. In other words,

the variability of the estimation of nuisance parameters is possibly ignored and the

total statistical error can be underestimated when constructing confidence intervals.

Recently, several confidence intervals that aim to improve the undercoverage

prop-erty have been developed, for example, by Brockwell and Gordon (2007), Henmi and

(4)

Noma et al. (2017), Guolo (2012), and Sidik and Jonkman (2002). Although

cover-age properties are relatively improved under realistic situations, the validity of these

methods is substantially guaranteed using large sample approximations. Moreover,

most of these methods were developed in the context of traditional direct pairwise

comparisons; therefore, the methods have limited applicability in recent, more

ad-vanced types of meta-analysis that use the complicated multivariate models noted

above.

In this paper, we develop a unified method for constructing exact confidence

in-tervals for random effect models so that their coverage probabilities are equal to the

nominal level regardless of the number of studies. To effectively circumvent the effects

of nuisance parameters, we focus on the sufficiency of the maximum likelihood

estima-tors of these parameters. After that, we consider the likelihood ratio test (LRT) for

the average treatment effect, and we define itsp-value based on the conditional

dis-tribution given the maximum likelihood estimator of the nuisance parameters rather

than the unconditional distribution of the test statistic. For computing the p-value

exactly, we adopt the Monte Carlo conditioning technique proposed by Lindqvist and

Taraldsen (2005), and the confidence interval can be derived by inverting the LRT.

As a result, the proposed method does not rely on the asymptotic approximation;

therefore, the derived confidence intervals have exact coverage probabilities and the

proposed method can be generally applied to various types of meta-analysis involving

complicated multivariate random effect models.

In Section 2, we first provide a Monte Carlo algorithm for computing an exact

p-value of the LRT and derive an exact confidence interval under general

statisti-cal models. In Section 3, the proposed method is applied to the simplest univariate

random-effects meta-analysis for direct pairwise comparisons. We conduct

simula-tions for evaluating the empirical performances of the proposed confidence interval,

and illustrate our method using the well-known meta-analysis of intravenous

mag-nesium for suspected acute myocardial infarction (Teo et al., 1991). In Sections 4

and 5, we discuss the applications to meta-analysis for diagnostic test accuracy and

(5)

is provided in Section 6.

2 Exact test and confidence interval

We supposey1, . . . , yn are independent and each has the density or probability mass

function fi(yi;φ, ψ) with parameter of interest φ and nuisance parameter ψ. For

example, in the univariate meta-analysis described in Section 3,nand yi correspond

to the number of studies and estimated treatment effect in theith study, respectively,

and we use the model yi ∼ N(µ, τ2 +σi2) with known σi2 to estimate the average

treatment effectµ, so thatφ=µandψ=τ2in this case. In general,yi,φandψcould

be multivariate, but we assume in this section that all of them are one-dimensional in

order to make our presentation simpler. Multivariate cases are considered in Sections

4 and 5. The likelihood ratio test (LRT) statistic for testing null hypothesisH0:φ=

φ0 is

Tφ0(Y) =−2

max

ψ L(Y, φ0, ψ)−maxφ,ψ L(Y, φ, ψ)

,

where Y = (y1, . . . , yn)t, and L(Y, φ0, ψ) = Pni=1logfi(yi;φ, ψ) is the log-likelihood

function. Under some regularity conditions, the asymptotic distribution of Tφ0(Y)

under H0 is χ2(1) as n _{→ ∞}. However, when the sample size n is not large as is

often the case in meta-analysis, the approximation is not accurate enough. The main

reason is that there is an unknown nuisance parameter ψ and its estimation error

is not ignorable when n is not large. To overcome this problem, we calculate the

p-value of the statisticTφ0(Y) based on the conditional distribution Y|ψb(φ0), where

b

ψ(φ0) is the maximum likelihood estimator of ψunder H0. Owing to the sufficiency

of the maximum likelihood estimator, the distribution ofY_|ψb(φ0) is free from the

nui-sance parameterψ, which enables us to compute an exact p-values without using any

asymptotic approximations. The key tool is the Monte Carlo conditioning suggested

in Lindqvist and Taraldsen (2005).

To describe the general methodology, we further assume that Y can be expressed

as Y = H(U, φ, ψ) for some function H and random variable U whose distribution

(6)

withU _∼N(0,1). Now, the maximum likelihood estimatorψbunder H0 satisfies the

following equation:

Lψ(Y, φ0,ψb) = 0,

whereLψ =∂L/∂ψ is the partial derivative of the likelihood functionL with respect

to the nuisance parameter ψ. Under H0, Y can be expressed as Y = H(U, φ0, ψ);

thereby, the above equation can be rewritten as

δ(U,ψ, ψb )_≡Lψ(H(U, φ0, ψ), φ0,ψb) = 0.

where the expectation is taken with respect to the distribution ofU, and

w(U) =

controls the efficiency of the Monte Carlo approximation in (1). However, the detailed

discussion of this issue would extend of the scope of this paper; thus, we consider in

this paper only the uniform prior,π(ψ) = 1. The algorithm for computing the exact

p-value of the LRT for testing H0:φ=φ0 is given as follows.

Algorithm 1. (Monte Carlo method for the exact p-value of LRT)

1. Compute the LRTTφ0(Y) and the constrained maximum likelihood estimatorψb

(7)

3. The Monte Carlo approximation of the exact p-value is given by

PB b=1I

n Tφ0(Y

(b)

∗ )≥Tφ0(Y)

o

w(U(b)₎ PB

b=1w(U(b))

. (3)

Note that as B _{→ ∞}, the Monte Carlo approximation (3) converges to the exact

p-value (1) regardless of the sample sizen.

Using the exact p-value of the LRT of H0 :φ=φ0, the exact confidence interval

of φ with nominal level 1₋α can be constructed as the set of φ† _{such that the}

exactp-value of the LRT of H0 : φ= φ† is larger than α. Although the confidence

limits cannot be expressed in closed form, they can be computed by simple numerical

methods, for example the bisectional method that repeatedly bisects an interval and

selects a subinterval in which a root exists until the process converges numerically, see

Section 2 in Burden and Faires (2010). Whenφorψare multivariate (vector-valued)

parameters, this method can be easily extended to the case. Whenψis multivariate,

which is typical in many applications, the absolute value symbol in the weight (2)

should be recognized as determinant since ∂ψ/∂ψb is a matrix. On the other hand,

whenφis multivariate, we need to construct a confidence region (CR) rather than a

confidence interval. In this case, the bisectional method cannot be directly applied,

and methods for CRs would depend on each setting. In Section 4, we present a

diagnostic meta-analysis in which a CR is traditionally used, and provide a feasible

algorithm to compute an exact CR.

3 Univariate random-effects meta-analysis

3.1 The random-effects model

The univariate random-effects model has been widely used in meta-analysis due to

its parametric simplicity. However, the accuracy of the inference is poor when the

number of studies is small. We consider solving this problem using the exact LRT and

(8)

and thaty1, . . . , ynare the estimated treatment effects. We consider the random-effect

model:

yi =θi+ei, θi =µ+εi, i= 1, . . . , n, (4)

whereθi is the true effect size of theith study, andµis the average treatment effect.

Here ei and εi are independent error terms within and across studies, respectively,

assumed to be distributed as ei ∼ N(0, σ2i) and εi ∼ N(0, τ2). The within-studies

variancesσ_i2s are usually assumed to be known and fixed to their valid estimates cal-culated from each study. On the other hand, the across varianceτ2is an unknown

pa-rameter representing the heterogeneity between studies. Under these settings, Hardy

and Thompson (1996) considered the likelihood-based approach for estimating the

average treatment effectµ.

3.2 Exact confidence intervals of model parameters

We first consider an exact confidence interval ofµby using Algorithm 1 in the previous

section. To begin with, we consider the null hypothesisH0 : µ =µ0 with nuisance

parameterτ2_{. Since} _y

The minimization can be achieved in standard ways such as the iterative method in

Hardy and Thompson (1996).

From (5), the constrained maximum likelihood estimator _bτ_c2 of τ2 under H0

(9)

can be expressed asyi =µ0+ui

q

τ2₊_σ2

i under H0. Substituting the expression for

yi in the above equation, we have

G(U, τ2,_bτ_c2)_≡

and the solution is given by

τ_∗2(U) =

Regarding the weight (2), using the implicit function theorem, it holds that

w(U) =

Algorithm 1, and the confidence interval ofµ by inverting the LRT.

An exact confidence interval ofτ2 can be derived as well. By a similar derivation

to that for µ, the exact p-values of the LRT of H0 :τ2 = τ02 can be computed from

so that the exact confidence interval ofτ2 can be similarly constructed.

3.3 Simulation study of coverage accuracy

We evaluated the finite sample performances of the proposed exact (EX) confidence

interval of µ together with existing methods widely used in practice. We

consid-ered the restricted maximum likelihood (REML) method, the DirSimonian and Laird

(10)

(Hardy and Thompson, 1996). When implementing the EX method, we used 1000

Monte Carlo samples to compute the exactp-value. We fixed the true average

treat-ment effectµat₋0.80, and the heterogeneity varianceτ2 at 0.10 and 0.20. Following

Rockwell and Gordon (2001), within-study variancesσ2

is were generated from a scaled

chi-squared distribution with 1 degree of freedom, multiplied by 0.25, and truncated

to lie within the interval [0.009,0.6]. We changed the number of studiesnover 3,5,7

and 9, and set the nominal levelα to 0.05. Based on 2000 simulation runs, we

calcu-lated the coverage probabilities (CP) and average lengths (AL) of the four confidence

intervals. The results, shown in Table 1, indicate that the confidence intervals from

the three methods other than EX are too liberal to achieve the appropriate nominal

level 0.95 even whenn= 9. On the other hand, the EX method produces reasonable

confidence intervals with appropriate coverage probabilities even when nis 3.

Table 1: Simulated coverage probabilities (CP) and average lengths (AL) of 95% confidence intervals from the proposed exact (EX) method, the restricted maximum likelihood (REML) method, the DirSimonian and Laird (DL) method, and the like-lihood ratio (LR) method.

CP AL

n 3 5 7 9 3 5 7 9

τ2= 0.10 EX 0.956 0.957 0.948 0.951 2.079 1.156 0.869 0.714

REML 0.809 0.848 0.852 0.870 1.004 0.729 0.620 0.548

DL 0.815 0.849 0.855 0.869 1.007 0.729 0.621 0.547

LR 0.868 0.876 0.885 0.899 1.126 0.807 0.677 0.592

τ2= 0.20 EX 0.945 0.953 0.943 0.947 2.459 1.400 1.063 0.878

REML 0.795 0.839 0.857 0.880 1.195 0.905 0.787 0.699

DL 0.800 0.845 0.855 0.873 1.197 0.905 0.784 0.696

LR 0.833 0.879 0.887 0.901 1.340 0.999 0.849 0.748

3.4 Example: treatment of suspected acute myocardial infarction

Here we applied the proposed method to a meta-analysis of the treatment of suspected

acute myocardial infarction with intravenous magnesium (Teo et al., 1991), which is

well-known as it yielded conflicting results between meta-analyses and large clinical

trials (LeLorier et al., 1997). For the dataset, we constructed a 95% confidence

(11)

EX method (with 10000 Monte Carlo samples in evaluating thep-value) as well as the

REML, DL and LR methods considered in the previous section. Moreover, we also

applied Peto’s fixed effect (PFE) method (Yusuf, 1985). The results are given in Table

2. The confidence intervals from the DL, LR, REML, and PFE methods were narrower

than that of the EX method since the first four methods cannot adequately account

for the additional variability in estimating the between study varianceτ2_{. Therefore,}

the confidence intervals from the first four methods did not coverµ= 1, which does

not change the interpretation of the results. On the other hand, the proposed EX

method produced a longer confidence interval than the other four methods while also

coveringµ = 1, that is, the corresponding test for µ = 1 was not significant with a

5% significant level.

Table 2: Estimates and confidence intervals of the average treatment effect of intra-venous magnesium on myocardial infarction based on six methods.

Method Estimate 95% CI

EX 0.449 (0.150, 1.103)

DL 0.448 (0.233, 0.861)

LR 0.449 (0.192, 0.903)

REML 0.437 (0.213, 0.895)

PFE 0.471 (0.280, 0.791)

4 Bivariate Meta-analysis of Diagnostic Test Accuracy

4.1 Bivariate random-effects model

There has been increasing interest in systematic reviews and meta-analyses of data

from diagnostic accuracy studies. For this purpose, a bivariate random-effect model

(Reitsma et al., 2005 and Harbord et al., 2007) is widely used. Following Reitsma et al.

(2005), we defineµAiandµBi as the logit-transformed true sensitivity and specificity,

(12)

bivariate normal distribution:

where µA and µB are the average logit-transformed sensitivity and specificity, and

σA(>0) and σB(>0) are standard deviations ofµAi and µBi, respectively. Here the

parameterρ_∈(₋1,1) allows correlation betweenµAi andµBi. The unknown

param-eters are µA, µB, σA2, σB2 and ρ. Let yAi and yBi be the observed logit-transformed

sensitivity and specificity, and we assume that



be more valuable than separate confidence intervals since sensitivity and specificity

might be highly correlated. Reitsma et al. (2005) suggested the 100(1₋α)% joint

CR forµas the interior points of the ellipse defined as

µA=µbA+cαsbAcost, µB =µbB+cαbsBcos(t+ arccosρb), t∈[0,2π), (8)

where µ_bA and µbB are the maximum likelihood estimates of µA and µB, bsA and bsB

are estimated standard errors of µ_bA and µbB, respectively, and cα is the square root

of the upper 100α% point of theχ2 _{distribution with 2 degrees of freedom. The joint}

CR (8) is approximately valid; specifically, the coverage error converges to 1₋α as

the number of studies n goes to infinity. However, when n is not sufficiently large,

the coverage error is not negligible, and the region (8) undercovers the trueµ.

4.2 Exact CR of sensitivity and specificity

We consider an exact CR ofµunder the models (6) and (7) based on the Monte Carlo

method given in Section 2. Letψ= (σ_A2, σ_B2, ρ)t _{be a vector of nuisance parameters,}

(13)

LRT statistic of the null hypothesis H0:µ=µ0 is given by

Under H0, the constrained maximum likelihood estimatorψbcsatisfies the following

equations:

solution can be numerically obtained by minimizing the sum of squared values of

three equations with respect toψ. Moreover, concerning the weight (2), we may use

the numerical derivative given U. Hence, we can compute the exact p-value of the

LRT of H0:µ=µ0 from Algorithm 1.

From the LRT of H0, the (1−α)% exact CR of µ can be defined as ECRα =

{µ;p(µ)_≥α_}, wherep(µ) denotes the exactp-value of the test statisticTµ(Y). Since

µ is two-dimensional in this case, the computing boundary _{µ;p(µ) = α_} is not

(14)

sufficiently large numbers of points. To this end, we first divide the interval [0,2π) by

M points 0 =t1 <_{· · ·}< tM <2π. For eachm= 1, . . . , M, we computerm satisfying

p(µ_b+ (rmcostm, rmsintm)) =α,

which can be carried out via numerical methods (e.g. the bisectional method).

4.3 Example: screening test accuracy for alcohol problems

Here we provide a re-analysis of the dataset given in Kriston et al. (2008), including

n = 14 studies regarding a short screening test for alcohol problems. Following

Reitsuma et al. (2005), we used logit-transformed values of sensitivity and specificity,

denoted byyAiandyBi, respectively, and associated standard errorssAiandsBi. For

the bivariate summary data, we fitted the bivariate models (6) and (7), and computed

95% CRs of µ based on the approximated CR of the form (8) given in Reitsma

et al. (2005). Moreover, we computed the proposed exact CR with 1000 Monte

Carlo samples for calculating p-values of the LRT, and M = 200 evaluation points

that were smoothed by a 7-point moving average for the CR boundary. Following

Reitsuma et al. (2005), the obtained two CRs of (µA, µB) were transformed to the

scale (logit(µA),1−logit(µB)), where logit(µA) and 1−logit(µB) are the sensitivity

and false positive rate, respectively. The obtained two CRs are presented in Figure

1 with a plot of the observed data, summary points _bµ, and the summary receiver

operating curve. The approximate CR is smaller than the exact CR, which may

indicate that the approximation method underestimates the variability of estimating

nuisance variance parameters.

5 Network Meta-analysis

5.1 Multivariate random-effects model

Suppose there are p treatments in contract to a reference treatment, and let yir be

(15)

0.0 0.1 0.2 0.3 0.4 0.5

0.5

0.6

0.7

0.8

0.9

1.0

False Positive Rate

Sensitivity

●

data

summary estimate

SROC Approximate CR Exact CR

Figure 1: Approximated and exact confidence regions (CRs) and summary receiver operating characteristics (SROC) curve.

consider the following multivariate random-effects model:

yi ∼N(θi, Si), θi ∼N(β,Σ), i= 1, . . . , n. (9)

where yi = (yi1, . . . , yip)t, θi = (θi1, . . . , θip)t is a vector of true treatment effects in

the ith study, β = (β1, . . . , βp)t is a vector of average treatment effects, and Si is

the within-study variance-covariance matrix. Here we focus on the model (9) known

as the contrast-based model (Salanti et al., 2008; Dias and Ades, 2016), which is

commonly used in practice.

In network meta-analysis, each study contains onlypi(< p) treatments (pi usually

ranges from 2 to 5); thereby, several components in yi cannot be defined. When

the corresponding treatments are not involved in the ith study, the corresponding

components in yi and Si are shrunk to the sub-vector and sub-matrix, respectively,

in the model (9). Moreover, when the references treatment is not involved in theith

(16)

a quasi-small data set is added to the reference arm, e.g. 0.001 events for 0.01 patients.

To clarify the setting in whichyiandSi are shrunk to the sub-vector and sub-matrix,

respectively, we introduce an index aij ∈ {1, . . . , p}, j = 1, . . . , pi, representing the

treatment estimates that are available in theith study, and define the p-dimensional

vectorxij of 0’s, excluding theaijth element that is equal to 1. Moreover, we define

Xi = (xi1, . . . , xipi)

t_{, and}_y

i andSi are the shrunkenpi-dimensional vector andpi×pi

matrix ofyi and Si, respectively. The model (9) can be rewritten as

yi∼N(Xiθi, Si), θi ∼N(β,Σ), i= 1, . . . , n. (10)

Regarding the structure of between study variance Σ, since there are rarely enough

studies to identify the unstructured model of Σ, the compound symmetry structure

Σ = τ2P(0.5) is used in most cases (White, 2015), where P(ρ) is a matrix with all

diagonal elements equal to 1 and all off-diagonal elements equal toρ.

We define y = (yt₁, . . . , ynt)t, X = (X1t, . . . , Xnt)t, Z = diag(X1, . . . , Xn), and

u= (ut

1, . . . , utn)twithui∼N(0, τ2P(0.5)) independently for eachi. The hierarchical

model (10) can be expressed as the following random-effects model:

y=Xβ+Zu+ε, (11)

whereε_∼N(0, S) with S = diag(S1, . . . , Sn). The unknown parameters in (11) are

β and τ2_{. The log-likelihood of the model (11) is given by}

L(β, τ2) =₋1

2log|V(τ

2₎

| −1₂(y₋Xβ)tV(τ2)−1(y₋Xβ),

whereV(τ2) =τ2Z_{In⊗P(0.5)}Zt+S and ⊗denotes the Kronecker product. Note

thaty is anN-dimensional vector and N =Pn_I₌₁pi is the total number of

(17)

5.2 Exact confidence interval of the average treatment effect

In network meta-analysis, we are interested in not only the average treatment effects

β1, . . . , βp in contrast to the reference treatment, but also the treatment differences

βj −βk, j 6= k. To handle these issues in a unified manner, we focus on the linear

combination η = ct_β _{with known vector} _c_{. Define a full-rank} _p_×_p _matrix _A _such

that the first element of Aβ is η. The parameter η is equivalent to β1 when we use

XA−1 _{instead of}_X_{in the model (11), so that it is sufficient to consider an exact test}

and confidence interval ofβ1.

DefineW1 andW2 to beN×1 andN×(p−1) matrices such thatX= (W1, W2),

andω = (β2, . . . , βp)t. The model (11) can be rewritten as

y=W1β1+W2ω+Zu+ε. (12)

We first consider testing of H0 :β1=β10, noting that ω and τ2 are nuisance

param-eters. The LRT statistics can be defined as

Tβ10(Y) = min

β1,ω,τ2

L(β1, ω, τ2)−min

ω,τ2L(β10, ω, τ

2₎_,

where

L(β1, ω, τ2) = log|V(τ2)|+ (y−W1β1−W2ω)tV(τ2)−1(y−W1β1−W2ω).

Under H0, the constrained maximum likelihood estimator ωbc and τbc2 satisfy the

fol-lowing equations:

W₂tV(τ_b_c2)−1r(β10,ωbc) = 0

trV(τ_b_c2)−1Q ₋r(β10,bωc)tV(τbc2)−1QV(τbc2)−1r(β10,ωbc) = 0,

(13)

(18)

ofV(τ2) such thatA(τ2)A(τ2)t₌_V₍_τ2_{). The equation (13) can be rewritten as}

solution of the above equation with respect to τ2. Hence, the solutions of (14) with

respect toω and τ2 are given byω∗(u) =ω∗(u, τ∗2(u)) andτ∗2(u), respectively.

Con-cerning the weight functionw(U), from the implicit function theorem, it follows that

w(U) =

On the other hand, because derivation of analytical expressions of the partial

deriva-tives with respect to τ2 or _bτc2 requires tedious algebraic calculation, we can use

(19)

compute the exactp-value of LRT, and the exact confidence interval can be obtained

as well by inverting the LRT.

5.3 Example: Schizophrenia data

Ades et al. (2010) carried out a network meta-analysis of antipsychotic medication

for prevention of relapse of schizophrenia; this analysis includes 15 trials comparing

eight treatments with placebo. In each trial, the outcomes available were the four

outcome states at the end of follow-up: relapse, discontinuation of treatment due to

intolerable side effects and other reasons, not reaching any of the three endpoints,

and still in remission. We here considered the last outcome and adopted the odds

ratio as the treatment effect measure.

We applied the multivariate random-effects model (11) with a compound

symme-try structure of Σ. The reference treatment was set to “Placebo”. The estimates of

between-studies standard deviation τ were 0.28 for the ML and 0.52 for the REML

estimation methods, respectively, which shows that there is substantial heterogeneity

between studies.

In Table 3 we present the results of three confidence intervals based on the EX

method, the LR-based method withp-value calculated by the asymptotic distribution,

and the REML method. The number of Monte Carlo samples in applying the EX

method was consistently set to 20000. In this analysis, the confidence intervals of

the EX method were much wider than those of the LR method, and indicated larger

uncertainty in the inference of average treatment effects. On the other hand, the

confidence intervals of the REML method were wider than those of the EX method

in the last three treatments although the REML method produced narrower intervals

than the EX method in the first five treatments; this may be due to the difference

between the ML and REML estimates of between study standard deviation (0.28 for

(20)

Table 3: Maximum likelihood (ML) and restricted maximum likelihood (REML) estimates of average treatment effects and confidence intervals from exact (EX), like-lihood ratio (LR) and REML methods in the application to network-meta analysis of schizophrenia data.

Placebo v.s. ML EX LR REML REML

Olanzapine 4.91 (2.11, 9.85) (2.67, 8.55) 4.52 (2.15, 9.50) Amisulpride 3.38 (1.19, 9.54) (1.56, 7.30) 3.36 (1.22, 9.31) Zotepine 2.66 (0.67, 13.10) (0.91, 7.74) 2.66 (0.68, 10.32) Aripiprazole 2.07 (0.66, 6.52) (0.93, 4.58) 2.07 (0.67, 6.38)

Ziprasidone 5.03 (1.84, 14.14) (2.32, 10.73) 4.90 (1.81, 13.26) Paliperidone 2.08 (0.65, 6.66) (0.90, 4.81) 2.08 (0.65, 6.67)

Haloperidol 2.65 (1.13, 5.74) (1.26, 5.12) 2.36 (0.94, 5.95) Risperidone 5.46 (2.52, 12.88) (2.40, 11.82) 5.05 (1.74, 14.64)

6 Discussions

We developed an exact method for constructing confidence intervals of the average

treatment effects in random-effects meta-analysis. The proposed confidence interval

is based on the LRT, and we proposed a Monte Carlo method to compute its exact

p-value without using large sample approximation. In terms of specific applications,

we discussed three types of analysis, univariate analysis, diagnostic

meta-analysis, and network meta-meta-analysis, and demonstrated the usefulness of the proposed

method. The developed exact inference method can be adapted to a variety of

ap-plications, e.g., the multivariate individual participant data meta-analysis (Burke et

al., 2016).

Although we considered exact confidence intervals based on the LRT and the

max-imum likelihood estimators for model parameters, other statistical methods might be

adopted in a similar manner to produce exact confidence intervals. For example,

the REML estimator of heterogeneity variance-covariance can be used instead of the

maximum likelihood estimator since the REML estimator is also sufficient statistics

based on the fact that the restricted likelihood can be derived as the marginal

likeli-hood integrated with respect to mean parameters or regression coefficients (Harville,

1974). Moreover, we could adopt other testing procedures, e.g., the Wald test and

(21)

because they use the same principle of theoretical justification of exactness.

In addition, the numerical results from our simulations and the illustrative

exam-ples suggest that statistical methods in the random-effects models should be selected

carefully in practice. Historically, there have been many discrepant results between

meta-analyses and subsequent large randomized clinical trials (LeLorier et al, 1997),

and in these cases the meta-analyses have typically tended to provide false results as

in the magnesium example in Section 3.4. Many systematic biases, for example,

pub-lication bias (Begg and Berlin, 1988; Easterbrook et al., 1991), might be important

sources of these discrepancies, but we should also be aware of the risk of providing

overconfident and misleading interpretations caused by the statistical methods based

on large sample approximations. Considering these risks, accurate inference methods

would be preferred in practice. Although there have not been any exact inference

methods that can be broadly applied in random-effects meta-analyses, our approach

in this article may provide an explicit solution to this relevant problem.

Finally, methodological research on extensions of random-effects meta-analyses to

more complicated statistical models are still in progress (e.g., multivariate network

meta-analyses, Riley et al., 2017), and the small sample problems generally exist

in most of these applications. Our methods are applicable to these complicated

models as well as more advanced approaches that might arise in future research.

The developed methods should be effective tools as a unified exact methodological

framework to obtain accurate solutions in medical evidence synthesis.

Acknowledgements

This research was partially supported by CREST (grant number: JPMJCR1412),

JST and JSPS KAKENHI Grant Numbers JP17K19808, JP15K15954, JP16H07406.

References

[1] Ades, A. E., Mavranezouli, I., Dias, S., Welton, N. J., Whittington, C. and

(22)

in Health, 13, 976-983.

[2] Begg, C. and Berlin, J. A. (1988). Publication bias: a problem in interpreting

medical data. Journal of the Royal Statistical Society: Series A, 151, 419-463.

[3] Brockwell, S. E. and Gordon, I. R. (2001). A comparison of statistical methods

for meta-analysis.Statistics in Medicine, 20, 825-840.

[4] Brockwell, S. E. and Gordon, I. R. (2007). A simple method for inference on an

overall effect in meta-analysis. Statistics in Medicine, 26, 4531-4543.

[5] Burden, R. L. and Faires, J. D. (2010).Numerical Analysis, 9th Edition,

Brooks-Cole Publishing.

[6] Burke, D. L., Ensor, J. and Riley, R. D. (2016). Meta-analysis using individual

participant data: one-stage and two-stage approaches, and why they may differ.

Statistics in Medicine, 36, 855-875.

[7] DerSimonian, R and Laird, N. M. (1986). Meta-analysis in clinical trials.

Con-trolled Clinical Trials, 7, 177-188.

[8] Dias, S. and Ades, A. E. (2016). Absolute or relative effects? Arm-based

syn-thesis of trial data. Research Synthesis Methods, 7, 23-28.

[9] Easterbrook, P. J., Gopalan, R., Berlin, J. A. and Matthews, D. R. (1991).

Publication bias in clinical research. The Lancet, 337, 867-872.

[10] Guolo, A. (2012). Higher-order likelihood inference in analysis and

meta-regression. Statistics in Medicine, 31, 313-327.

[11] Harbord, R. M., Deeks, J. J., Egger, H., Whiting, P. and Sterne. J. A. C.

(2007). A unification of models for meta-analysis of diagnostic accuracy studies.

Biostatistics, 8, 239-251.

[12] Hardy, R. J. and Thompson, S.G. (1996). A likelihood approach to

(23)

[13] Harville, D. A. (1974). Bayesian inference for variance components using only

error contrasts. Biometrika, 61, 383-385.

[14] Henmi, M and Copas J. (2010). Confidence intervals for random effects

meta-analysis and robustness to publication bias. Statistics in Medicine, 29,

2969-2983.

[15] Higgins, J. P. T. and Green, S. (2011) Cochrane Handbook for Systematic

Reviews of Interventions, Version 5.1.0 (The Cochrane Collaboration, Oxford).

[16] Jackson, D and Bowden, J. (2009). A re-evaluation of the ‘quantile

approxi-mation method’ for random effects meta-analysis. Statistics in Medicine, 28,

338-348.

[17] Jackson, D. and Riley, R. D. (2014). A refined method for multivariate

meta-analysis and meta-regression. Statistics in Medicine, 33, 541-554.

[18] Kriston, L., H¨oelzel, L., Weiser, A., Berner, M., and Haerter, M. (2008).

Meta-analysis: Are 3 Questions Enough to Detect Unhealthy Alcohol Use? Annals

of Internal Medicine, 149, 879-888.

[19] LeLorier, J., Gregoire, G., Benhaddad, A., Lapierre, J. and Derderian, F.

(1997). Discrepancies between meta-analyses and subsequent large randomized,

controlled trials. The New England Journal of Medicine, 337, 536-542.

[20] Lindqvist, B. H. and Taraldsen, G. (2005). Monte Carlo conditioning on a

sufficient statistic. Biometrika, 92, 451-464.

[21] Lockhart, R. A., O’reilly, F. J. and Stephens, M. A. (2007). Use of the Gibbs

sampler to obtain conditional tests, with applications. Biometrika, 94, 992-998.

[22] Lu, G. and Ades, A. E. (2006). Assessing evidence inconsistency in mixed

treat-ment comparisons. Journal of the American Statistical Association, 101,

(24)

[23] Noma, H. (2011). Confidence intervals for a random-effects meta-analysis based

on Bartlett-type corrections. Statistics in Medicine, 30, 3304-3312.

[24] Noma, H., Nagashima, K., Maruo, K., Gosho, M. and Furukawa, T. A. (2017).

Bartlett-type corrections and bootstrap adjustments of likelihood-based

infer-ence methods for network meta-analysis. Statistics in Medicine, to appear.

[25] Reitsma, J. B., Glas, A. S., Rutjes, A. W. S., Scholten, R. J. P. M., Bossuyt, P.

M. and Zwinderman, A. H. (2005). Bivariate analysis of sensitivity and

speci-ficity produces informative summary measures in diagnostic reviews. Journal

of Clinical Epidemiology, 58, 982-90.

[26] Riley, R. D., Jackson, D., Salanti, G., Burke, D. L., Price, M., Kirkham, J.

and White, I. R. (2017). Multivariate and network meta-analysis of multiple

outcomes and multiple treatments: rationale, concepts, and examples BMJ,

358, j3932.

[27] Riley, R. D., Lambert, P. C. and Abo-Zaid, G. (2010). Meta-analysis of

indi-vidual participant data: rationale, conduct, and reporting. BMJ, 340, c221.

[28] Salanti, G. (2012). Indirect and mixed-treatment comparison, network, or

multiple- treatments meta-analysis: many names, many benefits, many

con-cerns for the next generation evidence synthesis tool.Research Synthesis

Meth-ods, 3, 80-97.

[29] Salanti, G, Higgins, J. P., Ades, A. and Ioannidis, J. P. (2008). Evaluation

of networks of randomized trials. Statistical Methods in Medical Research, 17,

279-301.

[30] Sidik, K and Jonkman, J. N. (2002). A simple confidence interval for

meta-analysis.Statistics in Medicine, 21, 3153-3159.

[31] Teo, K. K., Yusuf, S., Collins, R., Held, P. H. and Peto, R. (1991). Effects of

intravenous magnesium in suspected acute myocardial infarction: overview of

(25)

[32] White, I. R. (2015). Network meta-analysis.The State Journal, 15, 951-985.

[33] White, I. R., Barrett, J. K., Jackson, D. and Higgins, J. P. T. (2012).

Con-sistency and inconCon-sistency in network meta-analysis: model estimation using

multivariate meta-regression.Research Synthesis Methods, 3, 111-125.

[34] Whitehead, A. and Whitehead, J. (1991). A general parametric approach to

the meta-analysis of randomized clinical trials. Statistics in Medicine, 10,

1665-1677.

[35] Yusuf, S., Peto, R., Lewis, J., Collins, R. and Sleight, P. (1985). Beta blockade

during and after myocardial infarction: an overview of the randomized trials.