
Credibility using semiparametric models and a loss function with a constancy penalty

Virginia R. Young

School of Business, University of Wisconsin-Madison, Madison, WI 53706, USA
E-mail address: [email protected] (V.R. Young)

Received 1 June 1998; received in revised form 1 September 1999; accepted 24 November 1999

Abstract

In credibility ratemaking, one seeks to estimate the conditional mean of a given risk. The most accurate estimator (as measured by squared error loss) is the predictive mean. To calculate the predictive mean one needs the conditional distribution of losses given the parameter of interest (often the conditional mean) and the prior distribution of the parameter of interest. Young (1997. ASTIN Bulletin 27, 273–285) uses kernel density estimation to estimate the prior distribution of the conditional mean. She illustrates her method with simulated data from a mixture of a lognormal conditional over a lognormal prior and finds that the estimated predictive mean is more accurate than the linear Bühlmann credibility estimator. In her example, however, the estimated predictive mean was generally more accurate only up to the 95th percentile of the marginal distribution of claims; beyond that point, her credibility estimator occasionally diverged widely from the true predictive mean.

To reduce this divergence, we propose using the loss function of Young and De Vylder (2000. North American Actuarial Journal, 4(1), 107–113). Their loss function is a linear combination of a squared-error term and a term that encourages the estimator to be close to constant, especially in the tails of the distribution of claims, where Young (1997) noted the difficulty with her semiparametric approach. We show that by using this loss function, the problem of upward divergence noted in Young (1997) is reduced. We also provide a simple routine for minimizing the loss function, based on the discussion of De Vylder in Young (1998a. North American Actuarial Journal 2, 101–117). © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Kernel density estimation; Claim estimation; Bayesian estimation

1. Introduction

In credibility ratemaking, one seeks to estimate the conditional mean of a given risk. The most accurate estimator (as measured by squared error loss) is the predictive mean. To calculate the predictive mean one needs a conditional distribution of losses given the parameter of interest (often the conditional mean) and a prior distribution of the parameter of interest. In this paper, we use a semiparametric mixture model to represent the insurance losses of a portfolio of risks: We choose a parametric conditional loss distribution for each risk with unknown conditional mean that varies across the risks. This conditional distribution may depend on parameters other than the mean, and we use the data to estimate those parameters. Then, we apply techniques from non-parametric density estimation to estimate the distribution of the conditional means. This method has appeared earlier in Young (1997, 1998a), but we summarize it in Section 2 to make this paper complete.



When Young (1997) applied this semiparametric method to data simulated from a lognormal–lognormal mixture, the resulting predictive mean occasionally diverged upward from the true predictive mean for large claims. To reduce this divergence, we propose using the loss function of Young and De Vylder (2000). Their loss function is a linear combination of a squared-error term and a term that encourages the estimator to be close to constant, especially in the tails of the distribution of claims, where Young (1997) observed the difficulty with her semiparametric approach. We show that by using this loss function, the problem of upward divergence noted in Young (1997) is reduced. We also provide a simple numerical routine for minimizing the loss function, based on the discussion of De Vylder in Young (1998a). See Section 3 for a discussion of this loss function and its minimizing solution.

In Section 4, we apply our methodology to simulated data from a mixture of a lognormal conditional over a lognormal prior, as in Young (1997). We show that our method can reduce the divergence between the estimated and true predictive means, even when we use a gamma conditional instead of a lognormal conditional. We summarize our work in Section 5.

2. Semiparametric mixture model

2.1. Notation and assumptions

Assume that the underlying claim of risk i per unit of exposure is a conditional random variable Y|θ_i, i = 1, 2, ..., I, with probability density function f(y|θ_i). For each of the I risks, one observes the average claims per unit of exposure x_i = (x_{i1}, x_{i2}, ..., x_{iT_i}), with an associated exposure vector w_i = (w_{i1}, w_{i2}, ..., w_{iT_i}), i = 1, 2, ..., I. Thus, the observed average claim x_{it} is the arithmetic average of w_{it} claims, each of which is an independent realization of the conditional random variable Y|θ_i. For example, if a risk is a class of homogeneous policies, then x_{it} may be the average claim per policy in the tth policy period of the ith risk class, and w_{it} may be the number of policies in the ith class during the tth policy period.

Assume that the parameter θ is the conditional mean, E[Y|θ] = θ. Assume that parameters other than the conditional mean are fixed across the risks. The loss distribution of a given risk is, therefore, characterized by its conditional mean, although that mean is generally unknown. Denote the probability density function of θ by π(θ), also called the structure function (Bühlmann, 1967, 1970). The structure function characterizes how the conditional mean θ varies from risk to risk.

The goal of credibility theory is to estimate the conditional mean E[Y|θ] of a risk, given that the risk's claim experience is x and its exposure is w. As in Young (1997), set the credibility formula equal to the predictive mean E[Y|x̄] given the weighted sample average x̄, weighted by the exposure w. Also restrict attention to parametric conditional distributions for which E[Y|θ] = θ (thus, the predictive mean equals the posterior mean; that is, E[Y|x̄] = E[θ|x̄]), the sample mean is a sufficient statistic for θ, and the functional form of f(y|θ) is closed under averaging. Families of densities that satisfy these properties are (1) the normal, with mean θ and fixed variance σ²; (2) the gamma, with mean θ = α/β and fixed shape parameter α; and (3) the inverse Gaussian, with mean θ and fixed λ = θ³/Var[X|θ].

2.2. Kernel density estimation

Young (1997) uses kernel density estimation (Silverman, 1986) to estimate the probability density function π(θ). A kernel K acts as a weight function and satisfies the condition ∫_{−∞}^{∞} K(t) dt = 1. If one were to observe the conditional means θ_1, θ_2, ..., θ_I, then the kernel density estimate of π(θ) with kernel K would be given by

\[
\hat{\pi}(\theta) = \frac{1}{I} \sum_{i=1}^{I} \frac{1}{h_i} \, K\!\left( \frac{\theta - \theta_i}{h_i} \right), \tag{2.1}
\]

in which h_i is a positive parameter called the window width. Assume that the kernel is symmetric about zero. Since one observes only data x_i and w_i and not the true conditional means θ_i, one might use the sample mean x̄_i to estimate θ_i consistently, i = 1, 2, ..., I (Serfling, 1980). (One problem with using the sample mean x̄_i to estimate θ_i is that, for small sample sizes, the sample mean might not give a good estimate of θ_i.) In the expression in (2.1), one might wish to weight the terms in the sum according to the relative number of claims for the ith risk, so that the expectation of θ under the estimated density equals the overall sample mean.
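As an illustration only (this sketch is not from the paper), the following Python computes the kernel density estimate of the structure function from the risk-level sample means; the Epanechnikov kernel, the optional exposure-based weights, and the per-risk window widths h_i are stand-ins for the choices described by Young (1997).

import numpy as np

def epanechnikov(t):
    # Epanechnikov kernel: K(t) = 0.75 * (1 - t^2) for |t| <= 1, and 0 otherwise
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t * t), 0.0)

def prior_density_estimate(theta_grid, xbar, h, weights=None):
    # Kernel density estimate (2.1) of the structure function pi(theta), with the
    # unobserved conditional means theta_i replaced by the sample means xbar_i.
    # h is the vector of window widths h_i; weights[i] is an optional
    # exposure-based weight for risk i (summing to 1).
    theta_grid = np.asarray(theta_grid, dtype=float)
    xbar = np.asarray(xbar, dtype=float)
    h = np.asarray(h, dtype=float)
    if weights is None:
        weights = np.full(xbar.shape, 1.0 / xbar.size)  # unweighted form of (2.1)
    t = (theta_grid[:, None] - xbar[None, :]) / h[None, :]
    return (weights[None, :] / h[None, :] * epanechnikov(t)).sum(axis=1)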

When Young (1997) applied this method to data simulated from a lognormal–lognormal mixture, the resulting predictive mean occasionally diverged upward from the true predictive mean for large claims. By constraining the estimator to be close to a constant, one can reduce this problem. In the next section, we present a loss function that is a linear combination of a squared-error penalty and a first-derivative penalty. The first-derivative penalty constrains the credibility estimator so that it is not too large when the claim experience is large.

3. Credibility formula using a constancy penalty

We propose solving the following optimization problem to blend the goals of accuracy and of constancy, especially constancy in the tails of the claim distribution: Find the credibility estimator d : [a, b] → ℝ that minimizes the loss

\[
\int_a^b \left\{ \left[ d(\bar{x}) - \hat{\mu}(\bar{x}) \right]^2 \hat{f}(\bar{x}) + \lambda \left[ d'(\bar{x}) \right]^2 \right\} \mathrm{d}\bar{x}, \tag{3.1}
\]

in which λ is a positive constant, f̂ is the estimated marginal pdf of x̄, and µ̂ is the estimated predictive mean; specifically, µ̂(x̄) = ∫ E[Y|θ] π̂(θ|x̄) dθ = Ê[θ|x̄]. The squared-error term constrains the credibility formula d to be accurate, while the first-derivative penalty constrains d to be close to a constant and, thus, not too large for large claims.

If one lets λ approach 0, note that the optimal d is the predictive mean. At the other extreme, if one lets λ approach ∞, then the optimal d converges to the estimated grand mean. Thus, as λ increases, the penalty on large claims decreases and the amount of subsidy from those with smaller claims increases. Also, as noted by Young and De Vylder (2000), the resulting credibility estimator d is unbiased for all λ.

We first present a method for minimizing (3.1), and then we suggest two criteria for choosing the penalty parameter λ. The minimizing solution to (3.1) is the same as the solution d of the following boundary-value problem given by the Euler–Lagrange equations (Fox, 1987):

\[
\lambda d''(\bar{x}) - d(\bar{x}) \hat{f}(\bar{x}) = -\hat{\mu}(\bar{x}) \hat{f}(\bar{x}), \tag{3.2a}
\]



for x̄ ∈ [a, b], subject to the boundary conditions

\[
d'(a) = 0 = d'(b). \tag{3.2b}
\]
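For completeness, the variational step from (3.1) to (3.2), which is not spelled out above, is the standard one: with integrand F(x̄, d, d′) = [d(x̄) − µ̂(x̄)]² f̂(x̄) + λ[d′(x̄)]², the Euler–Lagrange equation and the natural boundary conditions give

\[
\frac{\partial F}{\partial d} - \frac{\mathrm{d}}{\mathrm{d}\bar{x}}\,\frac{\partial F}{\partial d'}
= 2\bigl[d(\bar{x}) - \hat{\mu}(\bar{x})\bigr]\hat{f}(\bar{x}) - 2\lambda\, d''(\bar{x}) = 0,
\qquad
\frac{\partial F}{\partial d'}\bigg|_{\bar{x}=a,\,b} = 2\lambda\, d'(\bar{x})\bigg|_{\bar{x}=a,\,b} = 0,
\]

which are (3.2a) and (3.2b), respectively. Integrating (3.2a) over [a, b] and using (3.2b) also shows that ∫ d(x̄) f̂(x̄) dx̄ = ∫ µ̂(x̄) f̂(x̄) dx̄, which is the unbiasedness property noted above.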

To solve (3.2), we replace the differential equation with a difference equation and use linear algebra to solve the resulting difference equation (Milne, 1970; Keller, 1992). Partition the interval [a, b] into J + 1 equal subintervals, t_j = a + jδ, j = 0, 1, ..., J + 1, in which δ = (b − a)/(J + 1). We also have (brief) occasion to use t_{−1} = a − δ and t_{J+2} = a + (J + 2)δ = b + δ. Define d_j = d(t_j), f̂_j = f̂(t_j), and µ̂_j = µ̂(t_j), for j = 0, 1, ..., J + 1. Similarly, define d_{−1} and d_{J+2}.

By replacing the derivatives with central divided differences, (3.2) becomes

\[
\lambda \, \frac{d_{j+1} - 2 d_j + d_{j-1}}{\delta^2} - d_j \hat{f}_j = -\hat{\mu}_j \hat{f}_j, \qquad j = 0, 1, \ldots, J+1, \tag{3.3}
\]

with d_{−1} = d_1 and d_{J+2} = d_J from the discretized boundary conditions (3.2b). This system results in the following matrix equation

\[
A \vec{d} = \vec{r}, \tag{3.4}
\]

in which A is a (J + 2) × (J + 2) tri-diagonal matrix with diagonal entries A_{jj} = −(2λ/δ² + f̂_j) and off-diagonal entries equal to λ/δ² (doubled to 2λ/δ² in the first and last rows to reflect d_{−1} = d_1 and d_{J+2} = d_J), and in which d⃗ = (d_0, d_1, ..., d_{J+1})ᵀ and r⃗ = (−µ̂_0 f̂_0, ..., −µ̂_{J+1} f̂_{J+1})ᵀ. Solving (3.4) is straightforward with standard mathematical software. The time-consuming part is in calculating f̂_j and µ̂_j, j = 0, 1, ..., J + 1.
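The following sketch (illustrative, not from the paper) builds and solves (3.4); the function names, and the use of a dense numpy solve instead of a dedicated tri-diagonal routine, are implementation assumptions, and f̂ and µ̂ are assumed to be supplied as callables already estimated from the data.

import numpy as np

def credibility_estimator(mu_hat, f_hat, a, b, J, lam):
    # Solve the discretized boundary-value problem (3.3)-(3.4) on the grid
    # t_j = a + j*delta, j = 0, ..., J+1, with delta = (b - a)/(J + 1).
    delta = (b - a) / (J + 1)
    t = a + delta * np.arange(J + 2)
    f = np.array([f_hat(x) for x in t])    # estimated marginal density at t_j
    mu = np.array([mu_hat(x) for x in t])  # estimated predictive mean at t_j

    c = lam / delta**2
    A = np.zeros((J + 2, J + 2))
    for j in range(J + 2):
        A[j, j] = -(2.0 * c + f[j])        # diagonal entries A_jj
        if j > 0:
            A[j, j - 1] = c                # sub-diagonal
        if j < J + 1:
            A[j, j + 1] = c                # super-diagonal
    A[0, 1] = 2.0 * c                      # boundary condition d_{-1} = d_1
    A[J + 1, J] = 2.0 * c                  # boundary condition d_{J+2} = d_J

    r = -mu * f                            # right-hand side r_j = -mu_j * f_j
    return t, np.linalg.solve(A, r)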

To choose the penalty parameter λ, one might set λ so that some particular property of the credibility estimator d is attained. For example, if, due to vagaries in the data, the estimated predictive mean grows at a rate faster than the claims (i.e., has derivative greater than one), then one might wish to choose λ to be the smallest nonnegative number such that max_{j=1,...,J} (d_{j+1} − d_{j−1})/(2δ) ≤ 1. Alternatively, one might wish to constrain the credibility estimator to be concave, i.e., to have a rate of growth that decreases with respect to claims. In this case, choose λ to be the smallest nonnegative number such that max_{j=1,...,J} (d_{j+1} − 2d_j + d_{j−1})/δ² ≤ 0. In the example in the next section, we use both methods for determining λ.
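One simple way to locate the smallest λ meeting the first-divided-difference criterion is a bisection search on λ. The sketch below (an implementation assumption, not prescribed by the paper) reuses credibility_estimator from the previous block and assumes the maximum first divided difference decreases as λ grows.

import numpy as np

def smallest_lambda_first_diff(mu_hat, f_hat, a, b, J, lam_max=1000.0, tol=1e-3):
    # Bisection for (approximately) the smallest nonnegative lambda such that all
    # first central divided differences of the credibility estimator are <= 1.
    delta = (b - a) / (J + 1)

    def max_first_diff(lam):
        _, d = credibility_estimator(mu_hat, f_hat, a, b, J, lam)
        return np.max((d[2:] - d[:-2]) / (2.0 * delta))  # j = 1, ..., J

    if max_first_diff(0.0) <= 1.0:
        return 0.0                       # the estimated predictive mean already complies
    lo, hi = 0.0, lam_max                # assumes lam_max is large enough to comply
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if max_first_diff(mid) <= 1.0:
            hi = mid                     # criterion met: try a smaller lambda
        else:
            lo = mid
    return hi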

4. Simulated data from a lognormal–lognormal mixture

In this section, we assume that we are given individual claim data; that is, w_{it} = 1 for all risks i and policy periods t, and X = Y. We model the lognormal–lognormal mixture as follows:

\[
f(x|\phi) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\!\left[ -\frac{(\ln x - \ln \phi)^2}{2\sigma^2} \right], \qquad x > 0,
\]

in which σ > 0 is a known parameter, and

\[
\pi(\phi) = \frac{1}{\phi \tau \sqrt{2\pi}} \exp\!\left[ -\frac{(\ln \phi - \ln \mu)^2}{2\tau^2} \right], \qquad \phi > 0,
\]


in which µ > 0 and τ > 0 are known parameters. That is, (ln X)|φ ∼ N(ln φ, σ²), and ln φ ∼ N(ln µ, τ²). The marginal distribution of X is, therefore, lognormal with ln X ∼ N(ln µ, σ² + τ²). Given n observed claims x = (x_1, ..., x_n) from a risk, the predictive mean is a function of the statistic v = (1/n) Σ_{t=1}^{n} ln x_t:

\[
\mu(x) = \mathrm{E}(X_{n+1}|x) = \exp\!\left[ \frac{n\tau^2 v + \sigma^2 \ln \mu}{n\tau^2 + \sigma^2} + \frac{\sigma^2 \tau^2}{2(n\tau^2 + \sigma^2)} + \frac{\sigma^2}{2} \right].
\]

For each simulation run, she simulated r = 100 values of φ, or risks. For each of the 100 risks, she simulated n_i = w_i = 5 claims. To estimate the distribution of the conditional means, she used kernel density estimation with the Epanechnikov kernel,

\[
K(t) = \tfrac{3}{4}\,(1 - t^2), \qquad |t| \le 1,
\]

and K(t) = 0 otherwise.

Also, she used a fixed window width h, but truncated it if the prior would have otherwise been positive for negative values ofθ; see Young (1997) for more details. Instead of assuming that the conditional is lognormal, she assumed that the coefficient of variation is constant from risk to risk and, therefore, fit a gamma conditional to each risk. She used the estimated prior density along with the gamma conditional to estimate the marginal density of X, as well as the predictive mean.
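To make the setup concrete, here is a sketch (with placeholder parameter values for µ, τ, and σ and a hypothetical helper true_predictive_mean; the actual values used by Young (1997) are not reproduced above) that simulates the lognormal–lognormal mixture and evaluates the true predictive mean for n = 1.

import numpy as np

rng = np.random.default_rng(1)

# Placeholder parameters (NOT the values used in Young, 1997)
mu, tau, sigma = 1000.0, 0.8, 0.5
r, n = 100, 5                             # 100 risks, 5 claims per risk

# ln(phi_i) ~ N(ln mu, tau^2); then (ln X)|phi_i ~ N(ln phi_i, sigma^2)
phi = np.exp(rng.normal(np.log(mu), tau, size=r))
claims = np.exp(rng.normal(np.log(phi)[:, None], sigma, size=(r, n)))

def true_predictive_mean(x, n_obs=1):
    # True predictive mean E(X_{n+1} | x) for the lognormal-lognormal model;
    # v is the average of the log claims (for n_obs = 1, v = ln x).
    v = np.log(np.asarray(x, dtype=float))
    post_mean = (n_obs * tau**2 * v + sigma**2 * np.log(mu)) / (n_obs * tau**2 + sigma**2)
    post_var = sigma**2 * tau**2 / (n_obs * tau**2 + sigma**2)
    return np.exp(post_mean + 0.5 * post_var + 0.5 * sigma**2)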

For n = w = 1, she compared the estimated predictive mean, µ̂(x), with the true predictive mean, µ(x). The estimated predictive mean was fairly accurate up to the 95th percentile, namely 6500, but occasionally diverged upward for values of x above that amount, as seen in Fig. 1. There we display the estimated and the true predictive means up to the 99.9th percentile of X, namely 22 632, for one run of the data. In order to reduce this divergence and still have an unbiased credibility estimator, we solved (3.4) with a = 0, b = 22 632, and J = 100. As a reasonableness check for J, we compared ∫_0^{22 632} µ̂(t) f̂(t) dt = 2364.4 with the estimated value of the overall mean.

In order to constrain d so that its first central divided differences are no greater than 1, the smallest value of λ is 25.7. See Fig. 1 for the graph of the credibility estimator d when λ = 25.7. Note that for x < 6500, the three credibility estimators agree well, and for x > 6500, the true predictive mean is estimated more closely by the solution to (3.4) than by the estimated predictive mean.

Fig. 1. Credibility estimators.

As an aside, the estimated predictive mean satisfies max_{j=1,...,J} (µ̂_{j+1} − 2µ̂_j + µ̂_{j−1})/δ² ≤ 0, so choosing λ by the second-divided-difference criterion yields λ = 0 and leaves the estimated predictive mean unchanged. In this example, therefore, the first-divided-difference criterion produces a smaller divergence between the estimated and true predictive means than if we were to choose λ based on second divided differences. In other cases, such as when the divergence is not too great, choosing λ based on the first-divided-difference criterion worked well also.

5. Summary

We have shown how to use a penalized squared-error loss function to 'smooth' the estimated predictive mean from a semiparametric model. Specifically, we used a constancy penalty to prevent the estimated predictive mean from diverging too greatly from the true predictive mean. Because, in practice, one will not know the true predictive mean, we suggested two possible choices for the constancy parameter λ. The one that seems to give better results is the smallest nonnegative λ such that the first divided differences of the credibility estimator are less than or equal to 1.

Acknowledgements

I thank Professor De Vylder for suggesting the loss function in (3.1) and for encouraging me in this work.

References

Bühlmann, H., 1967. Experience rating and credibility. ASTIN Bulletin 4, 199–207.
Bühlmann, H., 1970. Mathematical Models in Risk Theory. Springer, New York.
Fox, C., 1987. An Introduction to the Calculus of Variations. Dover, New York.
Keller, H.B., 1992. Numerical Methods for Two-Point Boundary-Value Problems. Dover, New York.
Milne, W.E., 1970. Numerical Solution of Differential Equations, 2nd revised and enlarged edition. Dover, New York.
Serfling, R.J., 1980. Approximation Theorems of Mathematical Statistics. Wiley, New York.
Silverman, B.W., 1986. Density Estimation for Statistics and Data Analysis. Chapman & Hall, London.
Young, V.R., 1997. Credibility using semiparametric models. ASTIN Bulletin 27, 273–285.
Young, V.R., 1998a. Credibility using a loss function from spline theory: parametric models with a one-dimensional sufficient statistic, with discussion. North American Actuarial Journal 2, 101–117.
Young, V.R., 1998b. Robust Bayesian credibility using semiparametric models. ASTIN Bulletin 28, 187–203.
Young, V.R., De Vylder, F.E., 2000. Credibility in favor of unlucky insureds. North American Actuarial Journal 4 (1), 107–113.
