CONTINUOUS MODELS
5.2 Creating New Distributions
5.2.6 Splicing
Another method for creating a new distribution is by splicing. This approach is similar to mixing in that it might be believed that two or more separate processes are responsible for generating the losses. With mixing, the various processes operate on subsets of the population. Once the subset is identified, a simple loss model suffices. For splicing, the processes differ with regard to the loss amount. That is, one model governs the behavior of losses in some interval of possible losses while other models cover the other intervals.
Definition 5.8 makes this precise.
Definition 5.8 A $k$-component spliced distribution has a density function that can be expressed as follows:
\[
f_X(x) =
\begin{cases}
a_1 f_1(x), & c_0 < x < c_1, \\
a_2 f_2(x), & c_1 < x < c_2, \\
\quad\vdots & \quad\vdots \\
a_k f_k(x), & c_{k-1} < x < c_k.
\end{cases}
\]
For $j = 1, \ldots, k$, each $a_j > 0$ and each $f_j(x)$ must be a legitimate density function with all probability on the interval $(c_{j-1}, c_j)$. Also, $a_1 + \cdots + a_k = 1$.
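The definition translates almost directly into code. The following is a minimal sketch, not a reference implementation: the function name `spliced_pdf` and the toy uniform components are assumptions made for illustration, and the requirement that each component density puts all of its probability on its own interval is taken on trust rather than checked.

```python
from typing import Callable, Sequence

Density = Callable[[float], float]


def spliced_pdf(breaks: Sequence[float],
                weights: Sequence[float],
                components: Sequence[Density]) -> Density:
    """Build f_X for a k-component spliced model (hypothetical helper).

    breaks     -- c_0 < c_1 < ... < c_k (k + 1 values)
    weights    -- a_1, ..., a_k, each positive and summing to 1
    components -- f_1, ..., f_k; each f_j is assumed (not verified) to be a
                  density with all of its probability on (c_{j-1}, c_j)
    """
    k = len(components)
    if len(breaks) != k + 1 or len(weights) != k:
        raise ValueError("need k + 1 break points and k weights")
    if any(lo >= hi for lo, hi in zip(breaks, breaks[1:])):
        raise ValueError("break points must be strictly increasing")
    if any(a <= 0 for a in weights) or abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must be positive and sum to 1")

    def pdf(x: float) -> float:
        for j in range(k):
            if breaks[j] < x < breaks[j + 1]:
                return weights[j] * components[j](x)
        return 0.0  # outside (c_0, c_k), or exactly at a break point

    return pdf


# Toy illustration (assumed values): uniform pieces on (0, 1) and (1, 3).
toy = spliced_pdf([0.0, 1.0, 3.0], [0.3, 0.7], [lambda x: 1.0, lambda x: 0.5])
print(toy(0.5), toy(2.0))
```

Because each $f_j$ integrates to 1 over its own interval, the weight $a_j$ is the probability that a loss falls in $(c_{j-1}, c_j)$, which is why the weights must sum to 1.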
EXAMPLE 5.8
Demonstrate that Model 5 in Section 2.2 is a two-component spliced model.
The density function is
\[
f(x) =
\begin{cases}
0.01, & 0 \le x < 50, \\
0.02, & 50 \le x < 75,
\end{cases}
\]
and the spliced model is created by letting $f_1(x) = 0.02$, $0 \le x < 50$, which is a uniform distribution on the interval from 0 to 50, and $f_2(x) = 0.04$, $50 \le x < 75$, which is a uniform distribution on the interval from 50 to 75. The coefficients are then $a_1 = 0.5$ and $a_2 = 0.5$. □
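As a quick check of Example 5.8, written in the notation of Definition 5.8, the products of coefficient and component density reproduce the Model 5 density, and the total probability is 1:

```latex
a_1 f_1(x) = 0.5(0.02) = 0.01, \quad 0 \le x < 50, \qquad
a_2 f_2(x) = 0.5(0.04) = 0.02, \quad 50 \le x < 75,
\]
\[
\int_0^{75} f(x)\,dx = 0.01(50) + 0.02(25) = 0.5 + 0.5 = 1.
```

Each coefficient is the probability assigned to its interval: half of the probability lies below 50 and half lies between 50 and 75.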
It was not necessary to use density functions and coefficients, but this is one way to ensure that the result is a legitimate density function. When using parametric models, the motivation for splicing is that the tail behavior may be inconsistent with the behavior for small losses. For example, experience (based on knowledge beyond that available in the current, perhaps small, data set) may indicate that the tail follows the Pareto distribution, but there is a positive mode more in keeping with the lognormal or inverse Gaussian distributions. A second instance is when there is a large amount of data below some value but a limited amount of information elsewhere. We may want to use the empirical distribution (or a smoothed version of it) up to a certain point and a parametric model beyond that value. Definition 5.8 is appropriate when the break points $c_0, \ldots, c_k$ are known in advance.
Another way to construct a spliced model is to use standard distributions over the range from $c_0$ to $c_k$. Let $g_j(x)$ be the $j$th such density function, with distribution function $G_j(x)$. Then, in Definition 5.8, replace $f_j(x)$ with $g_j(x)/[G_j(c_j) - G_j(c_{j-1})]$. This formulation makes it easier to have the break points become parameters that can be estimated.
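The rescaling step can be sketched in a few lines. This is an illustration under stated assumptions rather than a prescribed method: the helper names `truncated_pdf` and `midpoint_integral` are hypothetical, and the exponential distribution with mean 100 restricted to the interval from 0 to 100 is chosen only as an example.

```python
import math


def truncated_pdf(g, G, lo, hi):
    """Rescale density g (with cdf G) so that it integrates to 1 on (lo, hi)."""
    mass = G(hi) - G(lo)  # probability g places on (lo, hi)
    if mass <= 0:
        raise ValueError("g places no probability on the interval")
    return lambda x: g(x) / mass if lo < x < hi else 0.0


def midpoint_integral(f, lo, hi, n=10_000):
    """Crude midpoint-rule integral, used only as a sanity check."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Assumed example: exponential with mean theta, restricted to (0, c).
theta, c = 100.0, 100.0
g = lambda x: math.exp(-x / theta) / theta
G = lambda x: 1.0 - math.exp(-x / theta)

f1 = truncated_pdf(g, G, 0.0, c)
area = midpoint_integral(f1, 0.0, c)
print(f"area under the rescaled density on (0, c): {area:.6f}")
```

Since the break points enter only through `lo` and `hi`, they can be varied freely, which is what makes this form convenient when the break points are to be estimated.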
Figure 5.1 The two-component spliced density. [Plot of f(x) against x for x from 0 to 250, with the vertical axis running from 0 to 0.01; the two curves are labeled Pareto and Exponential.]
Neither approach to splicing ensures that the resulting density function will be continuous (i.e., that the components will meet at the break points). Such a restriction could be added to the specification.
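To illustrate what such a restriction looks like, consider the two-component case in the notation of Definition 5.8. This is a sketch under the assumption that $f_1(c_1)$ and $f_2(c_1)$, read as one-sided limits at the break point, are positive and finite. Continuity at $c_1$ together with $a_1 + a_2 = 1$ then determines the coefficients:

```latex
a_1 f_1(c_1) = a_2 f_2(c_1), \qquad a_1 + a_2 = 1
\quad\Longrightarrow\quad
a_1 = \frac{f_2(c_1)}{f_1(c_1) + f_2(c_1)}, \qquad
a_2 = \frac{f_1(c_1)}{f_1(c_1) + f_2(c_1)}.
```

Imposing continuity therefore removes one free parameter from the model.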
EXAMPLE 5.9
Create a two-component spliced model using an exponential distribution from $0$ to $c$ and a Pareto distribution (using $\gamma$ in place of $\theta$) from $c$ to $\infty$.
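The basic format is
\[
f_X(x) =
\begin{cases}
a_1 \dfrac{\theta^{-1} e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c, \\[2ex]
a_2 \dfrac{\alpha \gamma^{\alpha} (x+\gamma)^{-\alpha-1}}{\gamma^{\alpha} (c+\gamma)^{-\alpha}}, & c < x < \infty.
\end{cases}
\]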
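However, we must force the density function to integrate to 1. All that is needed is to let $a_1 = v$ and $a_2 = 1 - v$. The spliced density function becomes
\[
f_X(x) =
\begin{cases}
v \dfrac{\theta^{-1} e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c, \\[2ex]
(1 - v) \dfrac{\alpha (c+\gamma)^{\alpha}}{(x+\gamma)^{\alpha+1}}, & c < x < \infty,
\end{cases}
\qquad \theta, \alpha, \gamma, c > 0, \quad 0 < v < 1.
\]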
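The spliced density of this example is easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, the cdf is the closed form obtained by integrating the two pieces, and the parameter values are the ones the text uses for Figure 5.1 ($c = 100$, $v = 0.6$, $\theta = 100$, $\gamma = 200$, $\alpha = 4$).

```python
import math


def spliced_exp_pareto_pdf(x, v, theta, alpha, gamma, c):
    """Spliced density: exponential piece on (0, c), Pareto piece on (c, inf)."""
    if 0.0 < x < c:
        return v * math.exp(-x / theta) / theta / (1.0 - math.exp(-c / theta))
    if x > c:
        return (1.0 - v) * alpha * (c + gamma) ** alpha / (x + gamma) ** (alpha + 1)
    return 0.0  # x <= 0, or exactly at the break point


def spliced_exp_pareto_cdf(x, v, theta, alpha, gamma, c):
    """Closed-form cdf obtained by integrating the two pieces of the density."""
    if x <= 0.0:
        return 0.0
    if x <= c:
        return v * (1.0 - math.exp(-x / theta)) / (1.0 - math.exp(-c / theta))
    return 1.0 - (1.0 - v) * ((c + gamma) / (x + gamma)) ** alpha


# Parameter values quoted in the text for Figure 5.1.
params = dict(v=0.6, theta=100.0, alpha=4.0, gamma=200.0, c=100.0)
eps = 1e-9  # small offset used to approach the break point from either side

left = spliced_exp_pareto_pdf(params["c"] - eps, **params)
right = spliced_exp_pareto_pdf(params["c"] + eps, **params)
print(f"density just below c: {left:.6f}")
print(f"density just above c: {right:.6f}")
print(f"F(c) = {spliced_exp_pareto_cdf(params['c'], **params):.4f}")
```

With these values the exponential piece approaches roughly 0.0035 at $x = c$ while the Pareto piece starts at roughly 0.0053, so the two pieces do not meet. The cdf at $c$ equals $v$, confirming that the weight $v$ is the probability of a loss below the break point.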
Figure 5.1 illustrates this density function using the values $c = 100$, $v = 0.6$, $\theta = 100$, $\gamma = 200$, and $\alpha = 4$. It is clear that this density is not continuous. □
5.2.7 Exercises
5.1 Let $X$ have cdf $F_X(x) = 1 - (1 + x)^{-\alpha}$, $x, \alpha > 0$. Determine the pdf and cdf of $Y = \theta X$.
5.2 (*) One hundred observed claims in 1995 were arranged as follows: 42 were between 0 and 300, 3 were between 300 and 350, 5 were between 350 and 400, 5 were between 400 and 450, 0 were between 450 and 500, 5 were between 500 and 600, and the remaining 40 were above 600. For the next three years, all claims are inflated by 10% per year. Based on the empirical distribution from 1995, determine a range for the probability that a claim exceeds 500 in 1998 (there is not enough information to determine the probability exactly).
5.3 Let $X$ have a Pareto distribution. Determine the cdf of the inverse, transformed, and inverse transformed distributions. Check Appendix A to determine if any of these distributions have special names.
5.4 Let $X$ have a loglogistic distribution. Demonstrate that the inverse distribution also has a loglogistic distribution. Therefore, there is no need to identify a separate inverse loglogistic distribution.
5.5 Let $Y$ have a lognormal distribution with parameters $\mu$ and $\sigma$. Let $Z = \theta Y$. Show that $Z$ also has a lognormal distribution and, therefore, the addition of a third parameter has not created a new distribution.
5.6 (*) Let $X$ have a Pareto distribution with parameters $\alpha$ and $\theta$. Let $Y = \ln(1 + X/\theta)$. Determine the name of the distribution of $Y$ and its parameters.
5.7 Venter [124] notes that if $X$ has a transformed gamma distribution and its scale parameter $\theta$ has an inverse transformed gamma distribution (where the parameter $\tau$ is the same in both distributions), the resulting mixture has the transformed beta distribution. Demonstrate that this is true.
5.8 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a gamma distribution with mean 1 and variance 2. Determine the unconditional probability that $N = 1$.
5.9 (*) Given a value of $\Theta = \theta$, the random variable $X$ has an exponential distribution with hazard rate function $h(x) = \theta$, a constant. The random variable $\Theta$ has a uniform distribution on the interval $(1, 11)$. Determine $S_X(0.5)$ for the unconditional distribution.
5.10 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a uniform distribution on the interval $(0, 5)$. Determine the unconditional probability that $N \ge 2$.
5.11 (*) A two-point mixed distribution has, with probability $p$, a binomial distribution with parameters $m = 2$ and $q = 0.5$, and with probability $1 - p$, a binomial distribution with parameters $m = 4$ and $q = 0.5$. Determine, as a function of $p$, the probability that this random variable takes on the value 2.
5.12 Determine the probability density function and the hazard rate of the frailty distribution.
5.13 Suppose that $X|\Lambda$ has a Weibull survival function $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x^{\gamma}}$, $x \ge 0$, and $\Lambda$ has an exponential distribution. Demonstrate that the unconditional distribution of $X$ is loglogistic.
5.14 Consider the exponential–inverse Gaussian frailty model with
\[
a(x) = \frac{\theta}{2\sqrt{1 + \theta x}}, \quad \theta > 0.
\]
(a) Verify that the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ of $X|\Lambda$ is indeed a valid hazard rate.
(b) Determine the conditional survival function $S_{X|\Lambda}(x|\lambda)$.
(c) If $\Lambda$ has a gamma distribution with parameters $\theta = 1$ and $\alpha$ replaced by $2\alpha$, determine the marginal or unconditional survival function of $X$.
(d) Use (c) to argue that a given frailty model may arise from more than one combination of conditional distributions of $X|\Lambda$ and frailty distributions of $\Lambda$.
5.15 Suppose that $X$ has survival function $S_X(x) = 1 - F_X(x)$, given by (5.3). Show that $S_1(x) = F_X(x)/[\mathrm{E}(\Lambda) A(x)]$ is again a survival function of the form given by (5.3), and identify the distribution of $\Lambda$ associated with $S_1(x)$.
5.16 Fix $s \ge 0$, and define an “Esscher-transformed” frailty random variable $\Lambda_s$ with probability density function (or discrete probability mass function in the discrete case) $f_{\Lambda_s}(\lambda) = e^{-s\lambda} f_{\Lambda}(\lambda)/M_{\Lambda}(-s)$, $\lambda \ge 0$.
(a) Show that $\Lambda_s$ has moment generating function
\[
M_{\Lambda_s}(z) = \mathrm{E}\left(e^{z\Lambda_s}\right) = \frac{M_{\Lambda}(z - s)}{M_{\Lambda}(-s)}.
\]
(b) Define the cumulant generating function of $\Lambda$ to be
\[
c_{\Lambda}(z) = \ln[M_{\Lambda}(z)],
\]
and use (a) to prove that
\[
c'_{\Lambda}(-s) = \mathrm{E}(\Lambda_s) \quad \text{and} \quad c''_{\Lambda}(-s) = \mathrm{Var}(\Lambda_s).
\]
(c) For the frailty model with survival function given by (5.3), prove that the associated hazard rate may be expressed as $h_X(x) = a(x)\, c'_{\Lambda}[-A(x)]$, where $c_{\Lambda}$ is defined in (b).
(d) Use (c) to show that
\[
h'_X(x) = a'(x)\, c'_{\Lambda}[-A(x)] - [a(x)]^2 c''_{\Lambda}[-A(x)].
\]
(e) Prove using (d) that if the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ is nonincreasing in $x$, then $h_X(x)$ is also nonincreasing in $x$.
5.17 Write the density function for a two-component spliced model in which the density function is proportional to a uniform density over the interval from 0 to 1,000 and is proportional to an exponential density function from 1,000 to $\infty$. Ensure that the resulting density function is continuous.
5.18 Let $X$ have pdf $f(x) = \exp(-|x/\theta|)/(2\theta)$ for $-\infty < x < \infty$. Let $Y = e^X$. Determine the pdf and cdf of $Y$.
5.19 (*) Losses in 1993 follow the density function $f(x) = 3x^{-4}$, $x \ge 1$, where $x$ is the loss in millions of dollars. Inflation of 10% impacts all claims uniformly from 1993 to 1994. Determine the cdf of losses for 1994 and use it to determine the probability that a 1994 loss exceeds 2,200,000.
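5.20 Consider the inverse Gaussian random variable $X$ with pdf (from Appendix A)
\[
f(x) = \sqrt{\frac{\theta}{2\pi x^3}} \exp\left[ -\frac{\theta}{2x} \left( \frac{x - \mu}{\mu} \right)^2 \right], \quad x > 0,
\]
where $\theta > 0$ and $\mu > 0$ are parameters.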
(a) Derive the pdf of the reciprocal inverse Gaussian random variable $1/X$.
(b) Prove that the “joint” moment generating function of $X$ and $1/X$ is given by
\[
M(z_1, z_2) = \mathrm{E}\left( e^{z_1 X + z_2 X^{-1}} \right)
= \sqrt{\frac{\theta}{\theta - 2z_2}}
\exp\left( \frac{\theta - \sqrt{(\theta - 2\mu^2 z_1)(\theta - 2z_2)}}{\mu} \right),
\]
where $z_1 < \theta/(2\mu^2)$ and $z_2 < \theta/2$.
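(c) Use (b) to show that the moment generating function of $X$ is
\[
M_X(z) = \mathrm{E}\left( e^{zX} \right)
= \exp\left[ \frac{\theta}{\mu} \left( 1 - \sqrt{1 - \frac{2\mu^2}{\theta} z} \right) \right], \quad z < \frac{\theta}{2\mu^2}.
\]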
(d) Use (b) to show that the reciprocal inverse Gaussian random variable $1/X$ has moment generating function
\[
M_{1/X}(z) = \mathrm{E}\left( e^{zX^{-1}} \right)
= \sqrt{\frac{\theta}{\theta - 2z}}
\exp\left[ \frac{\theta}{\mu} \left( 1 - \sqrt{1 - \frac{2}{\theta} z} \right) \right], \quad z < \frac{\theta}{2}.
\]
Hence prove that $1/X$ has the same distribution as $Z_1 + Z_2$, where $Z_1$ has a gamma distribution, $Z_2$ has an inverse Gaussian distribution, and $Z_1$ is independent of $Z_2$. Also, identify the gamma and inverse Gaussian parameters in this representation.
(e) Use (b) to show that
\[
Z = \frac{1}{X} \left( \frac{X - \mu}{\mu} \right)^2
\]
has a gamma distribution with parameters $\alpha = \tfrac{1}{2}$ and the usual parameter $\theta$ (in Appendix A) replaced by $2/\theta$.
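(f) For the mgf of the inverse Gaussian random variable $X$ in (c), prove by induction on $k$ that, for $k = 1, 2, \ldots$, the $k$th derivative of the mgf is
\[
M_X^{(k)}(z) = M_X(z) \sum_{n=0}^{k-1} \frac{(k + n - 1)!}{(k - n - 1)!\, n!}
\left( \frac{1}{2} \right)^{\frac{k + 3n}{2}} \theta^{\frac{k - n}{2}}
\left( \frac{\theta}{2\mu^2} - z \right)^{-\frac{k + n}{2}}
\]
and hence that the inverse Gaussian random variable has integer moments
\[
\mathrm{E}[X^k] = \sum_{n=0}^{k-1} \frac{(k + n - 1)!}{(k - n - 1)!\, n!} \frac{\mu^{n + k}}{(2\theta)^n}, \quad k = 1, 2, \ldots.
\]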
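(g) The modified Bessel function $K_{\lambda}(x)$ may be defined, for half-integer values of the index parameter $\lambda$, by $K_{-\lambda}(x) = K_{\lambda}(x)$, together with
\[
K_{m + \frac{1}{2}}(x) = \sqrt{\frac{\pi}{2x}}\, e^{-x} \sum_{j=0}^{m} \frac{(m + j)!}{(m - j)!\, j!} \left( \frac{1}{2x} \right)^j, \quad m = 0, 1, \ldots.
\]
Use part (f) to prove that, for $\alpha > 0$, $\theta > 0$, and $m = 0, 1, \ldots$,
\[
\int_0^{\infty} x^{m - \frac{3}{2}} e^{-\alpha x - \frac{\theta}{2x}}\, dx
= 2 \left( \frac{\theta}{2\alpha} \right)^{\frac{m}{2} - \frac{1}{4}} K_{m - \frac{1}{2}}\left( \sqrt{2\alpha\theta} \right).
\]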
5.3 Selected Distributions and Their Relationships