Splicing - Creating New Distributions

CONTINUOUS MODELS

5.2 Creating New Distributions

5.2.6 Splicing

Another method for creating a new distribution is splicing. This approach is similar to mixing in that two or more separate processes may be responsible for generating the losses. With mixing, the various processes operate on subsets of the population; once the subset is identified, a simple loss model suffices. With splicing, the processes differ with regard to the loss amount. That is, one model governs the behavior of losses in some interval of possible loss amounts, while other models cover the other intervals.

Definition 5.8 makes this precise.

Definition 5.8 A $k$-component spliced distribution has a density function that can be expressed as follows:

$$
f_X(x) = \begin{cases}
a_1 f_1(x), & c_0 < x < c_1,\\
a_2 f_2(x), & c_1 < x < c_2,\\
\vdots & \vdots\\
a_k f_k(x), & c_{k-1} < x < c_k.
\end{cases}
$$

For $j = 1, \ldots, k$, each $a_j > 0$ and each $f_j(x)$ must be a legitimate density function with all probability on the interval $(c_{j-1}, c_j)$. Also, $a_1 + \cdots + a_k = 1$.
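A minimal Python sketch of Definition 5.8, assuming each component $f_j$ is supplied as an ordinary Python function that is already a density on its own interval. The helper name `make_spliced_density` is hypothetical, not from the text; it only checks the conditions listed above and returns $a_j f_j(x)$ on $(c_{j-1}, c_j)$.

```python
import math
from typing import Callable, Sequence

Density = Callable[[float], float]


def make_spliced_density(weights: Sequence[float],
                         components: Sequence[Density],
                         breaks: Sequence[float]) -> Density:
    """Return f_X with f_X(x) = a_j * f_j(x) for c_{j-1} < x < c_j.

    weights    -- a_1, ..., a_k (each > 0, summing to 1)
    components -- f_1, ..., f_k, each assumed to be a density on its interval
    breaks     -- c_0 < c_1 < ... < c_k (c_k may be math.inf)
    """
    k = len(weights)
    if len(components) != k or len(breaks) != k + 1:
        raise ValueError("need k weights, k densities and k + 1 break points")
    if any(a <= 0 for a in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be positive and sum to 1")
    if any(lo >= hi for lo, hi in zip(breaks, breaks[1:])):
        raise ValueError("break points must be strictly increasing")

    def f(x: float) -> float:
        # Find the open interval (c_{j-1}, c_j) that contains x.
        for a, f_j, lo, hi in zip(weights, components, breaks, breaks[1:]):
            if lo < x < hi:
                return a * f_j(x)
        # Outside (c_0, c_k), or exactly at a break point (a set of probability 0).
        return 0.0

    return f


# Two uniform pieces, weighted equally (the setting of Example 5.8).
f = make_spliced_density([0.5, 0.5],
                         [lambda x: 1 / 50, lambda x: 1 / 25],
                         [0.0, 50.0, 75.0])
print(f(10.0), f(60.0), f(80.0))
```

Definition 5.8 uses open intervals, so the value at a break point is left open; returning 0 there is a choice made in this sketch and changes no probabilities.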

EXAMPLE 5.8

Demonstrate that Model 5 in Section 2.2 is a two-component spliced model.

The density function is

$$
f(x) = \begin{cases}
0.01, & 0 \le x < 50,\\
0.02, & 50 \le x < 75,
\end{cases}
$$

and the spliced model is created by letting $f_1(x) = 0.02$, $0 \le x < 50$, which is a uniform distribution on the interval from 0 to 50, and $f_2(x) = 0.04$, $50 \le x < 75$, which is a uniform distribution on the interval from 50 to 75. The coefficients are then $a_1 = 0.5$ and $a_2 = 0.5$, so that $a_1 f_1(x) = 0.01$ and $a_2 f_2(x) = 0.02$, reproducing $f(x)$. □
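The decomposition in Example 5.8 follows a general recipe for a piecewise density: $a_j$ is the probability that $f$ assigns to the $j$th interval, and $f_j = f/a_j$ on that interval. A short Python check of the arithmetic, assuming (as in Model 5) that $f$ is constant on each interval, so each probability is height times width:

```python
# Model 5: f(x) = 0.01 on [0, 50) and 0.02 on [50, 75).
breaks = [0.0, 50.0, 75.0]
heights = [0.01, 0.02]

# a_j = probability on interval j (height x width for a constant density).
weights = [h * (hi - lo) for h, lo, hi in zip(heights, breaks, breaks[1:])]

# f_j = f / a_j is then a uniform density on interval j.
component_heights = [h / a for h, a in zip(heights, weights)]

print("a_j:", weights)
print("f_j:", component_heights)
print("total probability:", sum(weights))
```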

It was not necessary to use density functions and coefficients, but this is one way to ensure that the result is a legitimate density function. When using parametric models, the motivation for splicing is that the tail behavior may be inconsistent with the behavior for small losses. For example, experience (based on knowledge beyond that available in the current, perhaps small, data set) may indicate that the tail follows the Pareto distribution, but that there is a positive mode more in keeping with the lognormal or inverse Gaussian distributions. A second instance is when there is a large amount of data below some value but a limited amount of information elsewhere. We may want to use the empirical distribution (or a smoothed version of it) up to a certain point and a parametric model beyond that value. Definition 5.8 is appropriate when the break points $c_0, \ldots, c_k$ are known in advance.

Another way to construct a spliced model is to use standard distributions over the range from $c_0$ to $c_k$. Let $g_j(x)$ be the $j$th such density function and $G_j(x)$ its cdf. Then, in Definition 5.8, replace $f_j(x)$ with $g_j(x)/[G_j(c_j) - G_j(c_{j-1})]$, which is a legitimate density on $(c_{j-1}, c_j)$. This formulation makes it easier to have the break points become parameters that can be estimated.
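A minimal Python sketch of this construction, assuming the standard density $g_j$ and its cdf $G_j$ are available as functions. The helper `truncate_to_interval` is hypothetical; an exponential restricted to $(0, c)$ serves as the illustration, with $\theta = 100$ and $c = 100$ chosen here only for the demonstration.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def truncate_to_interval(g: Func, G: Func, lo: float, hi: float) -> Func:
    """Return g(x) / [G(hi) - G(lo)] on (lo, hi) and 0 elsewhere."""
    mass = G(hi) - G(lo)  # probability the standard model puts on (lo, hi)
    if mass <= 0:
        raise ValueError("the standard model puts no probability on (lo, hi)")

    def f(x: float) -> float:
        return g(x) / mass if lo < x < hi else 0.0

    return f


theta, c = 100.0, 100.0


def exp_pdf(x: float) -> float:
    return math.exp(-x / theta) / theta if x > 0 else 0.0


def exp_cdf(x: float) -> float:
    return 1.0 - math.exp(-x / theta) if x > 0 else 0.0


f1 = truncate_to_interval(exp_pdf, exp_cdf, 0.0, c)

# Midpoint-rule check that f1 carries probability 1 on (0, c).
n = 10_000
h = c / n
total = sum(f1((i + 0.5) * h) for i in range(n)) * h
print(total)
```

Because $G_j$ appears explicitly, the break points can be treated as parameters and varied (for example, during estimation) without re-deriving the normalizing constant by hand.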

Figure 5.1 The two-component spliced density. [Plot of f(x) against x, showing the exponential and Pareto pieces.]

Neither approach to splicing ensures that the resulting density function will be continuous (that is, that the components will meet at the break points). Such a restriction could be added to the specification.
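For two components with weights $v$ and $1 - v$ meeting at a break point $c$, continuity requires $v f_1(c) = (1 - v) f_2(c)$, which solves to $v = f_2(c)/[f_1(c) + f_2(c)]$; this value lies automatically in $(0, 1)$ when both densities are positive at $c$. A small Python sketch of that restriction, where the function name is hypothetical and $f_1(c)$, $f_2(c)$ stand for the one-sided limits of the component densities at $c$:

```python
def continuity_weight(f1_at_c: float, f2_at_c: float) -> float:
    """Weight v on the left component so that v*f1(c) == (1 - v)*f2(c).

    f1_at_c -- limit of the left component density as x -> c from below
    f2_at_c -- limit of the right component density as x -> c from above
    """
    if f1_at_c <= 0 or f2_at_c <= 0:
        raise ValueError("both component densities must be positive at c")
    return f2_at_c / (f1_at_c + f2_at_c)


# Illustrative values only: the left piece has height 0.002 at c, the right 0.006.
v = continuity_weight(0.002, 0.006)
print(v, v * 0.002, (1 - v) * 0.006)
```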

EXAMPLE 5.9

Create a two-component spliced model using an exponential distribution from 0 to $c$ and a Pareto distribution (using $\gamma$ in place of $\theta$) from $c$ to $\infty$.

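The basic format is

$$
f_X(x) = \begin{cases}
a_1 \, \dfrac{\theta^{-1} e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c,\\[2ex]
a_2 \, \dfrac{\alpha \gamma^{\alpha} (x + \gamma)^{-\alpha - 1}}{\gamma^{\alpha} (c + \gamma)^{-\alpha}}, & c < x < \infty.
\end{cases}
$$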

However, we must force the density function to integrate to 1. Because each fraction above is already a density on its own interval, all that is needed is to let $a_1 = v$ and $a_2 = 1 - v$. The spliced density function becomes

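$$
f_X(x) = \begin{cases}
v \, \dfrac{\theta^{-1} e^{-x/\theta}}{1 - e^{-c/\theta}}, & 0 < x < c,\\[2ex]
(1 - v) \, \dfrac{\alpha (c + \gamma)^{\alpha}}{(x + \gamma)^{\alpha + 1}}, & c < x < \infty,
\end{cases}
\qquad \theta, \alpha, \gamma, c > 0, \quad 0 < v < 1.
$$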
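A minimal Python sketch of the spliced density above, evaluated just below and just above the break point with the values quoted for Figure 5.1 ($c = 100$, $v = 0.6$, $\theta = 100$, $\gamma = 200$, $\alpha = 4$). The function name `spliced_exp_pareto` is hypothetical. The two one-sided values differ, which is the discontinuity visible in the figure, and the probability on $(0, c)$ comes out as $v$, as the choice $a_1 = v$ intends.

```python
import math


def spliced_exp_pareto(x, c, v, theta, gamma, alpha):
    """Density of Example 5.9: truncated exponential below c, truncated Pareto above c."""
    if 0 < x < c:
        return v * (math.exp(-x / theta) / theta) / (1.0 - math.exp(-c / theta))
    if x > c:
        return (1.0 - v) * alpha * (c + gamma) ** alpha / (x + gamma) ** (alpha + 1)
    return 0.0  # x <= 0, or exactly at the break point


params = {"c": 100.0, "v": 0.6, "theta": 100.0, "gamma": 200.0, "alpha": 4.0}
c = params["c"]
eps = 1e-9

left = spliced_exp_pareto(c - eps, **params)   # approaches the limit from below c
right = spliced_exp_pareto(c + eps, **params)  # approaches the limit from above c
print(f"just below c: {left:.6f}, just above c: {right:.6f}")

# Midpoint rule for the probability on (0, c); it should be close to v.
n = 10_000
h = c / n
mass_below = sum(spliced_exp_pareto((i + 0.5) * h, **params) for i in range(n)) * h
print(f"P(X < c) = {mass_below:.6f}")
```

The Pareto piece needs no numerical check: $\int_c^\infty \alpha (c + \gamma)^{\alpha} (x + \gamma)^{-\alpha - 1}\,dx = 1$, so that piece carries probability $1 - v$.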

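Figure 5.1 illustrates this density function using the values $c = 100$, $v = 0.6$, $\theta = 100$, $\gamma = 200$, and $\alpha = 4$. The density is not continuous: just below $x = 100$ it equals $0.6 \cdot 0.01 e^{-1}/(1 - e^{-1}) \approx 0.00349$, while just above it equals $0.4 \cdot 4/300 \approx 0.00533$. □

5.2.7 Exercises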

5.1 Let $X$ have cdf $F_X(x) = 1 - (1 + x)^{-\alpha}$, $x, \alpha > 0$. Determine the pdf and cdf of $Y = \theta X$.

5.2 (*) One hundred observed claims in 1995 were arranged as follows: 42 were between 0 and 300, 3 were between 300 and 350, 5 were between 350 and 400, 5 were between 400 and 450, 0 were between 450 and 500, 5 were between 500 and 600, and the remaining 40 were above 600. For the next three years, all claims are inflated by 10% per year. Based on the empirical distribution from 1995, determine a range for the probability that a claim exceeds 500 in 1998 (there is not enough information to determine the probability exactly).

5.3 Let 𝑋 have a Pareto distribution. Determine the cdf of the inverse, transformed, and inverse transformed distributions. Check Appendix A to determine if any of these distributions have special names.

5.4 Let $X$ have a loglogistic distribution. Demonstrate that the inverse distribution also has a loglogistic distribution. Therefore, there is no need to identify a separate inverse loglogistic distribution.

5.5 Let $Y$ have a lognormal distribution with parameters $\mu$ and $\sigma$. Let $Z = \theta Y$. Show that $Z$ also has a lognormal distribution and, therefore, the addition of a third parameter has not created a new distribution.

5.6 (*) Let $X$ have a Pareto distribution with parameters $\alpha$ and $\theta$. Let $Y = \ln(1 + X/\theta)$. Determine the name of the distribution of $Y$ and its parameters.

5.7 Venter [124] notes that if $X$ has a transformed gamma distribution and its scale parameter $\theta$ has an inverse transformed gamma distribution (where the parameter $\tau$ is the same in both distributions), the resulting mixture has the transformed beta distribution. Demonstrate that this is true.

5.8 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a gamma distribution with mean 1 and variance 2. Determine the unconditional probability that $N = 1$.

5.9 (*) Given a value of $\Theta = \theta$, the random variable $X$ has an exponential distribution with hazard rate function $h(x) = \theta$, a constant. The random variable $\Theta$ has a uniform distribution on the interval $(1, 11)$. Determine $S_X(0.5)$ for the unconditional distribution.

5.10 (*) Let $N$ have a Poisson distribution with mean $\Lambda$. Let $\Lambda$ have a uniform distribution on the interval $(0, 5)$. Determine the unconditional probability that $N \ge 2$.

5.11 (*) A two-point mixed distribution has, with probability $p$, a binomial distribution with parameters $m = 2$ and $q = 0.5$, and with probability $1 - p$, a binomial distribution with parameters $m = 4$ and $q = 0.5$. Determine, as a function of $p$, the probability that this random variable takes on the value 2.

5.12 Determine the probability density function and the hazard rate of the frailty distribution.

5.13 Suppose that $X|\Lambda$ has a Weibull survival function $S_{X|\Lambda}(x|\lambda) = e^{-\lambda x^{\gamma}}$, $x \ge 0$, and $\Lambda$ has an exponential distribution. Demonstrate that the unconditional distribution of $X$ is loglogistic.

5.14 Consider the exponential–inverse Gaussian frailty model with

$$a(x) = \frac{\theta}{2\sqrt{1 + \theta x}}, \qquad \theta > 0.$$

(a) Verify that the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ of $X|\Lambda$ is indeed a valid hazard rate.

(b) Determine the conditional survival function $S_{X|\Lambda}(x|\lambda)$.

(c) If $\Lambda$ has a gamma distribution with parameters $\theta = 1$ and $\alpha$ replaced by $2\alpha$, determine the marginal or unconditional survival function of $X$.

(d) Use (c) to argue that a given frailty model may arise from more than one combination of conditional distributions of $X|\Lambda$ and frailty distributions of $\Lambda$.

5.15 Suppose that $X$ has survival function $S_X(x) = 1 - F_X(x)$, given by (5.3). Show that $S_1(x) = F_X(x)/[\mathrm{E}(\Lambda) A(x)]$ is again a survival function of the form given by (5.3), and identify the distribution of $\Lambda$ associated with $S_1(x)$.

5.16 Fix $s \ge 0$, and define an “Esscher-transformed” frailty random variable $\Lambda_s$ with probability density function (or discrete probability mass function in the discrete case) $f_{\Lambda_s}(\lambda) = e^{-s\lambda} f_\Lambda(\lambda)/M_\Lambda(-s)$, $\lambda \ge 0$.

(a) Show that $\Lambda_s$ has moment generating function

$$M_{\Lambda_s}(z) = \mathrm{E}\left(e^{z\Lambda_s}\right) = \frac{M_\Lambda(z - s)}{M_\Lambda(-s)}.$$

(b) Define the cumulant generating function of $\Lambda$ to be

$$c_\Lambda(z) = \ln[M_\Lambda(z)],$$

and use (a) to prove that

$$c'_\Lambda(-s) = \mathrm{E}(\Lambda_s) \quad \text{and} \quad c''_\Lambda(-s) = \mathrm{Var}(\Lambda_s).$$

(c) For the frailty model with survival function given by (5.3), prove that the associated hazard rate may be expressed as $h_X(x) = a(x)\, c'_\Lambda[-A(x)]$, where $c_\Lambda$ is defined in (b).

(d) Use (c) to show that

$$h'_X(x) = a'(x)\, c'_\Lambda[-A(x)] - [a(x)]^2 c''_\Lambda[-A(x)].$$

(e) Prove using (d) that if the conditional hazard rate $h_{X|\Lambda}(x|\lambda)$ is nonincreasing in $x$, then $h_X(x)$ is also nonincreasing in $x$.
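Parts (a) and (b) of Exercise 5.16 can be sanity-checked numerically (this is not a proof) for a gamma frailty, whose mgf is $M_\Lambda(z) = (1 - \theta z)^{-\alpha}$ for $z < 1/\theta$. The Python sketch below assumes illustrative values $\alpha = 2$, $\theta = 1.5$ and $s = 0.4$, chosen here rather than taken from the text, and compares central finite differences of $c_\Lambda$ at $-s$ with the mean and variance of $\Lambda_s$ obtained by integrating its density directly.

```python
import math

# Illustrative gamma frailty and Esscher parameter (assumed values).
alpha, theta, s = 2.0, 1.5, 0.4


def gamma_pdf(lam: float) -> float:
    return lam ** (alpha - 1) * math.exp(-lam / theta) / (math.gamma(alpha) * theta ** alpha)


def mgf(z: float) -> float:
    # M_Lambda(z) = (1 - theta z)^(-alpha), valid for z < 1/theta
    return (1.0 - theta * z) ** (-alpha)


def cgf(z: float) -> float:
    # c_Lambda(z) = ln M_Lambda(z)
    return math.log(mgf(z))


def esscher_pdf(lam: float) -> float:
    # f_{Lambda_s}(lam) = e^{-s lam} f_Lambda(lam) / M_Lambda(-s)
    return math.exp(-s * lam) * gamma_pdf(lam) / mgf(-s)


def simpson(f, a: float, b: float, n: int = 20_000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3


# Moments of Lambda_s by direct integration (the tail beyond 80 is negligible here).
mass_s = simpson(esscher_pdf, 0.0, 80.0)
mean_s = simpson(lambda x: x * esscher_pdf(x), 0.0, 80.0)
var_s = simpson(lambda x: x * x * esscher_pdf(x), 0.0, 80.0) - mean_s ** 2

# Central finite differences of the cgf at z = -s.
d = 1e-4
c1 = (cgf(-s + d) - cgf(-s - d)) / (2 * d)
c2 = (cgf(-s + d) - 2 * cgf(-s) + cgf(-s - d)) / d ** 2

print(mass_s, mean_s, c1, var_s, c2)
```

With these values both pairs should agree with $\alpha\theta/(1 + s\theta) = 1.875$ and $\alpha\theta^2/(1 + s\theta)^2 \approx 1.758$, the mean and variance of a gamma distribution with $\theta$ replaced by $\theta/(1 + s\theta)$.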

5.17 Write the density function for a two-component spliced model in which the density function is proportional to a uniform density over the interval from 0 to 1,000 and is proportional to an exponential density function from 1,000 to $\infty$. Ensure that the resulting density function is continuous.

5.18 Let $X$ have pdf $f(x) = \exp(-|x/\theta|)/(2\theta)$ for $-\infty < x < \infty$. Let $Y = e^X$. Determine the pdf and cdf of $Y$.

5.19 (*) Losses in 1993 follow the density function $f(x) = 3x^{-4}$, $x \ge 1$, where $x$ is the loss in millions of dollars. Inflation of 10% impacts all claims uniformly from 1993 to 1994. Determine the cdf of losses for 1994 and use it to determine the probability that a 1994 loss exceeds 2,200,000.

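5.20 Consider the inverse Gaussian random variable $X$ with pdf (from Appendix A)

$$f(x) = \sqrt{\frac{\theta}{2\pi x^3}} \exp\left[-\frac{\theta}{2x}\left(\frac{x - \mu}{\mu}\right)^2\right], \qquad x > 0,$$

where $\theta > 0$ and $\mu > 0$ are parameters.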

(a) Derive the pdf of the reciprocal inverse Gaussian random variable $1/X$.

(b) Prove that the “joint” moment generating function of $X$ and $1/X$ is given by

$$M(z_1, z_2) = \mathrm{E}\left(e^{z_1 X + z_2 X^{-1}}\right) = \sqrt{\frac{\theta}{\theta - 2z_2}} \, \exp\left(\frac{\theta - \sqrt{(\theta - 2\mu^2 z_1)(\theta - 2z_2)}}{\mu}\right),$$

where $z_1 < \theta/(2\mu^2)$ and $z_2 < \theta/2$.

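(c) Use (b) to show that the moment generating function of $X$ is

$$M_X(z) = \mathrm{E}\left(e^{zX}\right) = \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2}{\theta} z}\right)\right], \qquad z < \frac{\theta}{2\mu^2}.$$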

(d) Use (b) to show that the reciprocal inverse Gaussian random variable $1/X$ has moment generating function

$$M_{1/X}(z) = \mathrm{E}\left(e^{zX^{-1}}\right) = \sqrt{\frac{\theta}{\theta - 2z}} \, \exp\left[\frac{\theta}{\mu}\left(1 - \sqrt{1 - \frac{2}{\theta} z}\right)\right], \qquad z < \frac{\theta}{2}.$$

Hence prove that $1/X$ has the same distribution as $Z_1 + Z_2$, where $Z_1$ has a gamma distribution, $Z_2$ has an inverse Gaussian distribution, and $Z_1$ is independent of $Z_2$. Also, identify the gamma and inverse Gaussian parameters in this representation.

(e) Use (b) to show that

$$Z = \frac{1}{X}\left(\frac{X - \mu}{\mu}\right)^2$$

has a gamma distribution with parameters $\alpha = \frac{1}{2}$ and the usual parameter $\theta$ (in Appendix A) replaced by $2/\theta$.

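(f) For the mgf of the inverse Gaussian random variable $X$ in (c), prove by induction on $k$ that, for $k = 1, 2, \ldots$, the $k$th derivative of the mgf is

$$M_X^{(k)}(z) = M_X(z) \sum_{n=0}^{k-1} \frac{(k + n - 1)!}{(k - n - 1)!\, n!} \left(\frac{1}{2}\right)^{\frac{k + 3n}{2}} \theta^{\frac{k - n}{2}} \left(\frac{\theta}{2\mu^2} - z\right)^{-\frac{k + n}{2}}$$

and hence that the inverse Gaussian random variable has integer moments

$$\mathrm{E}[X^k] = \sum_{n=0}^{k-1} \frac{(k + n - 1)!}{(k - n - 1)!\, n!} \, \frac{\mu^{n + k}}{(2\theta)^n}, \qquad k = 1, 2, \ldots.$$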

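(g) The modified Bessel function $K_\lambda(x)$ may be defined, for half-integer values of the index parameter $\lambda$, by $K_{-\lambda}(x) = K_\lambda(x)$, together with

$$K_{m + \frac{1}{2}}(x) = \sqrt{\frac{\pi}{2x}} \, e^{-x} \sum_{j=0}^{m} \frac{(m + j)!}{(m - j)!\, j!} \left(\frac{1}{2x}\right)^j, \qquad m = 0, 1, \ldots.$$

Use part (f) to prove that, for $\alpha > 0$, $\theta > 0$, and $m = 0, 1, \ldots$,

$$\int_0^\infty x^{m - \frac{3}{2}} e^{-\alpha x - \frac{\theta}{2x}} \, dx = 2\left(\frac{\theta}{2\alpha}\right)^{\frac{m}{2} - \frac{1}{4}} K_{m - \frac{1}{2}}\left(\sqrt{2\alpha\theta}\right).$$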
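The closed forms stated in parts (f) and (g) can be spot-checked numerically (again, not a proof). A Python sketch, assuming illustrative values $\mu = 1$ and $\theta = 2$ for the inverse Gaussian and $\alpha = 0.7$ for the integral in (g); these values, the helper names, and the integration cut-offs are choices made here, not part of the exercise.

```python
import math


def simpson(f, a: float, b: float, n: int = 40_000) -> float:
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3


def ig_pdf(x: float, mu: float, theta: float) -> float:
    """Inverse Gaussian pdf as given in Exercise 5.20 (taken as 0 for x <= 0)."""
    if x <= 0:
        return 0.0
    return math.sqrt(theta / (2 * math.pi * x ** 3)) * math.exp(
        -theta / (2 * x) * ((x - mu) / mu) ** 2)


def ig_moment_formula(k: int, mu: float, theta: float) -> float:
    """E[X^k] as stated in part (f)."""
    return sum(math.factorial(k + n - 1)
               / (math.factorial(k - n - 1) * math.factorial(n))
               * mu ** (n + k) / (2 * theta) ** n
               for n in range(k))


def bessel_k_half_integer(m: int, x: float) -> float:
    """K_{m+1/2}(x) from the finite sum in part (g), for m = 0, 1, ..."""
    return math.sqrt(math.pi / (2 * x)) * math.exp(-x) * sum(
        math.factorial(m + j) / (math.factorial(m - j) * math.factorial(j))
        * (1 / (2 * x)) ** j
        for j in range(m + 1))


def bessel_k_m_minus_half(m: int, x: float) -> float:
    """K_{m-1/2}(x); for m = 0 this uses K_{-1/2} = K_{1/2}."""
    return bessel_k_half_integer(max(m - 1, 0), x)


mu, theta = 1.0, 2.0   # illustrative inverse Gaussian parameters (assumed)
alpha_g = 0.7          # illustrative alpha for part (g) (assumed)

# Part (f): numerical E[X^k] versus the stated formula (tail beyond 60 is negligible).
moment_checks = []
for k in range(1, 5):
    numeric = simpson(lambda x: x ** k * ig_pdf(x, mu, theta), 0.0, 60.0)
    moment_checks.append((k, numeric, ig_moment_formula(k, mu, theta)))

# Part (g): numerical integral versus the Bessel-function expression.
bessel_checks = []
for m in range(4):
    def integrand(x: float, m: int = m) -> float:
        if x <= 0:
            return 0.0
        return x ** (m - 1.5) * math.exp(-alpha_g * x - theta / (2 * x))
    lhs = simpson(integrand, 0.0, 80.0)
    rhs = 2 * (theta / (2 * alpha_g)) ** (m / 2 - 0.25) * bessel_k_m_minus_half(
        m, math.sqrt(2 * alpha_g * theta))
    bessel_checks.append((m, lhs, rhs))

for row in moment_checks + bessel_checks:
    print(row)
```

For $k = 2$ the formula gives $\mu^2 + \mu^3/\theta$, consistent with the familiar inverse Gaussian variance $\mu^3/\theta$.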

5.3 Selected Distributions and Their Relationships

In document: Book LOSS MODELS FROM DATA TO DECISIONS (pages 86-91)