BASIC DISTRIBUTIONAL QUANTITIES
3.4 Tails of Distributions
The tail of a distribution (more properly, the right tail) is the portion of the distribution corresponding to large values of the random variable. Understanding large possible loss values is important because these have the greatest effect on total losses. Random variables that tend to assign higher probabilities to larger values are said to be heavier tailed. Tail weight can be a relative concept (model A has a heavier tail than model B) or an absolute concept (distributions with a certain property are classified as heavy tailed). When choosing models, tail weight can help narrow the choices or can confirm a choice for a model.
3.4.1 Classification Based on Moments
Recall that in the continuous case, the $k$th raw moment for a random variable that takes on only positive values (like most insurance payment variables) is given by $\int_0^\infty x^k f(x)\,dx$. Depending on the density function and the value of $k$, this integral may not exist (i.e., it may be infinite). One way of classifying distributions is on the basis of whether all moments exist. It is generally agreed that the existence of all positive moments indicates a (relatively) light right tail, while the existence of only positive moments up to a certain value (or existence of no positive moments at all) indicates a heavy right tail.
EXAMPLE 3.9
Demonstrate that for the gamma distribution all positive moments exist but for the Pareto distribution they do not.
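For the gamma distribution, the raw moments are
$$
\begin{aligned}
\mu_k' &= \int_0^\infty x^k \frac{x^{\alpha-1}e^{-x/\theta}}{\Gamma(\alpha)\theta^{\alpha}}\,dx \\
&= \int_0^\infty (y\theta)^k \frac{(y\theta)^{\alpha-1}e^{-y}}{\Gamma(\alpha)\theta^{\alpha}}\,\theta\,dy, \quad \text{making the substitution } y = x/\theta \\
&= \frac{\theta^k}{\Gamma(\alpha)}\Gamma(\alpha+k) < \infty \quad \text{for all } k > 0.
\end{aligned}
$$
For the Pareto distribution, they are
$$
\begin{aligned}
\mu_k' &= \int_0^\infty x^k \frac{\alpha\theta^{\alpha}}{(x+\theta)^{\alpha+1}}\,dx \\
&= \int_\theta^\infty (y-\theta)^k \frac{\alpha\theta^{\alpha}}{y^{\alpha+1}}\,dy, \quad \text{making the substitution } y = x+\theta \\
&= \alpha\theta^{\alpha}\int_\theta^\infty \sum_{j=0}^{k}\binom{k}{j} y^{j-\alpha-1}(-\theta)^{k-j}\,dy, \quad \text{for integer values of } k.
\end{aligned}
$$
The integral exists only if all of the exponents on $y$ in the sum are less than $-1$, that is, if $j-\alpha-1 < -1$ for all $j$ or, equivalently, if $k < \alpha$. Therefore, only some moments exist. □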
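The conclusion of Example 3.9 can be checked numerically with a minimal sketch. The gamma formula is the one derived above. The Pareto closed form $\theta^k\Gamma(k+1)\Gamma(\alpha-k)/\Gamma(\alpha)$ for $k<\alpha$ is the standard moment formula and is taken here as an assumption, since the example stops at the integral. The function names are illustrative only.

```python
import math

def gamma_raw_moment(k, alpha, theta):
    """k-th raw moment of a gamma(alpha, theta): theta^k * Gamma(alpha + k) / Gamma(alpha)."""
    return theta**k * math.exp(math.lgamma(alpha + k) - math.lgamma(alpha))

def pareto_raw_moment(k, alpha, theta):
    """k-th raw moment of a Pareto(alpha, theta) for k > 0; infinite when k >= alpha."""
    if k >= alpha:
        return math.inf  # the integral in Example 3.9 diverges
    log_ratio = math.lgamma(k + 1) + math.lgamma(alpha - k) - math.lgamma(alpha)
    return theta**k * math.exp(log_ratio)

# Gamma moments are finite for every k; Pareto moments stop existing at k = alpha.
for k in (1, 2, 3, 4):
    print(k, gamma_raw_moment(k, 1.0 / 3.0, 15.0), pareto_raw_moment(k, 3.0, 10.0))
```

With these parameter values the first two moments of both distributions agree (mean 5, second moment 100), while the Pareto moments of order 3 and higher are infinite.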
By this classification, the Pareto distribution is said to have a heavy tail and the gamma distribution is said to have a light tail. A look at the moment formulas in Appendix A reveals which distributions have heavy tails and which do not, as indicated by the existence of moments.
It is instructive to note that if a distribution does not have all its positive moments, then it does not have a moment generating function (i.e., if $X$ is the associated random variable, then $\mathrm{E}(e^{zX}) = \infty$ for all $z > 0$). However, the converse is not true. The lognormal distribution has no moment generating function even though all its positive moments are finite.
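The lognormal remark can be illustrated with a small sketch. It assumes the standard lognormal moment formula $\mathrm{E}(X^k) = \exp(k\mu + k^2\sigma^2/2)$, and it uses the fact that for a positive random variable and $z > 0$ the expectation $\mathrm{E}(e^{zX})$ equals the series $\sum_k z^k \mathrm{E}(X^k)/k!$, whose terms are all nonnegative. The parameter values and function names are illustrative. Terms are computed on the log scale to avoid floating-point overflow.

```python
import math

def lognormal_raw_moment(k, mu, sigma):
    """E(X^k) = exp(k*mu + k^2 * sigma^2 / 2), finite for every k."""
    return math.exp(k * mu + 0.5 * (k * sigma) ** 2)

def log_mgf_series_term(k, z, mu, sigma):
    """Natural log of z^k * E(X^k) / k!, the k-th term of the series for E(exp(zX))."""
    return k * math.log(z) + k * mu + 0.5 * (k * sigma) ** 2 - math.lgamma(k + 1)

# Even for a small z the log-terms eventually grow without bound,
# so the series, and hence the moment generating function, is infinite.
for k in (1, 5, 10, 20, 40):
    print(k, log_mgf_series_term(k, z=0.01, mu=0.0, sigma=1.0))
```

The quadratic term $k^2\sigma^2/2$ eventually dominates both $k\ln z$ and $\ln k!$, which is why every moment is finite yet the series diverges for any $z > 0$.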
Further comparisons of tail behavior can be made on the basis of ratios of moments (assuming they exist). In particular, heavy-tailed behavior is typically associated with large values of quantities such as the coefficient of variation, the skewness, and the kurtosis (see Definition 3.2).
3.4.2 Comparison Based on Limiting Tail Behavior
A commonly used indication that one distribution has a heavier tail than another distribution with the same mean is that the ratio of the two survival functions should diverge to infinity (with the heavier-tailed distribution in the numerator) as the argument becomes large. The divergence implies that the numerator distribution puts significantly more probability on large values. Note that it is equivalent to examine the ratio of density functions. The limit of the ratio will be the same, as can be seen by an application of L'Hôpital's rule:
$$
\lim_{x\to\infty}\frac{S_1(x)}{S_2(x)} = \lim_{x\to\infty}\frac{S_1'(x)}{S_2'(x)} = \lim_{x\to\infty}\frac{-f_1(x)}{-f_2(x)} = \lim_{x\to\infty}\frac{f_1(x)}{f_2(x)}.
$$
Figure 3.7 The tails of the gamma and Pareto distributions. [Plot of f(x) against x for x from 50 to 150, showing the Pareto and gamma densities.]
EXAMPLE 3.10
Demonstrate that the Pareto distribution has a heavier tail than the gamma distribution using the limit of the ratio of their density functions.
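To avoid confusion, the letters $\tau$ and $\lambda$ will be used for the parameters of the gamma distribution instead of the customary $\alpha$ and $\theta$. Then, the required limit is
$$
\begin{aligned}
\lim_{x\to\infty}\frac{f_{\text{Pareto}}(x)}{f_{\text{gamma}}(x)} &= \lim_{x\to\infty}\frac{\alpha\theta^{\alpha}(x+\theta)^{-\alpha-1}}{x^{\tau-1}e^{-x/\lambda}\lambda^{-\tau}\Gamma(\tau)^{-1}} \\
&= c\lim_{x\to\infty}\frac{e^{x/\lambda}}{(x+\theta)^{\alpha+1}x^{\tau-1}} \\
&> c\lim_{x\to\infty}\frac{e^{x/\lambda}}{(x+\theta)^{\alpha+\tau}},
\end{aligned}
$$
where $c = \alpha\theta^{\alpha}\lambda^{\tau}\Gamma(\tau)$ is a positive constant, and, either by application of L'Hôpital's rule or by remembering that exponentials go to infinity faster than polynomials, the limit is infinity. Figure 3.7 shows a portion of the density functions for a Pareto distribution with parameters $\alpha = 3$ and $\theta = 10$ and a gamma distribution with parameters $\alpha = 1/3$ and $\theta = 15$. Both distributions have a mean of 5 and a variance of 75. The graph is consistent with the algebraic derivation. □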
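The divergence of the density ratio in Example 3.10 can also be seen numerically. The following sketch uses the Figure 3.7 parameter values (Pareto with $\alpha = 3$, $\theta = 10$; gamma with shape $1/3$ and scale $15$). The function names and the evaluation points are illustrative choices, not part of the example.

```python
import math

def pareto_pdf(x, alpha, theta):
    """Pareto density alpha * theta^alpha / (x + theta)^(alpha + 1)."""
    return alpha * theta**alpha / (x + theta) ** (alpha + 1)

def gamma_pdf(x, tau, lam):
    """Gamma density with shape tau and scale lam, evaluated through its logarithm."""
    log_f = (tau - 1) * math.log(x) - x / lam - tau * math.log(lam) - math.lgamma(tau)
    return math.exp(log_f)

def density_ratio(x, alpha=3.0, theta=10.0, tau=1.0 / 3.0, lam=15.0):
    """Ratio f_Pareto(x) / f_gamma(x) for the Figure 3.7 parameters by default."""
    return pareto_pdf(x, alpha, theta) / gamma_pdf(x, tau, lam)

# The ratio is below 1 at x = 50, above 1 at x = 100, and then grows rapidly.
for x in (50, 100, 200, 400, 800):
    print(x, density_ratio(x))
```

The crossover between 50 and 100 matches the range plotted in Figure 3.7, where the gamma density starts above the Pareto density and ends below it.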
3.4.3 Classification Based on the Hazard Rate Function
The hazard rate function also reveals information about the tail of the distribution. Distributions with decreasing hazard rate functions have heavy tails. Distributions with increasing hazard rate functions have light tails. In the ensuing discussion, we understand "decreasing" to mean "nonincreasing" and "increasing" to mean "nondecreasing." That is, a decreasing function can be level at times. The exponential distribution, which has a constant hazard rate, is therefore said to have both a decreasing and an increasing hazard rate. For distributions with monotone hazard rates, distributions with exponential tails divide the distributions into heavy-tailed and light-tailed distributions.
Comparisons between distributions can be made on the basis of the rate of increase or decrease of the hazard rate function. For example, a distribution has a lighter tail than
another if its hazard rate function is increasing at a faster rate. Often, these comparisons and classifications are of interest primarily in the right tail of the distribution, that is, for large functional values.
EXAMPLE 3.11
Compare the tails of the Pareto and gamma distributions by looking at their hazard rate functions.
The hazard rate function for the Pareto distribution is
$$
h(x) = \frac{f(x)}{S(x)} = \frac{\alpha\theta^{\alpha}(x+\theta)^{-\alpha-1}}{\theta^{\alpha}(x+\theta)^{-\alpha}} = \frac{\alpha}{x+\theta},
$$
which is decreasing. For the gamma distribution we need to be a bit more clever because there is no closed-form expression for $S(x)$. Observe that
$$
\frac{1}{h(x)} = \frac{\int_x^\infty f(t)\,dt}{f(x)} = \frac{\int_0^\infty f(x+y)\,dy}{f(x)},
$$
and so, if $f(x+y)/f(x)$ is an increasing function of $x$ for any fixed $y$, then $1/h(x)$ will be increasing in $x$ and thus the random variable will have a decreasing hazard rate. Now, for the gamma distribution,
$$
\frac{f(x+y)}{f(x)} = \frac{(x+y)^{\alpha-1}e^{-(x+y)/\theta}}{x^{\alpha-1}e^{-x/\theta}} = \left(1+\frac{y}{x}\right)^{\alpha-1}e^{-y/\theta},
$$
which is strictly increasing in $x$ provided that $\alpha < 1$ and strictly decreasing in $x$ if $\alpha > 1$. By this measure, some gamma distributions have a heavy tail (those with $\alpha < 1$) and some (with $\alpha > 1$) have a light tail. Note that when $\alpha = 1$, we have the exponential distribution and a constant hazard rate. Also, even though $h(x)$ is complicated in the gamma case, we know what happens for large $x$. Because $f(x)$ and $S(x)$ both go to 0 as $x\to\infty$, L'Hôpital's rule yields
$$
\begin{aligned}
\lim_{x\to\infty}h(x) &= \lim_{x\to\infty}\frac{f(x)}{S(x)} = -\lim_{x\to\infty}\frac{f'(x)}{f(x)} = -\lim_{x\to\infty}\left[\frac{d}{dx}\ln f(x)\right] \\
&= -\lim_{x\to\infty}\frac{d}{dx}\left[(\alpha-1)\ln x - \frac{x}{\theta}\right] = \lim_{x\to\infty}\left(\frac{1}{\theta}-\frac{\alpha-1}{x}\right) = \frac{1}{\theta}.
\end{aligned}
$$
That is, $h(x)\to 1/\theta$ as $x\to\infty$. □
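The representation of $1/h(x)$ used in Example 3.11 also gives a simple way to approximate the gamma hazard rate without a closed-form survival function. The sketch below integrates $(1+y/x)^{\alpha-1}e^{-y/\theta}$ over $y$ with composite Simpson's rule. The truncation point, the number of subintervals, and the function name are assumptions made for illustration.

```python
import math

def gamma_hazard(x, alpha, theta, n=20000, upper_mult=50.0):
    """Approximate the gamma hazard rate at x > 0 using
    1/h(x) = int_0^inf (1 + y/x)^(alpha - 1) * exp(-y/theta) dy.
    The integral is truncated at upper_mult * theta and evaluated with
    composite Simpson's rule on n subintervals (n must be even).
    """
    upper = upper_mult * theta
    step = upper / n

    def integrand(y):
        return (1.0 + y / x) ** (alpha - 1.0) * math.exp(-y / theta)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    reciprocal_hazard = total * step / 3.0
    return 1.0 / reciprocal_hazard

# alpha < 1: hazard decreases toward 1/theta; alpha > 1: hazard increases toward 1/theta.
for x in (1.0, 5.0, 25.0, 125.0):
    print(x, gamma_hazard(x, 0.5, 2.0), gamma_hazard(x, 3.0, 2.0))
```

With $\theta = 2$, both columns approach $1/\theta = 0.5$, from above when $\alpha = 0.5$ and from below when $\alpha = 3$, in line with the limit derived in the example.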
3.4.4 Classification Based on the Mean Excess Loss Function
The mean excess loss function also gives information about tail weight. If the mean excess loss function is increasing in $d$, the distribution is considered to have a heavy tail. If the mean excess loss function is decreasing in $d$, the distribution is considered to have a light tail. Comparisons between distributions can be made on the basis of whether the mean excess loss function is increasing or decreasing. In particular, a distribution with an increasing mean excess loss function has a heavier tail than a distribution with a decreasing mean excess loss function.
In fact, the mean excess loss function and the hazard rate are closely related in several ways. First, note that
$$
\frac{S(y+d)}{S(d)} = \frac{\exp\left[-\int_0^{y+d}h(x)\,dx\right]}{\exp\left[-\int_0^{d}h(x)\,dx\right]} = \exp\left[-\int_d^{y+d}h(x)\,dx\right] = \exp\left[-\int_0^{y}h(d+t)\,dt\right].
$$
Therefore, if the hazard rate is decreasing, then for fixed $y$ it follows that $\int_0^y h(d+t)\,dt$ is a decreasing function of $d$, and from the preceding, $S(y+d)/S(d)$ is an increasing function of $d$. But from (3.5) the mean excess loss function may be expressed as
$$
e(d) = \frac{\int_d^\infty S(x)\,dx}{S(d)} = \int_0^\infty \frac{S(y+d)}{S(d)}\,dy.
$$
Thus, if the hazard rate is a decreasing function, then the mean excess loss function $e(d)$ is an increasing function of $d$ because the same is true of $S(y+d)/S(d)$ for fixed $y$. Similarly, if the hazard rate is an increasing function, then the mean excess loss function is a decreasing function. It is worth noting (and is perhaps counterintuitive), however, that the converse implication is not true. Exercise 3.29 gives an example of a distribution that has a decreasing mean excess loss function, but the hazard rate is not increasing for all values. Nevertheless, the implications just described are generally consistent with the preceding discussions of heaviness of the tail.
There is a second relationship between the mean excess loss function and the hazard rate. As $d\to\infty$, $S(d)$ and $\int_d^\infty S(x)\,dx$ go to zero. Thus, the limiting behavior of the mean excess loss function as $d\to\infty$ may be ascertained using L'Hôpital's rule because formula (3.5) holds. We have
$$
\lim_{d\to\infty}e(d) = \lim_{d\to\infty}\frac{\int_d^\infty S(x)\,dx}{S(d)} = \lim_{d\to\infty}\frac{-S(d)}{-f(d)} = \lim_{d\to\infty}\frac{1}{h(d)}
$$
as long as the indicated limits exist. These limiting relationships may be useful if the form of $F(x)$ is complicated.
EXAMPLE 3.12
Examine the behavior of the mean excess loss function of the gamma distribution.
Because $e(d) = \int_d^\infty S(x)\,dx / S(d)$ and $S(x)$ is complicated, $e(d)$ is complicated. But $e(0) = \mathrm{E}(X) = \alpha\theta$, and, using Example 3.11, we have
$$
\lim_{x\to\infty}e(x) = \lim_{x\to\infty}\frac{1}{h(x)} = \frac{1}{\lim_{x\to\infty}h(x)} = \theta.
$$
Also, from Example 3.11, $h(x)$ is strictly decreasing in $x$ for $\alpha < 1$ and strictly increasing in $x$ for $\alpha > 1$, implying that $e(d)$ is strictly increasing from $e(0) = \alpha\theta$ to $e(\infty) = \theta$ for $\alpha < 1$ and strictly decreasing from $e(0) = \alpha\theta$ to $e(\infty) = \theta$ for $\alpha > 1$. For $\alpha = 1$, we have the exponential distribution for which $e(d) = \theta$. □
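The behavior described in Example 3.12 can be checked for one case where the gamma survival function is available in closed form. The sketch below assumes $\alpha = 2$, for which $S(x) = e^{-x/\theta}(1 + x/\theta)$, and evaluates $e(d) = \int_0^\infty S(y+d)/S(d)\,dy$ with Simpson's rule. The value $\theta = 10$, the truncation point, and the function names are illustrative assumptions.

```python
import math

def mean_excess(survival, d, upper, n=20000):
    """Approximate e(d) = int_0^inf S(y + d) / S(d) dy by composite Simpson's rule
    on [0, upper] with n subintervals (n must be even)."""
    step = upper / n
    s_d = survival(d)

    def integrand(y):
        return survival(y + d) / s_d

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * step)
    return total * step / 3.0

def gamma2_survival(x, theta=10.0):
    """Survival function of a gamma distribution with alpha = 2 and scale theta."""
    return math.exp(-x / theta) * (1.0 + x / theta)

# e(d) should fall from alpha * theta = 20 toward theta = 10 as d grows.
for d in (0.0, 10.0, 100.0, 1000.0):
    print(d, mean_excess(gamma2_survival, d, upper=600.0))
```

For this case the integral can also be done by hand, giving $e(d) = \theta + \theta/(1 + d/\theta)$, which is strictly decreasing from $2\theta$ to $\theta$ as the example states for $\alpha > 1$.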
3.4.5 Equilibrium Distributions and Tail Behavior
Further insight into the mean excess loss function and the heaviness of the tail may be obtained by introducing the equilibrium distribution (also called the integrated tail distribution). For positive random variables with $S(0) = 1$, it follows from Definition 3.3 and (3.5) with $d = 0$ that $\mathrm{E}(X) = \int_0^\infty S(x)\,dx$ or, equivalently, $1 = \int_0^\infty [S(x)/\mathrm{E}(X)]\,dx$, so that
$$
f_e(x) = \frac{S(x)}{\mathrm{E}(X)}, \quad x \ge 0, \tag{3.11}
$$
is a probability density function. The corresponding survival function is
$$
S_e(x) = \int_x^\infty f_e(t)\,dt = \frac{\int_x^\infty S(t)\,dt}{\mathrm{E}(X)}, \quad x \ge 0.
$$
The hazard rate corresponding to the equilibrium distribution is
$$
h_e(x) = \frac{f_e(x)}{S_e(x)} = \frac{S(x)}{\int_x^\infty S(t)\,dt} = \frac{1}{e(x)}
$$
using (3.5). Thus, the reciprocal of the mean excess function is itself a hazard rate, and this fact may be used to show that the mean excess function uniquely characterizes the original distribution. We have
$$
f_e(x) = h_e(x)S_e(x) = h_e(x)\,e^{-\int_0^x h_e(t)\,dt},
$$
or, equivalently,
$$
S(x) = \frac{e(0)}{e(x)}\,e^{-\int_0^x [1/e(t)]\,dt}
$$
using $e(0) = \mathrm{E}(X)$.
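The inversion formula for $S(x)$ in terms of $e(x)$ can be tried on a distribution whose mean excess function is simple. The sketch below assumes the Pareto mean excess function $e(d) = (d+\theta)/(\alpha-1)$ for $\alpha > 1$ (a standard result, not derived in this section) and recovers the survival function numerically. The function names and parameter values are illustrative.

```python
import math

def survival_from_mean_excess(e, x, n=2000):
    """Recover S(x) = e(0)/e(x) * exp(-int_0^x dt / e(t)) from a mean excess function e,
    evaluating the integral by composite Simpson's rule (n must be even)."""
    if x == 0:
        return 1.0
    step = x / n
    total = 1.0 / e(0.0) + 1.0 / e(x)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / e(i * step)
    integral = total * step / 3.0
    return e(0.0) / e(x) * math.exp(-integral)

def pareto_mean_excess(d, alpha=3.0, theta=10.0):
    """Mean excess loss of a Pareto(alpha, theta) with alpha > 1 (assumed closed form)."""
    return (d + theta) / (alpha - 1.0)

# Compare the recovered values with the Pareto survival function (theta / (x + theta))^alpha.
for x in (0.0, 10.0, 30.0, 90.0):
    print(x, survival_from_mean_excess(pareto_mean_excess, x), (10.0 / (x + 10.0)) ** 3)
```

The two printed columns should agree closely, which is the uniqueness statement in action: the mean excess function alone is enough to rebuild $S(x)$.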
The equilibrium distribution also provides further insight into the relationship between the hazard rate, the mean excess function, and the heaviness of the tail. Assuming that $S(0) = 1$, and thus $e(0) = \mathrm{E}(X)$, we have $\int_x^\infty S(t)\,dt = e(0)S_e(x)$, and from (3.5), $\int_x^\infty S(t)\,dt = e(x)S(x)$. Equating these two expressions results in
$$
\frac{e(x)}{e(0)} = \frac{S_e(x)}{S(x)}.
$$
If the mean excess function is increasing (which is implied if the hazard rate is decreasing), then $e(x) \ge e(0)$, which is obviously equivalent to $S_e(x) \ge S(x)$ from the preceding equality. This, in turn, implies that
$$
\int_0^\infty S_e(x)\,dx \ge \int_0^\infty S(x)\,dx.
$$
But $\mathrm{E}(X) = \int_0^\infty S(x)\,dx$ from Definition 3.3 and (3.5) if $S(0) = 1$. Also,
$$
\int_0^\infty S_e(x)\,dx = \int_0^\infty x f_e(x)\,dx,
$$
since both sides represent the mean of the equilibrium distribution. This may be evaluated using (3.9) with $u = \infty$, $k = 2$, and $F(0) = 0$ to give the equilibrium mean, that is,
$$
\int_0^\infty S_e(x)\,dx = \int_0^\infty x f_e(x)\,dx = \frac{1}{\mathrm{E}(X)}\int_0^\infty x S(x)\,dx = \frac{\mathrm{E}(X^2)}{2\mathrm{E}(X)}.
$$
The inequality may thus be expressed as
$$
\frac{\mathrm{E}(X^2)}{2\mathrm{E}(X)} \ge \mathrm{E}(X)
$$
or, using $\mathrm{Var}(X) = \mathrm{E}(X^2) - [\mathrm{E}(X)]^2$, as $\mathrm{Var}(X) \ge [\mathrm{E}(X)]^2$. That is, the squared coefficient of variation, and hence the coefficient of variation itself, is at least 1 if $e(x) \ge e(0)$. Reversing the inequalities implies that the coefficient of variation is at most 1 if $e(x) \le e(0)$, which is in turn implied if the mean excess function is decreasing or the hazard rate is increasing. These values of the coefficient of variation are consistent with the comments made here about the heaviness of the tail.
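As a quick numerical check of the coefficient of variation bounds, the sketch below uses the gamma distribution, whose mean $\alpha\theta$ and variance $\alpha\theta^2$ give a coefficient of variation of $1/\sqrt{\alpha}$, together with the mean of 5 and variance of 75 quoted for the two distributions of Figure 3.7. The function names are illustrative.

```python
import math

def cv_from_moments(mean, variance):
    """Coefficient of variation: standard deviation divided by the mean."""
    return math.sqrt(variance) / mean

def gamma_cv(alpha):
    """Coefficient of variation of a gamma(alpha, theta): 1 / sqrt(alpha), free of theta."""
    return 1.0 / math.sqrt(alpha)

# Decreasing hazard rate (alpha < 1) gives CV > 1; increasing (alpha > 1) gives CV < 1.
for alpha in (1.0 / 3.0, 0.5, 1.0, 2.0, 4.0):
    print(alpha, gamma_cv(alpha))

# Both Figure 3.7 distributions have mean 5 and variance 75, hence CV = sqrt(3) > 1.
print(cv_from_moments(5.0, 75.0))
```

Both Figure 3.7 distributions have decreasing hazard rates (the Pareto always does, and the gamma does because its shape parameter is below 1), so a coefficient of variation above 1 is what the inequality predicts.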
3.4.6 Exercises
3.25 Using the methods in this section (except for the mean excess loss function), compare the tail weight of the Weibull and inverse Weibull distributions.
3.26 Arguments as in Example 3.10 place the lognormal distribution between the gamma and Pareto distributions with regard to heaviness of the tail. To reinforce this conclusion, consider: a gamma distribution with parameters $\alpha = 0.2$, $\theta = 500$; a lognormal distribution with parameters $\mu = 3.709290$, $\sigma = 1.338566$; and a Pareto distribution with parameters $\alpha = 2.5$, $\theta = 150$. First demonstrate that all three distributions have the same mean and variance. Then, numerically demonstrate that there is a value such that the gamma pdf is smaller than the lognormal and Pareto pdfs for all arguments above that value, and that there is another value such that the lognormal pdf is smaller than the Pareto pdf for all arguments above that value.
3.27 For a Pareto distribution with $\alpha > 2$, compare $e(x)$ to $e(0)$ and also determine the coefficient of variation. Confirm that these results are consistent with the Pareto distribution being heavy tailed.
3.28 Let $Y$ be a random variable that has the equilibrium density from (3.11). That is, $f_Y(y) = f_e(y) = S_X(y)/\mathrm{E}(X)$ for some random variable $X$. Use integration by parts to show that
$$
M_Y(z) = \frac{M_X(z) - 1}{z\,\mathrm{E}(X)}
$$
whenever $M_X(z)$ exists.
3.29 You are given that the random variable $X$ has probability density function $f(x) = (1 + 2x^2)e^{-2x}$, $x \ge 0$.
(a) Determine the survival function $S(x)$.
(b) Determine the hazard rate $h(x)$.
(c) Determine the survival function $S_e(x)$ of the equilibrium distribution.
(d) Determine the mean excess function $e(x)$.
(e) Determine $\lim_{x\to\infty}h(x)$ and $\lim_{x\to\infty}e(x)$.
(f) Prove that $e(x)$ is strictly decreasing but $h(x)$ is not strictly increasing.
3.30 Assume that $X$ has probability density function $f(x)$, $x \ge 0$.
(a) Prove that
$$
S_e(x) = \frac{\int_x^\infty (y-x)f(y)\,dy}{\mathrm{E}(X)}.
$$
(b) Use (a) to show that
$$
\int_x^\infty y f(y)\,dy = xS(x) + \mathrm{E}(X)S_e(x).
$$
(c) Prove that (b) may be rewritten as
$$
S(x) = \frac{\int_x^\infty y f(y)\,dy}{x + e(x)}
$$
and that this, in turn, implies that
$$
S(x) \le \frac{\mathrm{E}(X)}{x + e(x)}.
$$
(d) Use (c) to prove that, if $e(x) \ge e(0)$, then
$$
S(x) \le \frac{\mathrm{E}(X)}{x + \mathrm{E}(X)}
$$
and thus
$$
S[k\mathrm{E}(X)] \le \frac{1}{k+1},
$$
which for $k = 1$ implies that the mean is at least as large as the (smallest) median.
(e) Prove that (b) may be rewritten as
$$
S_e(x) = \frac{e(x)}{x + e(x)}\cdot\frac{\int_x^\infty y f(y)\,dy}{\mathrm{E}(X)}
$$
and thus that
$$
S_e(x) \le \frac{e(x)}{x + e(x)}.
$$
3.5 Measures of Risk