Geoderma 98 (2000) 1–3
www.elsevier.nl/locate/geoderma
Discussion
On the role of Shannon’s entropy
as a measure of heterogeneity
Miguel Angel Martín a,*, José-Manuel Rey b
a Departamento de Matemática Aplicada, E.T.S.I.A. Agrónomos, Universidad Politécnica de Madrid, 28040 Madrid, Spain
b Departamento de Análisis Económico, Universidad Complutense, Campus de Somosaguas, 28223 Madrid, Spain
Received 29 June 1999; accepted 24 January 2000
Ibáñez et al. (1998) proposed the use of Shannon's entropy to analyze the diversity of the world pedosphere on the basis of data compiled by the F.A.O. at the scale 1:5,000,000. Here we try to provide some mathematically founded arguments to justify the use and interpretation of Shannon's information entropy as a measure of diversity and homogeneity.
Information entropy h is computed from a discrete probability distribution $\{p_i : i = 1, 2, \dots, N\}$ via Shannon's formula $h = -\sum_{i=1}^{N} p_i \log p_i$. This quantity was originally proposed by Shannon as a measure of the average information content gained from observing the realization of an experiment with N possible outcomes whose probabilities of occurrence are $p_1, p_2, \dots, p_N$.
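As a minimal sketch of the formula above, the following Python function evaluates h for a given list of probabilities. The function name `shannon_entropy` and the use of the natural logarithm as the default base are illustrative assumptions (the text writes only log, and changing the base merely rescales h by a constant factor); zero probabilities are skipped, following the usual convention 0 log 0 = 0.

```python
import math
from typing import Sequence


def shannon_entropy(p: Sequence[float], base: float = math.e) -> float:
    """Return h = -sum_i p_i log p_i for a discrete distribution p.

    Terms with p_i == 0 are skipped, following the convention 0 log 0 = 0.
    """
    if any(pi < 0 for pi in p):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(p), 1.0, rel_tol=1e-9, abs_tol=1e-12):
        raise ValueError("probabilities must sum to 1")
    # Writing each term as -p_i log p_i keeps h >= 0 (no negative zero).
    return sum(-pi * math.log(pi, base) for pi in p if pi > 0)


# A fair coin: log 2 nats, i.e. exactly one bit.
print(shannon_entropy([0.5, 0.5]))
print(shannon_entropy([0.5, 0.5], base=2))
```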
Well-known mathematical facts are that (a) h attains its maximum value, log N, only in the equiprobable case, that is, $p_i = 1/N$ for all i, and (b) h vanishes when some $p_j = 1$, and thus $p_i = 0$ for $i \neq j$ (with the usual convention $0 \log 0 = 0$). These two extreme situations correspond, respectively, to (a) the most informative case and (b) the least informative case: observing the actual outcome provides (a) very rich information when all outcomes are equally probable and (b) very poor information when outcome j is certain to occur. Also, h depends continuously on the probabilities $p_i$, so that similar distributions yield close values of h.
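The two extreme cases and the continuity property can be checked numerically. The sketch below is a hypothetical illustration with N = 4 and natural logarithms: it evaluates h for the equiprobable distribution, for a degenerate distribution, and for a slight perturbation of the equiprobable one.

```python
import math


def shannon_entropy(p):
    # h = -sum p_i log p_i (natural log), with 0 log 0 taken as 0
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


N = 4
uniform = [1 / N] * N                        # case (a): p_i = 1/N for all i
degenerate = [1.0, 0.0, 0.0, 0.0]            # case (b): p_1 = 1, all others 0
nearly_uniform = [0.26, 0.25, 0.25, 0.24]    # small perturbation of case (a)

print(shannon_entropy(uniform), math.log(N))  # both equal log 4 (up to rounding)
print(shannon_entropy(degenerate))            # 0.0
print(shannon_entropy(nearly_uniform))        # just below log 4
```

The perturbed distribution stays within about 0.0004 of the maximum, which is what continuity of h leads one to expect.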
* Corresponding author.
E-mail addresses: mamartin@mat.etsia.upm.es (M.A. Martín), j-man@ccee.ucm.es (J.-M. Rey).
0016-7061/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
The index h is a natural measure of diversity: we have $h = \log 2$ if {odd, even} are considered as the possible outcomes of a fair die toss, whereas $h = \log 6$ for the natural outcome space {1, 2, 3, 4, 5, 6}. In general, the finer or more diverse the space of possible outcomes, the larger the associated value of the entropy h.
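The die example can be reproduced by merging the outcomes of a fair die into coarser outcome spaces. In this sketch the helper `coarse_grain` is hypothetical, introduced only for illustration; it adds up the probabilities of faces that share a label. An intermediate partition into pairs {1,2}, {3,4}, {5,6}, not mentioned in the text, is included to show the entropy growing as the outcome space is refined.

```python
import math
from collections import defaultdict


def shannon_entropy(p):
    # h = -sum p_i log p_i (natural log), with 0 log 0 taken as 0
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def coarse_grain(probs, label):
    """Merge outcomes sharing the same label (hypothetical helper)."""
    merged = defaultdict(float)
    for outcome, prob in probs.items():
        merged[label(outcome)] += prob
    return list(merged.values())


die = {face: 1 / 6 for face in range(1, 7)}                              # fair die
parity = coarse_grain(die, lambda f: "even" if f % 2 == 0 else "odd")   # {odd, even}
pairs = coarse_grain(die, lambda f: (f + 1) // 2)                       # {1,2}, {3,4}, {5,6}
full = list(die.values())                                               # {1, ..., 6}

for name, dist in [("parity", parity), ("pairs", pairs), ("faces", full)]:
    print(name, round(shannon_entropy(dist), 4))
```

The three values are log 2 < log 3 < log 6, in line with the statement that finer outcome spaces carry larger entropy.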
The index h can also be understood as a measure of the homogeneity or evenness of the distribution, according to the following interpretation: case (a) above corresponds to the most homogeneous case, since all outcomes will be equally present in a large sample of independent realizations of the experiment, whereas case (b) produces a highly heterogeneous distribution because all observations will yield the same outcome. Notice that the closer the values of $p_i$ are to $1/N$ (the more homogeneous the distribution is), the more diverse the composition of a sample of n realizations of the experiment is; therefore, a value of h close to the maximum log N indicates an even proportional contribution of every outcome in a large independent sample.
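To illustrate the sampling interpretation, the following sketch draws a large independent sample from an even and from a strongly uneven distribution and computes h from the observed frequencies. The sample size, the random seed and the skewed probabilities are arbitrary assumptions chosen only for the example.

```python
import math
import random
from collections import Counter


def shannon_entropy(p):
    # h = -sum p_i log p_i (natural log), with 0 log 0 taken as 0
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def empirical_entropy(probs, n, rng):
    """Draw n independent outcomes and return h of their observed frequencies."""
    outcomes = list(range(len(probs)))
    sample = rng.choices(outcomes, weights=probs, k=n)
    counts = Counter(sample)
    return shannon_entropy([c / n for c in counts.values()])


rng = random.Random(0)                    # fixed seed for reproducibility
N = 5
even = [1 / N] * N
skewed = [0.9, 0.025, 0.025, 0.025, 0.025]

h_even = empirical_entropy(even, 10_000, rng)
h_skewed = empirical_entropy(skewed, 10_000, rng)
print(f"log N = {math.log(N):.4f}, even: {h_even:.4f}, skewed: {h_skewed:.4f}")
```

The even sample gives a value close to log 5, while the sample dominated by one outcome gives a value far below it.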
The entropy index has apparently been regarded in some contexts as just one index among many for measuring such things as disorder, asymmetry, or information. It is important, however, to realize that this is not the case: it is an essential theoretical fact that any quantity intended to measure such things while satisfying natural properties must be a multiple of the index h. This is the content of the Khinchine theorem (Khinchine, 1957).
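Khinchine's characterization rests on a few requirements on the measure, among them continuity, maximality at the equiprobable distribution, and an additivity rule for compound experiments. The sketch below only illustrates the additivity requirement, checked in the closely related grouping form $h(p_1,\dots,p_N) = h(p_1+p_2, p_3,\dots,p_N) + (p_1+p_2)\, h\!\left(\tfrac{p_1}{p_1+p_2}, \tfrac{p_2}{p_1+p_2}\right)$, on one example distribution; it is a numerical check, not a proof of the theorem. The sum of squared probabilities is included as a hypothetical contrast: an index that fails the rule.

```python
import math


def shannon_entropy(p):
    # h = -sum p_i log p_i (natural log), with 0 log 0 taken as 0
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def grouping_gap(index, p):
    """Left side minus right side of the grouping identity for outcomes 1 and 2."""
    q = p[0] + p[1]
    lhs = index(p)
    rhs = index([q] + list(p[2:])) + q * index([p[0] / q, p[1] / q])
    return lhs - rhs


p = [0.1, 0.3, 0.2, 0.4]
gap_h = grouping_gap(shannon_entropy, p)                       # ~0 up to rounding
gap_multiple = grouping_gap(lambda d: 2.5 * shannon_entropy(d), p)  # multiples of h also pass
gap_squares = grouping_gap(lambda d: sum(x * x for x in d), p)      # clearly nonzero
print(gap_h, gap_multiple, gap_squares)
```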
The ideas above have led to entropy being regarded as a successful natural device for gauging inequality or heterogeneity, for instance in economics, where it is used to measure inequality in income distribution (Theil, 1967). In this setting, N is the total number of people in the population and $p_i$ is interpreted as the share of person i in total income, so that the more uniform the distribution of the $p_i$'s, the more equally total income is distributed across the population. Diversity is thus described by entropy through the evenness, or unevenness, of the distribution of probabilities among the different outcomes.
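In the income setting, one standard way to turn entropy into an inequality index is Theil's T index, which equals log N minus the entropy of the income shares $p_i$. It is 0 when incomes are equal and reaches log N when one person holds all income. The sketch below assumes this form (Theil proposed more than one variant) and uses made-up income figures.

```python
import math


def theil_index(incomes):
    """Theil T index computed as log N minus the entropy of the income shares."""
    total = sum(incomes)
    shares = [x / total for x in incomes]          # p_i: share of person i in total income
    h = sum(-s * math.log(s) for s in shares if s > 0)
    return math.log(len(incomes)) - h


print(theil_index([10, 10, 10, 10]))   # equal incomes: 0 (up to rounding)
print(theil_index([40, 0, 0, 0]))      # one person holds everything: log 4
print(theil_index([25, 10, 3, 2]))     # intermediate inequality
```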
A way of using entropy in soil science has also been proposed by Martín and Taguas (1998). In that paper, a fractal model of soil particle-size distribution (PSD) is given, and the entropy dimension (that is, the scaling exponent of the entropy computed at different partition scales with respect to the partition size) is used to characterize PSD. In Taguas et al. (1999), the fractal-structure hypothesis for PSD is successfully checked for many different soils; as a consequence, the entropy dimension becomes a useful tool for characterizing textural classes (Martín and Taguas, 1999). Again, entropy, or its power-law behaviour within a range of size scales, gives a measure of textural richness in terms of the evenness of the probability distribution of particle sizes.
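The entropy dimension can be illustrated as the slope of h, computed at partition size δ, against log(1/δ). The sketch below is not the iterated-function-system procedure of Martín and Taguas (1998). It uses a hypothetical test measure, a binomial multiplicative cascade on [0, 1], for which the answer is known in closed form: with weights w and 1 - w, the entropy at $\delta = 2^{-k}$ is exactly k times h(w, 1 - w), so the slope equals $-(w \log_2 w + (1-w)\log_2(1-w))$.

```python
import math


def shannon_entropy(p):
    # h = -sum p_i log p_i (natural log), with 0 log 0 taken as 0
    return sum(-pi * math.log(pi) for pi in p if pi > 0)


def binomial_cascade(weight, level):
    """Masses of the 2**level dyadic subintervals of [0, 1] (hypothetical test measure).

    Each interval passes a fraction `weight` of its mass to its left half
    and 1 - weight to its right half.
    """
    masses = [1.0]
    for _ in range(level):
        masses = [m * w for m in masses for w in (weight, 1 - weight)]
    return masses


def entropy_dimension(weight, levels):
    """Least-squares slope of h(delta) against log(1/delta), with delta = 2**-k."""
    xs = [k * math.log(2) for k in levels]                       # log(1/delta)
    ys = [shannon_entropy(binomial_cascade(weight, k)) for k in levels]
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


print(entropy_dimension(0.5, range(1, 11)))   # even split: 1
print(entropy_dimension(0.3, range(1, 11)))   # uneven split: about 0.881
```

An even split recovers dimension 1, whereas an uneven split gives a smaller exponent, consistent with reading entropy as a measure of evenness across scales.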
In Ibáñez et al. (1998), h is computed from the areal proportional contribution [...]
useful information-content properties are to be satisfied, it is the only possibility. While a proper interpretation of its meaning is a subtle issue that depends on the context, we feel that its use as a measure of distribution evenness (and thus of diversity in the sense mentioned above) is justified in general from a mathematical perspective, and the case of Ibáñez et al. (1998) is no exception.
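As a minimal sketch of the areal computation referred to above, h can be evaluated from the proportions $p_i = A_i / \sum_j A_j$ of the total mapped area, assuming each $p_i$ is the share occupied by one mapped unit. The unit names and areas below are hypothetical and do not come from the F.A.O. data; N is taken to be the number of units present.

```python
import math


def areal_entropy(areas):
    """Entropy h of the areal proportions p_i = A_i / sum(A) of the mapped units."""
    total = sum(areas.values())
    props = [a / total for a in areas.values() if a > 0]
    return sum(-p * math.log(p) for p in props)


# Hypothetical areas (arbitrary units) of four units on an imaginary map.
even_map = {"unit A": 25.0, "unit B": 25.0, "unit C": 25.0, "unit D": 25.0}
dominated_map = {"unit A": 85.0, "unit B": 5.0, "unit C": 5.0, "unit D": 5.0}

for name, areas in [("even", even_map), ("dominated", dominated_map)]:
    print(f"{name}: h = {areal_entropy(areas):.3f}, log N = {math.log(len(areas)):.3f}")
```

The evenly shared map attains the maximum log N, while the map dominated by a single unit gives a much smaller h.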
References
Ibáñez, J.J., De-Alba, S., Lobo, A., Zucarello, V., 1998. Pedodiversity and global soil patterns at coarse scales (with Discussion). Geoderma 83, 71–192.
Khinchine, A.I., 1957. Mathematical Foundations of Information Theory. Dover Publications, New York.
Martín, M.A., Taguas, F.J., 1998. Fractal modeling, characterization and simulation of particle-size distribution in soil. Proc. R. Soc. London, A 454, 1457–1468.
Martín, M.A., Taguas, F.J., 1999. Quantitative characterization of soil textures by mean of the entropy dimension, preprint.
Taguas, F.J., Martín, M.A., Perfect, E., 1999. Simulation and testing of selfsimilar structures using iterated function systems. Geoderma 88, 191–203.