Evolution of the distribution of attention
In chapter 7, we attempted to describe the compound result of the autonomous development of an Internet-native culture production and of the continued usage of the Internet as a distribution medium for media publishing. The interaction of these factors leads to a popularity distribution for creators (beneficiaries of the Creative Contribution), which is the basis for our model.
To explain what we mean by “popularity distribution for creators”, we need to define a few quantities. If each work $w_j$ has popularity $\mathrm{pop}(w_j)$, and the relative contribution of a given creator $c_i$ to $w_j$ is $\mathrm{contrib}(c_i, w_j)$, the total popularity¹ for a given creator is:
$$\mathrm{pop}(c_i) = \sum_j \mathrm{contrib}(c_i, w_j)\,\mathrm{pop}(w_j)$$
For example, if creator A contributed 50% of work $w_1$, 25% of work $w_2$ and 60% of work $w_3$, the popularity of A is:
$$\mathrm{pop}(A) = 0.5\,\mathrm{pop}(w_1) + 0.25\,\mathrm{pop}(w_2) + 0.6\,\mathrm{pop}(w_3).$$
The popularity distribution for creators is the distribution of $\mathrm{pop}(c_i)$: it determines the distribution of the rewards. We will speak equivalently of usage distribution, because our measurement system collects usage clues (see appendix C).
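The weighted sum above can be sketched in a few lines of Python. This is only an illustration: the function name `creator_popularity` and the popularity figures for the three works are hypothetical, chosen to mirror the creator A example, and are not taken from the measurement system.

```python
def creator_popularity(contributions, work_popularity):
    """Return the sum of contrib(c, w) * pop(w) over the works of one creator.

    contributions: dict mapping a work id to the creator's share (0.0 to 1.0)
    work_popularity: dict mapping a work id to the popularity of that work
    """
    return sum(share * work_popularity[work]
               for work, share in contributions.items())


# Hypothetical popularity of three works, in arbitrary usage units.
work_popularity = {"w1": 1000, "w2": 400, "w3": 50}

# Shares of creator A, as in the example in the text.
contrib_a = {"w1": 0.5, "w2": 0.25, "w3": 0.6}

pop_a = creator_popularity(contrib_a, work_popularity)
print(pop_a)
```

Computing this value for every creator and sorting the results by decreasing value gives the popularity distribution for creators discussed in the rest of this section.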
A number of heterogeneous factors (such as classical media publishing and unauthorized file sharing) already interact to drive the attention devoted to various productions. This interaction will be stronger once the Creative Contribution is put in place and file sharing is recognized as a legitimate activity. It will result – as it already does – in attention patterns (i.e. popularity distributions) that do not formally follow Zipf’s law but are often closely approximated by it. We will discuss below the impact of this possible divergence, but for the time being, let us accept modeling the reported usage pattern by a Zipf law and consider only how the associated parameter – and thus the diversity of usage – will vary.
Note that actual rewards will be distributed according to the observed distribution, not the model: this model is used only to determine the overall amount of rewards, i.e. to set a global scale factor. To do this, we will assume that the input to the reward system will be a Zipf law with parameter $\alpha$. This leaves us with 3 parameters to set: the value of $\alpha$, the size of the overall universe (the number of creators to be rewarded), and the minimum reward amount to be distributed.
As explained in section 7.2, one can only speculate about the likely observed value of $\alpha$. In chapter 7, we predict that it will start at the high value of 1.0 (the classical Zipf law, but actually less concentrated than present copyright rewards), and then progressively decrease to 0.9 (a more diverse distribution of attention), possibly becoming as low as 0.8 in the long term. The interested reader is referred to figure 7.2 for a comparison between the corresponding distributions.² Given the uncertainty of $\alpha$ and the likelihood that it will vary, it is important to build a model which can function at constant total cost regardless of the precise value of $\alpha$, at least for the predictable future.
The reward level and size parameters
To set the remaining two parameters, we start from the decision that a certain number of people should be rewarded at or above a certain minimum level in a given country, at the time when the Creative Contribution is introduced. This choice is not arbitrary, but it is clearly based on a programmatic decision rather than on some more fundamental principle: there is no obvious ground truth that immediately tells us that “we should reward so many people by at least this amount”.
We chose to set this minimum reward at $200/year (or the local currency equivalent), and the number of people who should receive it at 2-2.5% of individual Internet contributors who “produce and publish some contents for sharing over the Internet in a 3 month period” (see Deroin 2010). We did our best to substantiate this decision (see page 100), but we acknowledge that a different choice could be made. However, there are some important constraints. The number of creators to be rewarded at the minimum level cannot be raised arbitrarily, not only because it would make the reward system too expensive, but also because it is not clear that there are enough deserving creators out there to justify it, or that they would come out of the woodwork if a reward were available.
All the computer programs used to experiment with our model are distributed in parallel with the publication of this book as free software³, and readers are encouraged to experiment with other possible values, should they wish to do so.
Our choice for the threshold below which rewards will not be distributed is $40/year (justified on page 99). Once these 2 choices have been made, the other decisions follow for a given value of the diversity parameter $\alpha$.
Setting an initial value for the universe size
For the time being, let us assume a proportional reward, where creators are rewarded proportionally to the measured usage of their works. Let’s assume that the value of the parameter of Zipf’s law for the initially observed diversity of usage will be $\alpha = 1.0$, and we wish to have at least $n = 230{,}000$ creators receiving €150/year or more. The following formula immediately gives the total number of rewarded creators:
$$N = \exp\left(\ln(n) + \frac{1}{\alpha}\ln\left(\frac{150}{30}\right)\right)$$
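where $\exp(x)$ is the exponential of $x$ and $\ln(n)$ is the natural logarithm of $n$. The formula is obtained as follows. According to Zipf’s law, the reward for the $n$th creator is:
$$\mathrm{reward}(n) = \frac{R}{n^{\alpha}}$$
where $R$ is a constant, or scale factor, that sets the overall level of the rewards.
The last creator being rewarded (the one who receives the smallest amount) receives:
$$\mathrm{reward}_{\min} = \frac{R}{N^{\alpha}}$$
Dividing one equation by the other:
$$\frac{\mathrm{reward}(n)}{\mathrm{reward}_{\min}} = \left(\frac{N}{n}\right)^{\alpha}$$
Now take the natural logarithm of both sides:
$$\ln\left(\frac{\mathrm{reward}(n)}{\mathrm{reward}_{\min}}\right) = \alpha\,\bigl(\ln(N) - \ln(n)\bigr)$$
rearrange:
$$\ln(N) = \ln(n) + \frac{1}{\alpha}\ln\left(\frac{\mathrm{reward}(n)}{\mathrm{reward}_{\min}}\right)$$
and take the exponential of both sides:
$$N = \exp\left(\ln(n) + \frac{1}{\alpha}\ln\left(\frac{\mathrm{reward}(n)}{\mathrm{reward}_{\min}}\right)\right)$$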
Plugging in the values $\mathrm{reward}(n) = 150$, $\mathrm{reward}_{\min} = 30$, $n = 230{,}000$, and $\alpha = 1$ gives:
$$\begin{aligned}
N &= \exp\left(\ln(230{,}000) + \frac{1}{1}\ln\left(\frac{150}{30}\right)\right) \\
  &= \exp\bigl(\ln(230{,}000) + \ln(5)\bigr) \\
  &= \exp\bigl(\ln(230{,}000 \times 5)\bigr) = 1{,}150{,}000
\end{aligned}$$
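The closed-form expression for the universe size is easy to check numerically. The following is a minimal sketch, not the software distributed with the book; the function name `universe_size` and its argument names are our own.

```python
import math


def universe_size(n, reward_n, reward_min, alpha):
    """Number N of rewarded creators under a Zipf law with parameter alpha.

    n: rank of the creator who should receive reward_n
    reward_n: reward wanted at rank n
    reward_min: smallest reward distributed (received at rank N)
    """
    return math.exp(math.log(n) + math.log(reward_n / reward_min) / alpha)


# Values used in the text: 230,000 creators at 150 or more, threshold 30.
n_total = universe_size(230_000, 150, 30, alpha=1.0)
print(round(n_total))
```

Because the ratio of the two reward levels is 5 and the parameter is 1, the result is simply 5 times the rank `n`, in agreement with the derivation.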
Running the model
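All we now need is to work out the total reward, that is, the sum of all the rewards:
$$\mathrm{reward}_{\mathrm{tot}} = \sum_{m=1}^{N} \mathrm{reward}(m) = \sum_{m=1}^{N} \mathrm{reward}_{\min}\,\frac{N^{\alpha}}{m^{\alpha}}$$
$$\mathrm{reward}_{\mathrm{tot}} = \mathrm{reward}_{\min}\,N^{\alpha}\sum_{m=1}^{N}\frac{1}{m^{\alpha}}$$
$$\mathrm{reward}_{\mathrm{tot}} = \mathrm{reward}_{\min}\,N^{\alpha}\,H_N(\alpha)$$
where $H_N(\alpha)$ is the $N$th harmonic number, already introduced in Appendix A. This formula immediately gives the total reward; the only step that requires some simple assistance is the computation of $H_N(\alpha)$.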
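The "simple assistance" can be as plain as summing the terms directly. The sketch below does this with the standard library only; `harmonic` and `total_reward` are names we introduce for illustration, and the values 30 and 1,150,000 are those of the worked example above.

```python
import math


def harmonic(n_terms, alpha):
    """Harmonic number H_N(alpha): sum of 1 / m**alpha for m = 1..N."""
    # math.fsum limits rounding error when adding many small terms.
    return math.fsum(m ** -alpha for m in range(1, n_terms + 1))


def total_reward(reward_min, n_rewarded, alpha):
    """Total yearly reward: reward_min * N**alpha * H_N(alpha)."""
    return reward_min * n_rewarded ** alpha * harmonic(n_rewarded, alpha)


total = total_reward(30, 1_150_000, 1.0)
print(f"{total:,.0f}")
```

Direct summation over about a million terms is fast enough for a single evaluation; a faster approximation becomes useful only when the sum has to be evaluated many times.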
How many creators are rewarded as diversity increases?
If we are now in a situation where the observed diversity of use corresponds to another value of Zipf’s law parameter, say $\alpha'$, we can find the new number of rewarded creators $N'$ that will lead to the same total cost in this new situation:
$$N'^{\alpha'}\,H_{N'}(\alpha') = N^{\alpha}\,H_N(\alpha)$$
Solving this equation for $N'$ is not trivial, and is more easily done by approximation techniques or using numerical tables. For instance, in the example above with $N = 1{,}150{,}000$ and $\alpha = 1.0$, the solution corresponding to $\alpha' = 0.9$ is $N' = 2{,}140{,}000$, and for $\alpha' = 0.8$ it is $N' = 3{,}490{,}000$.
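One possible approximation technique is sketched below. It is our own illustration, not necessarily the method used to obtain the figures in the text: the harmonic number is approximated with the Euler-Maclaurin formula (the first terms are summed exactly, the remainder is replaced by an integral plus correction terms), and the equation is then solved by bisection, which works because the cost grows with the number of rewarded creators. All function names and the `head` cut-off are assumptions.

```python
import math


def harmonic(n_terms, alpha, head=1000):
    """Approximate H_N(alpha) = sum of m**-alpha for m = 1..N."""
    if n_terms <= head:
        return math.fsum(m ** -alpha for m in range(1, n_terms + 1))
    # Exact sum of the terms 1 .. head-1.
    direct = math.fsum(m ** -alpha for m in range(1, head))
    # Euler-Maclaurin estimate of the terms head .. N.
    a, b = float(head), float(n_terms)
    if alpha == 1.0:
        integral = math.log(b / a)
    else:
        integral = (b ** (1 - alpha) - a ** (1 - alpha)) / (1 - alpha)
    endpoints = 0.5 * (a ** -alpha + b ** -alpha)
    derivative = alpha / 12 * (a ** (-alpha - 1) - b ** (-alpha - 1))
    return direct + integral + endpoints + derivative


def cost_factor(n_rewarded, alpha):
    """Total cost divided by the minimum reward: N**alpha * H_N(alpha)."""
    return n_rewarded ** alpha * harmonic(n_rewarded, alpha)


def rewarded_at_constant_cost(n_old, alpha_old, alpha_new):
    """Smallest N' whose cost factor reaches that of (n_old, alpha_old)."""
    target = cost_factor(n_old, alpha_old)
    lo, hi = 1, n_old
    while cost_factor(hi, alpha_new) < target:  # widen until bracketed
        lo, hi = hi, hi * 2
    while lo < hi:  # bisection on integers
        mid = (lo + hi) // 2
        if cost_factor(mid, alpha_new) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


n_09 = rewarded_at_constant_cost(1_150_000, 1.0, 0.9)
n_08 = rewarded_at_constant_cost(1_150_000, 1.0, 0.8)
print(n_09, n_08)
```

With the values of the example, the two results should land close to the rounded figures quoted in the text for the parameter values 0.9 and 0.8.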
Impact of a divergence of the observed usage from Zipf’s law
Only when a measurement system is fully in place can we judge if the reported usage follows Zipf’s law. What are the consequences if it does not? We have designed a constant cost reward system, and this cost can be distributed according to the observed usage, but with some adjustments.
If we want to keep the minimal reward and still distribute the same total amount of rewards, we will have two differences in comparison with what would have happened with the Zipf law fitted to the observed usage:
– the number of rewardees will be different;
– the level of use corresponding to the minimal reward will be different.
The second effect could be the most problematic one if it forces us to measure usage precisely at much lower levels than modeled in appendix C. Fortunately, this is easy to avoid so long as one does not try to reward an excessive number of creators, staying clear of the level of attention where real use is mixed with noise.
If we take the example of the 2 million most popular files in 10 weeks of usage of eDonkey P2P, where we have a significant divergence between the best-fitting Zipf’s law and the observed data (see table 13.1), the level of usage for the 1,000,000th most popular work is only 24% lower in the observed data than in the model.
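To make the two effects listed above concrete, here is a minimal sketch of one way a fixed total could be shared proportionally to observed usage while keeping a minimum reward. The rule used (reward the largest group of top-ranked creators such that the last one still reaches the minimum) is our assumption about the adjustment, and the usage figures are invented for illustration.

```python
def distribute(usage, total_budget, reward_min):
    """Share total_budget proportionally to usage among the top creators.

    Keeps the largest number k of top-ranked creators such that the k-th
    one still receives at least reward_min. Returns the list of rewards,
    in decreasing order.
    """
    ranked = sorted(usage, reverse=True)
    cumulative = 0.0
    kept, kept_sum = 0, 0.0
    for rank, level in enumerate(ranked, start=1):
        cumulative += level
        # Reward of this creator if the budget were split among the top `rank`.
        if level <= 0 or total_budget * level / cumulative < reward_min:
            break
        kept, kept_sum = rank, cumulative
    return [total_budget * level / kept_sum for level in ranked[:kept]]


# Invented usage levels for six creators, a budget of 1000 and a minimum of 40.
rewards = distribute([100, 50, 30, 10, 5, 1], total_budget=1000, reward_min=40)
print(len(rewards), [round(r, 1) for r in rewards])
```

In this sketch the number of rewardees and the usage level of the last rewarded creator both come out of the observed data rather than from the fitted Zipf law, while the total distributed stays equal to the budget.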