
the trace, especially on multiprocessor machines (Sievers, 2004, in prep).

The usefulness of the second expression becomes clear if we introduce an extra factor of $CC^{-1}$ into the determinant term, giving

\[
\frac{d\log(L)}{dq} = \frac{1}{2}\,\mathrm{Tr}\!\left[\left(\Delta\Delta^T - C\right)C^{-1}W_q C^{-1}\right]
\tag{2.32}
\]

We can see that we reach the maximum of the likelihood, where the gradient is zero, at the point where the matrix formed by the data $\Delta\Delta^T$ “most closely” matches the covariance matrix $C$. In addition, we can see how the gradient will respond to the addition of an expected signal, which usually requires a matrix to describe rather than a vector. This is the key to understanding the contribution to the power spectrum from other signals, discussed in Section 2.5. Unfortunately, calculating the gradient using this expression is computationally expensive, requiring $n_{\rm bin}$ matrix-matrix multiplications. We can get one matrix multiplication for free because of the trace, but we have to pay for the others. Since we need the derivative for each bin, this requires a factor of order the number of bins more work to calculate the gradient using this formula rather than Equation 2.30. When the number of bins becomes large (for the CBI, we typically have around 20), this factor can be the difference between being able to run on a typical desktop machine and having to run on a supercomputer, or the difference between being able to run on a supercomputer and not being able to extract a power spectrum at all.
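As a concrete illustration of the cost difference, the following is a minimal NumPy sketch (illustrative only, not the pipeline code; the function and variable names are invented, and the window matrices $W_B$ are assumed to be dense and symmetric). It evaluates the gradient two ways: once by expanding Equation 2.32 into matrix-vector products and elementwise traces (essentially the savings Equation 2.30 provides), and once by forming the matrix products of Equation 2.32 literally, which costs matrix-matrix multiplications for every band.

```python
# Illustrative sketch only: dense symmetric window matrices W_B, data vector delta,
# and a precomputed C^-1. Not the CBI pipeline.
import numpy as np

def gradient_cheap(delta, Cinv, W_list):
    """dlogL/dq_B = 1/2 [ (C^-1 delta)^T W_B (C^-1 delta) - Tr(C^-1 W_B) ],
    the expanded form of Eq. 2.32: only matrix-vector work, O(n^2) per band."""
    u = Cinv @ delta                          # C^-1 delta, one matrix-vector product
    grad = np.empty(len(W_list))
    for b, W in enumerate(W_list):
        quad = u @ (W @ u)                    # delta^T C^-1 W_B C^-1 delta
        tr = np.sum(Cinv * W)                 # Tr(C^-1 W_B) for symmetric matrices
        grad[b] = 0.5 * (quad - tr)
    return grad

def gradient_matrix_form(delta, Cinv, W_list):
    """dlogL/dq_B = 1/2 Tr[(delta delta^T - C) C^-1 W_B C^-1], Eq. 2.32 taken
    literally: matrix-matrix products for every band, O(n^3) per band."""
    C = np.linalg.inv(Cinv)
    M = np.outer(delta, delta) - C            # delta delta^T - C
    return np.array([0.5 * np.trace(M @ Cinv @ W @ Cinv) for W in W_list])
```

Both routines return the same numbers; the second simply does of order $n_{\rm bin}$ times more work.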

We find the best-fitting spectrum with Newton's method: given a trial set of band powers, we update them by

\[
\delta q_B = \sum_{B'}\left(\mathcal{F}^{-1}\right)_{BB'}\frac{d\log(L)}{dq_{B'}}
\tag{2.33}
\]

where $\mathcal{F}$ is the curvature matrix built from the second derivatives of the likelihood, defined below in Equation 2.34. This is the fundamental algorithm we use to find the set of $q_B$ that give the best-fitting spectrum: once we have $\mathcal{F}$ and $d\log(L)/dq$ for a model, we can update the $q_B$ to get a better-fitting model. Fortunately, it turns out that we can get an approximate curvature matrix, which will also work in Newton's method, for only marginally more computational effort than the exact gradient. Let us differentiate both Equations 2.30 and 2.32. Recall that we have by definition restricted ourselves to the class of covariance matrices expressible by Equation 2.21,

\[
C = \sum_B q_B W_B + N
\]

This means that the only contributions to the derivatives come from differentiating $C$ itself; all other factors are constant. Differentiating Equations 2.30 and 2.32 in this way gives two equivalent expressions for the curvature matrix:

\[
\mathcal{F}_{BB'} \equiv -\frac{d^2\log(L)}{dq_B\,dq_{B'}}
= \Delta^T C^{-1}W_B C^{-1}W_{B'}C^{-1}\Delta - \frac{1}{2}\,\mathrm{Tr}\!\left[W_B C^{-1}W_{B'}C^{-1}\right]
\tag{2.34}
\]

\[
\mathcal{F}_{BB'} = \mathrm{Tr}\!\left[\left(\Delta\Delta^T - C\right)C^{-1}W_B C^{-1}W_{B'}C^{-1}\right]
+ \frac{1}{2}\,\mathrm{Tr}\!\left[W_B C^{-1}W_{B'}C^{-1}\right]
\tag{2.35}
\]
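Written out, the equivalence of the two forms follows from the identities $\mathrm{Tr}[\Delta\Delta^T X] = \Delta^T X \Delta$ and $\mathrm{Tr}[C\,C^{-1}X] = \mathrm{Tr}[X]$, so the first trace in Equation 2.35 expands as

\[
\mathrm{Tr}\!\left[\left(\Delta\Delta^T - C\right)C^{-1}W_B C^{-1}W_{B'}C^{-1}\right]
= \Delta^T C^{-1}W_B C^{-1}W_{B'}C^{-1}\Delta - \mathrm{Tr}\!\left[W_B C^{-1}W_{B'}C^{-1}\right],
\]

and adding the remaining half-trace term recovers Equation 2.34.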

We now have some choices as to how to proceed from here. An early suggestion (e.g., Bond et al., 1998) was to note that at the maximum of the likelihood the first term in (2.35) is approximately zero, and so we can approximate the curvature matrix by

\[
\mathcal{F} \simeq F \equiv \frac{1}{2}\,\mathrm{Tr}\!\left[W_B C^{-1}W_{B'}C^{-1}\right]
\tag{2.36}
\]

This approximation $F$ to the curvature matrix is called the Fisher matrix. It is the expected curvature averaged over many data sets if the current model were true. Calculating the Fisher matrix requires us to both create and store $W_B C^{-1}$ for every band, which requires $n_{\rm bin}$ matrix-matrix multiplications. The program MADCAP (Borrill, 1999), used in de Bernardis et al. (2000), uses Equation 2.34 to calculate the exact curvature $\mathcal{F}$ rather than the Fisher matrix. The first term in Equation 2.34 is quick to calculate, as it is simply a series of matrix-times-vector operations. Let us label this term $2D$. The second term is again the Fisher matrix, only with the opposite sign. Thus $\mathcal{F}$ takes about as much effort to calculate as $F$. We therefore have two ways of writing the curvature, one of which is approximate:

\[
-\frac{d^2\log(L)}{dq_B\,dq_{B'}} = 2D - F \simeq F
\tag{2.37}
\]

So it must be true that $D \simeq F$, and we then have the key result that

\[
\mathcal{F} \simeq D
\tag{2.38}
\]

This is a new way of measuring the curvature (Sievers, 2004, in prep.) that greatly increases the speed of measuring the spectrum and halves the memory requirements. Why does it help so much? Because, with a single inversion of the covariance matrix, we can use this equation, along with Equation 2.30, to calculate both the exact gradient and an approximate curvature of the likelihood surface! This increases the execution speed by a factor of the number of bins, which for modern experiments is often a few dozen. It is also a more accurate description of the curvature than the Fisher matrix, which has been used successfully for years (including in Mason et al. (2003) and Pearson et al. (2003)). To see this, note that

\[
\mathcal{F} = 2D - F = D + (D - F) = F + 2(D - F)
\tag{2.39}
\]

So the correction we need to apply to $F$ in order to get $\mathcal{F}$ is twice as large as the one required for $D$. This means the algorithm converges to the maximum of the likelihood in fewer iterations. To calculate $F$, one needs to store the set of matrix products $C^{-1}W_B$, which doubles the storage/memory requirements. Because these products are never calculated when using $D$, they do not need to be stored.
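To make the storage and operation-count argument concrete, here is a minimal sketch of the two quantities in the same illustrative NumPy setting as above (dense symmetric matrices, invented names, not the production pipeline): the Fisher matrix needs a stored matrix-matrix product for every band, while $D$ needs only a few vectors per band.

```python
# Illustrative sketch only: why D is so much cheaper than the Fisher matrix F.
import numpy as np

def fisher_matrix(Cinv, W_list):
    """F_{BB'} = 1/2 Tr[W_B C^-1 W_B' C^-1].
    One matrix-matrix product per band, all of them kept in memory
    (on top of the W_B themselves)."""
    M = [W @ Cinv for W in W_list]            # stored band-sized matrices
    nb = len(W_list)
    F = np.empty((nb, nb))
    for a in range(nb):
        for b in range(nb):
            F[a, b] = 0.5 * np.sum(M[a] * M[b].T)   # Tr(M_a M_b)
    return F

def D_matrix(delta, Cinv, W_list):
    """2 D_{BB'} = (W_B C^-1 delta)^T C^-1 (W_B' C^-1 delta).
    Only matrix-vector products; nothing larger than a vector is stored per band."""
    u = Cinv @ delta                          # C^-1 delta
    w = [W @ u for W in W_list]               # w_B = W_B C^-1 delta (vectors)
    y = [Cinv @ wb for wb in w]               # C^-1 w_B (vectors)
    nb = len(W_list)
    D = np.empty((nb, nb))
    for a in range(nb):
        for b in range(nb):
            D[a, b] = 0.5 * (w[a] @ y[b])
    return D
```

For $n$ data points and $n_{\rm bin}$ bands, the first routine stores $n_{\rm bin}$ matrices of size $n \times n$ and performs $n_{\rm bin}$ $O(n^3)$ multiplications, while the second stores only $2 n_{\rm bin}$ vectors and performs $2 n_{\rm bin}$ matrix-vector products.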

Practically speaking, using $D$ means that the analysis of Pearson et al. (2003) can be done in thirty minutes on a desktop PC, where the same analysis using $F$ took several hours on a 32-CPU Alpha supercomputer (a GS320 with 733 MHz Alpha CPUs). While this method had not yet been developed at the time of our first-year papers, it has since been adopted into our analysis pipeline and will be used for all upcoming spectrum measurements. Also note that we could continue to differentiate $D$ in order to approximate the likelihood over successively larger regions. Since, when we are far from the maximum, the error in a step is predominantly due to the third derivative rather than to the difference between $D$ and $F$, we may be able to converge in fewer steps, though I have yet to investigate this in detail.
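The update loop described above is then straightforward. The following is a rough, self-contained sketch under the same illustrative assumptions as the earlier snippets (dense matrices, invented names): the exact gradient and $D$ as the approximate curvature, with one covariance inversion per iteration.

```python
# Illustrative sketch of the iteration described in the text, not the production code.
import numpy as np

def fit_bandpowers(delta, N, W_list, q0, n_iter=10):
    """Newton-style iteration for the band powers q_B, using the exact gradient
    and the approximate curvature D, both from a single inversion of C per step."""
    q = np.array(q0, dtype=float)
    nb = len(W_list)
    for _ in range(n_iter):
        C = N + sum(qb * Wb for qb, Wb in zip(q, W_list))   # covariance model, Eq. 2.21
        Cinv = np.linalg.inv(C)                              # the one expensive operation
        u = Cinv @ delta
        w = [W @ u for W in W_list]                          # W_B C^-1 delta
        grad = np.array([0.5 * (u @ wb - np.sum(Cinv * W))   # exact dlogL/dq_B
                         for wb, W in zip(w, W_list)])
        y = [Cinv @ wb for wb in w]
        D = 0.5 * np.array([[w[a] @ y[b] for b in range(nb)] for a in range(nb)])
        q += np.linalg.solve(D, grad)                        # Newton step with D for the curvature
    return q
```

In practice one would add convergence tests and step-size control, but the structure is just Equation 2.33 with $D$ standing in for $\mathcal{F}$.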

Incidentally, the errors in the band powers are easy to estimate when we have an (approximate) curvature matrix. To reasonably high accuracy for most experiments, the error on $q_B$ is simply that of the Gaussian approximation to the likelihood surface, given by $\left(\mathcal{F}^{-1}\right)_{BB}$ (see, e.g., Press et al., 1992). There are also higher-accuracy approximations available for more detailed work (Bond et al., 2000), and one can always map out the likelihood surface by direct evaluation, but for the CBI these give errors very similar to the Gaussian ones (for further discussion, see Sievers et al., 2003).
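As a final small illustration (again a sketch rather than pipeline code), the Gaussian-approximation error bars quoted above are just the square roots of the diagonal of the inverse of whichever curvature matrix is in use ($F$, $D$, or $\mathcal{F}$):

```python
import numpy as np

def bandpower_errors(curvature):
    """sigma(q_B) in the Gaussian approximation: square root of the diagonal of the
    inverse of the (approximate) curvature matrix evaluated at the best fit."""
    return np.sqrt(np.diag(np.linalg.inv(curvature)))
```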