Chapter 5 Analysis

5.6 PHOSPHOR Fit

5.6.2 Monte Carlo Truth

The PHOSPHOR Fit relies on precise knowledge of the true values of the E^γ scale and resolution in the simulation. These are also necessary to validate the method.

Therefore, a robust and accurate method to extract these values is desirable.

The estimation of the Monte Carlo true E^γ scale and resolution would be straightforward if the E^γ response followed a simple known density function, like the Gaussian or the Crystal Ball [122]. We could then simply fit it to the simulated E^γ response data and interpret its location and scale parameters as the E^γ scale and resolution.
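To make the Crystal Ball candidate concrete, the following minimal Python sketch evaluates the unnormalized Crystal Ball density in its standard textbook form: a Gaussian core joined smoothly to a power-law tail on the low side. The function name `crystal_ball_shape` and the parameter names `alpha` (the transition point in units of σ) and `n` (the tail power) are assumptions made for illustration and are not taken from the analysis code.

```python
import math

def crystal_ball_shape(x, mu, sigma, alpha, n):
    """Unnormalized Crystal Ball density (standard form, low-side tail).

    mu and sigma are the location and scale of the Gaussian core;
    alpha > 0 is where the tail starts, in units of sigma below mu;
    n > 0 is the power of the tail. All names are illustrative.
    """
    t = (x - mu) / sigma
    if t > -alpha:
        # Gaussian core
        return math.exp(-0.5 * t * t)
    # Power-law tail, matched to the core in value and slope at t = -alpha
    a = (n / alpha) ** n * math.exp(-0.5 * alpha * alpha)
    b = n / alpha - alpha
    return a * (b - t) ** (-n)

# Compare the tail with a pure Gaussian five sigma below the peak.
print(crystal_ball_shape(-5.0, 0.0, 1.0, 1.5, 5.0), math.exp(-12.5))
```

In a fit of this shape, `mu` and `sigma` play the role of the location and scale parameters that would be read off as the E^γ scale and resolution.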

Unfortunately, this does not seem to be the case.

In Figure 5.11 we fit the Crystal Ball line shape to the E^γ response of photons in a subclass of µµγ events. We select only photons in the ECAL barrel with E_T > 30 GeV and R9 > 0.94. The fitted function describes the data very poorly due to the tails.

The tails are much heavier in the data than the function can model, biasing the fit.

[Figure 5.11 plot panels. Panel title: Crystal Ball Fit, 99% containment, barrel, R9 > 0.94, E_T^γ ∈ [30, 100) GeV, 5867 events. Fit results: µ = −0.3799 ± 0.021, σ = 1.603 ± 0.015, n = 5.1 ± 7.0; χ²/ndof = 3.09e+05/85; p-value: 0. Axes: Events / (0.05 %), χ² pulls and χ² residuals versus E/E_true − 1 (%); bins of E/E_true − 1 versus χ² pulls.]
Figure 5.11: An example of E^γ response mismodeling by the CB line shape. These events have photons in the ECAL barrel with E_T >30 GeV and R₉ > 0.94. The fit range contains 99% of all events. (Top Left) The data and the fitted function on a logarithmic y-axis scale with an x-axis range covering 99.99994% (5-σ) of the data. (Top Middle) The data and the fitted function on a linear y-axis scale with an x-axis range covering 95.4% (2-σ) of the data.

[Figure 5.12 plot panels. Panel title: Gaussian Fit, 70% containment, barrel, R9 > 0.94, E_T^γ ∈ [30, 100) GeV, 5867 events. χ²/ndof = 32.9/48; p-value: 0.95. Axes: Events / (0.05 %), χ² pulls and χ² residuals versus E/E_true − 1 (%); bins of E/E_true − 1 versus χ² pulls.]
Figure 5.12: An example of a good modeling of the E^γ response peak by a Gaussian. (Top Left) The data and the fitted function on a logarithmic y-axis scale with an x-axis range covering 99.99994% (5-σ) of the data. (Top Middle) The data and the fitted function on a linear y-axis scale with an x-axis range covering 95.4% (2-σ) of the data. This fit demonstrates the impact of limiting the range on the goodness of the E^γ response modeling. It is to be compared with the reference fit in Figure 5.11. Here, the data are the same (photons in the ECAL barrel with E_T > 30 GeV and R₉ > 0.94) and the model is even simpler (a Gaussian instead of the CB line shape), yet, in contrast to the reference fit, the model describes the data reasonably well. The important difference is that the fit range is a modal interval containing only 70% of all events instead of 99%.

To judge the goodness of a fit, we plot the χ² residuals and pulls as a function of the observable x. The χ² residuals are defined for each bin i = 1, . . . , N as follows:

\[ \Delta n_i = n_i - \nu_i, \tag{5.9} \]

where n_i and ν_i are the numbers of observed and expected events in the bin i, respectively. Here, the number of expected events in the bin i is:

\[ \nu_i = \sum_{j=1}^{N} n_j \int_{a_i}^{b_i} f(x|\theta)\,dx, \tag{5.10} \]

where we sum j over all the bins. Here, f(x|θ) is the fitted model depending on P parameters θ = θ_1, . . . , θ_P, and a_i and b_i are the lower and upper boundaries of the bin i = 1, . . . , N, respectively. We choose the binning such that n_i ≥ 30 for all i. This guarantees that ν_i > 5 for all i at a very high confidence level. We define the χ² pulls as:

\[ \chi_i = \frac{n_i - \nu_i}{\sqrt{\nu_i}}. \tag{5.11} \]

Large absolute values of the χ² residuals and pulls indicate poor compatibility of the model f(x|θ) with the data. For each bin, we plot ∆n_i and χ_i at the median x of that bin.
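The definitions in Eqs. (5.9) to (5.11) can be illustrated with a short Python sketch. It assumes a Gaussian model normalized to unity over the fit range, so that the expected count in a bin is the total observed count times the model probability of that bin. The function names and the toy numbers are hypothetical and are not taken from the analysis.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_counts(edges, observed, mu, sigma):
    """Expected events per bin: total events times the bin probability.

    The model is renormalized to the fit range [edges[0], edges[-1]].
    """
    total = sum(observed)
    norm = gaussian_cdf(edges[-1], mu, sigma) - gaussian_cdf(edges[0], mu, sigma)
    return [
        total * (gaussian_cdf(hi, mu, sigma) - gaussian_cdf(lo, mu, sigma)) / norm
        for lo, hi in zip(edges[:-1], edges[1:])
    ]

def residuals_and_pulls(observed, expected):
    """Chi-square residuals (n_i - nu_i) and pulls ((n_i - nu_i) / sqrt(nu_i))."""
    residuals = [n - nu for n, nu in zip(observed, expected)]
    pulls = [(n - nu) / math.sqrt(nu) for n, nu in zip(observed, expected)]
    return residuals, pulls

# Toy example: two bins symmetric around the peak of a unit Gaussian.
edges = [-1.0, 0.0, 1.0]
observed = [40, 60]
expected = expected_counts(edges, observed, mu=0.0, sigma=1.0)
residuals, pulls = residuals_and_pulls(observed, expected)
chi2 = sum(p * p for p in pulls)
print(expected, residuals, pulls, chi2)
```

In this toy case the model predicts the same number of events in both bins, so the two residuals are equal in size and opposite in sign.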

We also plot the distribution of the χ² pulls. This should follow a unit Gaussian if f(x|θ) describes the data well:

\[ \chi_i \sim \mathcal{N}(x|0, 1). \tag{5.12} \]

Therefore, we also overlay the spectrum of the χ² pulls with a properly normalized unit Gaussian to check their mutual compatibility.
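One way to build such an overlay is to integrate the unit Gaussian over each bin of the pull histogram and multiply by the number of pulls, so that the curve and the histogram have the same total area. The Python sketch below does this under that assumption. The function name, the pull binning and the number of pulls are illustrative only.

```python
import math

def unit_gaussian_overlay(n_pulls, bin_edges):
    """Expected histogram contents for n_pulls values drawn from N(0, 1).

    Each bin gets n_pulls times the unit-Gaussian probability of that bin.
    """
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return [n_pulls * (cdf(hi) - cdf(lo))
            for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]

# Pull histogram from -6 to 6 in steps of 0.5, for a hypothetical 50 pulls.
edges = [-6.0 + 0.5 * k for k in range(25)]
overlay = unit_gaussian_overlay(50, edges)
print(overlay)
```

Comparing the observed pull histogram with these expected contents bin by bin gives a quick visual check of Eq. (5.12).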

As another goodness-of-fit test, we calculate the χ² statistic as [123]:

\[ \chi^2 = \sum_{i=1}^{N} \chi_i^2, \tag{5.13} \]

where we sum the index i over all bins. If the data follow the model f(x|θ), the distribution of the χ² statistic approaches a known probability density function (PDF), the so-called χ² PDF f(z|n_d). Here, n_d is the number of degrees of freedom:

\[ n_d = N - P. \tag{5.14} \]

The χ² statistic follows the χ² PDF in the limit of high statistics. In practice, the χ² PDF is a good approximation of the actual χ² distribution when ν_i > 5 for all i = 1, . . . , N [123]. This condition is satisfied by our choice of the binning. We can thus use the χ² PDF to calculate the p-value of the χ² statistic as:

\[ p = \int_{\chi^2}^{\infty} f(z|n_d)\,dz. \tag{5.15} \]

The p-value expresses the probability that the χ² statistic of a random sample drawn from the model would attain a greater value than the χ² statistic of the sample at hand. If the model describes the data, the p-value is uniformly distributed between 0 and 1. Poor compatibility of the model f(x|θ) with the data leads to low p-values.
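The integral in Eq. (5.15) is the upper tail of the χ² PDF, which equals the regularized upper incomplete gamma function Q(n_d/2, χ²/2). The Python sketch below evaluates it with the standard series and continued-fraction expansions, using only the standard library. The function name `chi2_p_value` is an assumption for illustration, and in practice a library routine such as `scipy.stats.chi2.sf` would normally be used instead. As a cross-check, it is applied to the χ²/ndof values quoted in the panels of Figures 5.12 and 5.13.

```python
import math

def chi2_p_value(chi2_value, ndof):
    """Upper-tail probability of the chi-square PDF with ndof degrees of freedom."""
    if chi2_value <= 0.0:
        return 1.0
    a = 0.5 * ndof
    x = 0.5 * chi2_value
    # log of x^a * exp(-x) / Gamma(a), common to both expansions
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion of the lower tail P(a, x); return 1 - P
        term = 1.0 / a
        total = term
        denom = a
        for _ in range(10000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for the upper tail Q(a, x), modified Lentz method
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h

# chi2 and ndof as quoted in the panels of Figures 5.12 and 5.13
for chi2_value, ndof in [(32.9, 48), (73.9, 47)]:
    print(chi2_value, ndof, chi2_p_value(chi2_value, ndof))
```

The two printed values should agree with the quoted p-values of 0.95 and 0.0074 to the precision shown in the figures.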

[Figure 5.13 plot panels. Panel title: Gaussian Fit, 70% containment, endcaps, R9 < 0.95, E_T^γ ∈ [10, 12) GeV, 8141 events. Fit results: µ = 11.86 ± 0.17, σ = 8.70 ± 0.21; χ²/ndof = 73.9/47; p-value: 0.0074. Axes: Events / (0.5 %), χ² pulls and χ² residuals versus E/E_true − 1 (%); bins of E/E_true − 1 versus χ² pulls.]
Figure 5.13: An example of the E^γ response peak mismodeling by a Gaussian. These events have photons in the ECAL endcaps with E_T =10–12 GeV and R₉ < 0.94. The fit range contains 70% of all events. (Top Left) The data and the fitted function on a logarithmic y-axis scale with an x-axis range covering 99.99994% (5-σ) of the data. (Top Middle) The data and the fitted function on a linear y-axis scale with an x-axis range covering 95.4% (2-σ) of the data.

[Figure 5.14 plot panels. Panel title: Crystal Ball Fit, 70% containment, endcaps, R9 < 0.95, E_T^γ ∈ [10, 12) GeV, 8141 events. Fit results: σ = 8.70 ± 0.20, n = 0.1 ± 8.7; χ²/ndof = 73.9/45; p-value: 0.0043. Axes: Events / (0.5 %), χ² pulls and χ² residuals versus E/E_true − 1 (%); bins of E/E_true − 1 versus χ² pulls.]
Figure 5.14: An example of the E^γ response peak mismodeling by a CB line shape. Same as Figure 5.13 but for the CB line shape instead of a Gaussian.

[Figure 5.15 plot panels. Panel title: Bifur. Gaussian Fit, 70% containment, endcaps, R9 < 0.95, E_T^γ ∈ [10, 12) GeV, 8141 events. Fit results: Δs = 13.77 ± 0.44, σ_L = 10.30 ± 0.45, σ_R = 6.39 ± 0.46; χ²/ndof = 52.7/46; p-value: 0.23. Axes: Events / (0.5 %), χ² pulls and χ² residuals versus E/E_true − 1 (%); bins of E/E_true − 1 versus χ² pulls.]
Figure 5.15: An example of the E^γ response peak mismodeling by a bifurcated Gaussian. Same as Figure 5.13 but for a bifurcated Gaussian instead of a Gaussian.

The poor modeling of the tails can be mitigated by fitting a subset of the data near the peak, see Figure 5.12. However, reducing the fit range introduces additional systematic uncertainties due to the choice of the fit range, and the approach is very fragile since the behavior varies greatly among photon categories based on the photon p_T, η and R9. Figures 5.13–5.15 demonstrate the limitation of this approach for photons in the endcaps with E_T ∈ [10, 20] GeV and R9 < 0.94, for increasingly complex analytical models.
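The restricted fit range in Figure 5.12 is described as a modal interval containing 70% of all events. A minimal Python sketch of one way to find such a range is given below, assuming that the modal interval means the shortest interval containing the requested fraction of the unbinned sample. This interpretation, the function name and the toy values are assumptions, not details confirmed by the text.

```python
import math

def modal_interval(values, fraction):
    """Shortest interval (lo, hi) containing at least `fraction` of the values."""
    if not values or not 0.0 < fraction <= 1.0:
        raise ValueError("need a non-empty sample and 0 < fraction <= 1")
    data = sorted(values)
    # Number of points inside the interval; the small offset guards
    # against floating-point round-up in fraction * len(data).
    k = max(1, math.ceil(fraction * len(data) - 1e-9))
    best = None
    for start in range(len(data) - k + 1):
        lo, hi = data[start], data[start + k - 1]
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

# Toy response values in percent, with a long low-side tail.
sample = [-30.0, -12.0, -3.0, -1.0, -0.5, 0.0, 0.2, 0.5, 1.0, 2.5]
print(modal_interval(sample, 0.7))
```

Because the interval is chosen from the data themselves, its boundaries shift with the photon category, which is one way to see why the choice of the fit range carries its own systematic uncertainty.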

From the document: Jan Veverka, Associated Zgamma Production at CMS (pages 165-173)