Chapter 6: Nonresonant pair production of highly energetic Higgs bosons
6.4 The $ggHH$ analysis
6.4.2 Background estimation
[Figure 6.13: four data-versus-prediction panels with ratio (Data / pred.) subpanels, labeled CMS Supplementary, $t\bar{t}$ CR, 138 fb$^{-1}$ (13 TeV). Legend: Data, $t\bar{t}$+jets, V+jets and VV, QCD, Bkgd. stat. unc. Axes: $m_{SD}^{j1}$ [GeV] (Events / 10 GeV); $D_{b\bar{b}}^{j1}$ (Events / 0.05); $m_{HH}$ [GeV] (Events / 30 GeV); $p_T^{j1}/m_{HH}$ (Events / 0.025).]
Figure 6.13: The data and expected background distributions from simulation in the $t\bar{t}$ control region (Sec. 6.4.2.1) for some of the most discriminating BDT input variables: the Jet 1 $m_{SD}$ (upper left), the Jet 1 $T_{Xbb}$ (upper right), $m_{HH}$ (lower left), and $p_T^{j1}/m_{HH}$ (lower right). The lower panels show the ratio of the data to the total background prediction, with its statistical uncertainty represented by the shaded band. The error bars on the data points represent the statistical uncertainties.
as a control region for QCD background estimation, and will be discussed later in Sec. 6.4.2.2.
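[Figure 6.14: 2D grid of the event BDT score (boundaries at 0.03, 0.11, and 0.43) versus the Jet 2 $T_{Xbb}$ tagger score (boundaries at 0.950 and 0.980), with cells labeled Bin 1, Bin 2, Bin 3, and Fail.]
Figure 6.14: The event categories in the $ggHH$ analysis, represented as a 2D grid of the BDT score and the Jet 2 $T_{Xbb}$ score.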
The QCD multijet background is estimated from data in control regions using the parametric alphabet fit method [222], described in Sec. 6.4.2.2. The QCD background prediction from this method also accounts for the contribution from gluon fusion and VBF single Higgs boson production (see Sec. 6.4.2.2).
The top quark background is predicted using simulation, with several data-driven corrections derived from dedicated control regions, as described in Sec. 6.4.2.1.
Finally, the remaining backgrounds, such as $t\bar{t}H$, $VH$, VV, and V+jets, are predicted using simulation.
6.4.2.1 $t\bar{t}$ backgrounds
In this subsection, we describe all corrections applied to the simulation prediction of the top quark backgrounds and their associated uncertainties.
$T_{Xbb}$ shape correction
The distribution of the $T_{Xbb}$ score for the $t\bar{t}$ background may not be modeled perfectly in the simulation. To correct for this, we measured its shape in a control region dominated by semi-leptonic $t\bar{t}$ events. The events in this control region were selected using online single-muon or single-electron triggers. Additionally, events were required to satisfy the following conditions:
• At least one selected electron (Sec. 6.3.2) or muon (Sec. 6.3.1),
• At least one AK8 jet with $p_T > 300$ GeV, $\tau_{32} < 0.46$, and $m_{SD} > 140$ GeV,
• $\Delta R$(lepton, AK8 jet) > 1.0,
• $p_T^{miss} > 100$ GeV.
The requirements on the jet $\tau_{32}$, the jet soft-drop (SD) mass ($m_{SD}$), and $p_T^{miss}$ are applied in order to suppress the QCD multijet and W+jets events. The $t\bar{t}$ events make up 90% of this control region, as can be seen in Figure 6.15.
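As a rough illustration, the control-region requirements listed above can be written as a single boolean selection. This is a minimal sketch and not the analysis code: the `Event` record, its field names, and the toy values are hypothetical, only one candidate AK8 jet per event is considered, and the trigger and lepton selections are assumed to be applied upstream.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical per-event summary; all field names are illustrative."""
    n_leptons: int          # number of selected electrons plus muons
    jet_pt: float           # AK8 jet pT [GeV]
    jet_tau32: float        # AK8 jet tau_32
    jet_msd: float          # AK8 jet soft-drop mass [GeV]
    delta_r_lep_jet: float  # Delta R between the lepton and the AK8 jet
    met: float              # missing transverse momentum [GeV]


def passes_semileptonic_cr(ev: Event) -> bool:
    """Apply the thresholds quoted in the list above to one event."""
    return (
        ev.n_leptons >= 1
        and ev.jet_pt > 300.0
        and ev.jet_tau32 < 0.46
        and ev.jet_msd > 140.0
        and ev.delta_r_lep_jet > 1.0
        and ev.met > 100.0
    )


# Toy events (made-up values): one passing, one failing the pT-miss cut.
good = Event(1, 450.0, 0.30, 172.0, 2.5, 130.0)
low_met = Event(1, 450.0, 0.30, 172.0, 2.5, 60.0)
print(passes_semileptonic_cr(good), passes_semileptonic_cr(low_met))
```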
[Figure 6.15: three panels (CMS Preliminary, 13 TeV; 35, 41, and 59 fb$^{-1}$) showing Events versus the Jet 1 soft drop mass (GeV) from 0 to 300 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets 1L, $t\bar{t}$+jets 2L, QCD, V+jets and VV.]
Figure 6.15: The distribution of the AK8 jet SD mass in the semi-leptonic $t\bar{t}$ control region after imposing all requirements. The 2016, 2017, and 2018 datasets are shown in the left, center, and right plots, respectively.
In Figure 6.16, we show the distribution of the ParticleNet $T_{Xbb}$ tagger score. The first bin shown there (between 0.0 and 0.8) includes an in-situ normalization correction to offset any discrepancies that might arise from mismodeling of the lepton trigger/identification/isolation efficiencies or the $\tau_{32}$ top tagging efficiency. Figure 6.17 is a zoomed-in version of this plot showing the most important bins. The remaining bins (between 0.8 and 1.0) are used to derive the correction to $T_{Xbb}$. We subtract the simulation prediction for all non-top-quark backgrounds from the data and compare the shape of the residual data to the top quark simulation prediction. We then derive correction factors as ratios of the residual data to the $t\bar{t}$ simulation in different $T_{Xbb}$ bins. The size of these corrections varies from a few percent to ∼20% across the $T_{Xbb}$ bins, with uncertainties below 2%.
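The bin-by-bin derivation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the yields are made up, and the uncertainty propagates only the Poisson uncertainty of the data, treating the simulated yields as exact, which is a simplification of what a full analysis would do.

```python
import math


def shape_correction(data, ttbar_mc, other_mc):
    """Per-bin correction = (data - non-top MC) / ttbar MC.

    Returns (ratio, error) pairs; the error assumes sqrt(N) on the data
    and exact MC yields (an illustrative simplification).
    """
    factors = []
    for n_data, n_tt, n_other in zip(data, ttbar_mc, other_mc):
        ratio = (n_data - n_other) / n_tt
        error = math.sqrt(n_data) / n_tt
        factors.append((ratio, error))
    return factors


# Toy yields in three T_Xbb bins (made-up numbers, not analysis results).
data = [9000.0, 4200.0, 1500.0]
ttbar_mc = [8000.0, 4000.0, 1600.0]
other_mc = [1000.0, 400.0, 100.0]

for ratio, error in shape_correction(data, ttbar_mc, other_mc):
    print(f"{ratio:.3f} +/- {error:.3f}")
```

Each $t\bar{t}$ simulated event would then be weighted by the ratio of the $T_{Xbb}$ bin it falls into.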
The impact of these corrections on the distributions of the Jet 2 $T_{Xbb}$ score and the event BDT score for the $t\bar{t}$ background, for the different event categories in the signal region, is shown in Figs. 6.18 and 6.19. These corrections tend to shift the top background towards lower $T_{Xbb}$ and BDT scores, which is desirable.
[Figure 6.16: three panels (CMS Preliminary, 13 TeV; 35, 41, and 59 fb$^{-1}$) showing Events versus the Jet 1 ParticleNet $T_{Xbb}$ tagger score from 0.1 to 1, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets 1L, $t\bar{t}$+jets 2L, QCD, V+jets and VV.]
Figure 6.16: The distribution of the jet $T_{Xbb}$ score in the 1ℓ+1j $t\bar{t}$ control region for 2016 (left), 2017 (middle), and 2018 (right); the first bin includes underflow events in each plot.
[Figure 6.17: same layout and legend as Figure 6.16, restricted to tagger scores from 0.9 to 1.]
Figure 6.17: The distribution of the jet $T_{Xbb}$ score in the 1ℓ+1j $t\bar{t}$ control region for 2016 (left), 2017 (middle), and 2018 (right), showing the most important bins.
$t\bar{t}$ recoil correction
The recoil of the $t\bar{t}$ system is mismodeled by the simulation in the phase space relevant for this analysis. Therefore, we measure the recoil in-situ using an all-hadronic $t\bar{t}$ control region. For this control region, we require events to have two AK8 jets, each satisfying the following conditions:
• Jet $p_T > 450$ GeV,
• Jet $m_{SD} > 50$ GeV,
• Jet $T_{Xbb} > 0.1$ (looser than in the signal region, to increase the sample size),
• Jet $\tau_{32} < 0.46$,
• The jet contains an AK4 sub-jet that passes the loose working point (WP) of the DeepCSV b-tagging algorithm (described in Sec. 5.3.6).
These selections result in a sample of $t\bar{t}$ events with a purity of about 90%. Figure 6.20 shows the $p_T$ of the $t\bar{t}$ system in this control region.
We subtract all non-$t\bar{t}$ backgrounds from the data, and then divide the residual data by the $t\bar{t}$ simulation prediction to obtain ratios in bins of $p_T^{jj}$. These ratios increase linearly up to about 300 GeV and then decrease linearly above 300 GeV. We fit these ratios with a "ramp-model" defined as follows:
• Fit a linear function with a positive slope $p_1$ and constant $p_0$ for $p_T^{jj} \in [0, 300]$ GeV: $p_0 + p_1 \times p_T^{jj}$;
• Fit another linear function with a negative slope $p_3$ and constant $p_2$ for $p_T^{jj} \in [300, 1000]$ GeV: $p_2 + p_3 \times p_T^{jj}$;
• Assume a flat correction for $p_T^{jj} > 1000$ GeV: $p_2 + p_3 \times 1000$.
Figure 6.18: The distribution of the second jet $T_{Xbb}$ score in the signal region for the $t\bar{t}$ background, for event category 1 (top left), category 2 (top right), and category 3 (bottom). The histograms are normalized to unit area.
The results of this ramp-model fit are shown in Figure 6.21. The ramp-model is used to derive correction factors for the top quark simulation in bins of $p_T^{jj}$. The fit uncertainties (shown with the dotted blue line) are propagated as systematic uncertainties for this correction.
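For concreteness, the piecewise form of the ramp-model can be written as a small function. This is a sketch only: the parameter values below are placeholders chosen so that the two segments happen to meet at 300 GeV, not fitted results, and assigning the boundary point at exactly 300 GeV to the first segment is our assumption.

```python
def ramp_correction(pt_jj, p0, p1, p2, p3):
    """Piecewise-linear 'ramp' correction versus pT(jj) in GeV."""
    if pt_jj <= 300.0:
        return p0 + p1 * pt_jj      # rising segment, slope p1 > 0
    if pt_jj <= 1000.0:
        return p2 + p3 * pt_jj      # falling segment, slope p3 < 0
    return p2 + p3 * 1000.0         # flat above 1000 GeV


# Placeholder parameters (NOT fit results), continuous at 300 GeV.
params = dict(p0=0.8, p1=0.001, p2=1.25, p3=-0.0005)

for pt in (0.0, 300.0, 650.0, 1000.0, 1500.0):
    print(pt, round(ramp_correction(pt, **params), 4))
```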
Figure 6.22 shows the distribution of the $p_T$ of the $t\bar{t}$ system after applying the "ramp-model" recoil correction; we observe a good level of agreement between the simulation prediction and the data. Furthermore, we also check the modeling of the $p_T$ of the two AK8 jets individually and of the mass of the dijet system in Figures 6.23, 6.24, and 6.25. All distributions show a good level of agreement between the simulation prediction and the data.
Figure 6.19: The distribution of the event BDT score in the signal region for the $t\bar{t}$ background, for event category 1 (top left), category 2 (top right), and category 3 (bottom). The histograms are normalized to unit area.
[Figure 6.20: three panels (CMS Preliminary, 13 TeV; 35, 41, and 59 fb$^{-1}$) showing Events versus $p_T^{jj}$ (GeV) from 0 to 1000 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.20: The distribution of the $p_T$ of the $t\bar{t}$ system for events in the all-hadronic top control region, for the 2016 (left), 2017 (middle), and 2018 (right) datasets.
Figure 6.21: The results of the "ramp-model" fit (solid blue line) in the hadronic $t\bar{t}$ control region, for the 2016 (top left), 2017 (top right), and 2018 (bottom) datasets. The black points represent the residual data (data minus the non-top backgrounds) divided by the $t\bar{t}$ simulation in bins of $p_T^{jj}$. The fit uncertainties (shown with the dotted blue line) are propagated as systematic uncertainties for this correction in the final signal extraction fit.
For the ramp-model correction, we assumed that the QCD simulation models the hadronic $t\bar{t}$ control region well enough. To support this assumption, we performed a check. In the 80–120 GeV region of the Jet 1 $m_{SD}$ (Figure 6.26), the relative contributions of QCD and $t\bar{t}$ are similar, and the data match the MC prediction relatively well. This region therefore shows that an arbitrary increase or decrease (say, by a factor of 2) of the QCD background yield from simulation would most likely be incompatible with the observed data.
Nevertheless, we varied the QCD background yield in each bin by factors of 2 and 0.5 and re-derived the ramp-model corrections. We observed that the impact of varying the QCD background in this manner is already covered by the total uncertainty on the $t\bar{t}$ recoil correction considered in the nominal fit. Taking these two pieces of evidence together, we conclude that the impact of the QCD background on this correction is negligible.
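The logic of this variation test can be illustrated with a short sketch. It recomputes only the residual-data to simulation ratios with the QCD yield scaled by 0.5 and 2 (the ramp-model fit itself is not repeated here), and compares the shifts with an assumed total uncertainty. All yields and the uncertainty values are made up for illustration.

```python
def residual_ratio(data, ttbar_mc, qcd_mc, other_mc, qcd_scale=1.0):
    """(data - scaled QCD - other backgrounds) / ttbar MC, bin by bin."""
    return [
        (d - qcd_scale * q - o) / t
        for d, t, q, o in zip(data, ttbar_mc, qcd_mc, other_mc)
    ]


# Toy yields in two pT(jj) bins and a toy total uncertainty on the
# correction; none of these numbers come from the analysis.
data = [1000.0, 800.0]
ttbar_mc = [900.0, 700.0]
qcd_mc = [50.0, 40.0]
other_mc = [30.0, 20.0]
total_unc = [0.08, 0.08]

nominal = residual_ratio(data, ttbar_mc, qcd_mc, other_mc)
covered = {}
for scale in (0.5, 2.0):
    varied = residual_ratio(data, ttbar_mc, qcd_mc, other_mc, qcd_scale=scale)
    shifts = [abs(v - n) for v, n in zip(varied, nominal)]
    # The variation is "covered" if every shift is within the uncertainty.
    covered[scale] = all(s <= u for s, u in zip(shifts, total_unc))
    print(scale, [round(s, 4) for s in shifts], covered[scale])
```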
Top jet mass scale
[Figure 6.22: three panels (CMS Preliminary, 13 TeV) showing Events versus $p_T^{jj}$ (GeV) from 0 to 1000 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.22: The distribution of the $p_T$ of the $t\bar{t}$ system for events in the hadronic $t\bar{t}$ control region after applying the ramp-model recoil correction, for the 2016 (left), 2017 (middle), and 2018 (right) datasets separately.
[Figure 6.23: three panels (CMS Preliminary, 13 TeV) showing Events versus $p_T^{j1}$ (GeV) from 400 to 2200 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.23: The distribution of the $p_T$ of the AK8 Jet 1 (the jet with the highest $T_{Xbb}$ score) for events in the hadronic $t\bar{t}$ control region after applying the ramp-model recoil correction, for the 2016 (left), 2017 (middle), and 2018 (right) datasets separately.
We would also like to ensure that the top mass scale, as modeled by the regressed mass of top jets, is well simulated in the hadronic $t\bar{t}$ control region defined in Section 6.4.2.1. The regressed mass distribution is shown in Figure 6.27 for different bins of the BDT score. The mass distributions are then fitted with a Gaussian, and the location of the fitted peak is shown as a function of the event BDT score in Figure 6.28. We observe that the simulation agrees with the data and predicts the mass scale very well in all BDT bins.
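To illustrate how a peak position can be extracted from a binned mass distribution, the sketch below uses a simple stand-in for a full Gaussian fit: an unweighted least-squares parabola fit to the logarithm of the bin contents (often called Caruana's method). This is not the fitting procedure used in the analysis, which is not specified here; the function name and the toy histogram are hypothetical, and a realistic fit would account for the statistical uncertainty in each bin.

```python
import math


def gaussian_peak(xs, ys):
    """Return (mean, sigma) from a parabola fit to ln(y) versus x.

    For a Gaussian, ln(y) = a + b*u + c*u**2 with u = x - x0, so that
    mean = x0 - b / (2c) and sigma = sqrt(-1 / (2c)). All ys must be > 0.
    """
    x0 = sum(xs) / len(xs)            # centre x for numerical stability
    u = [x - x0 for x in xs]
    z = [math.log(y) for y in ys]
    n = float(len(u))
    s1 = sum(u)
    s2 = sum(v ** 2 for v in u)
    s3 = sum(v ** 3 for v in u)
    s4 = sum(v ** 4 for v in u)
    t0 = sum(z)
    t1 = sum(v * w for v, w in zip(u, z))
    t2 = sum(v * v * w for v, w in zip(u, z))
    # Solve the 3x3 least-squares normal equations with Cramer's rule.
    det = (n * (s2 * s4 - s3 * s3) - s1 * (s1 * s4 - s3 * s2)
           + s2 * (s1 * s3 - s2 * s2))
    det_b = (n * (t1 * s4 - s3 * t2) - t0 * (s1 * s4 - s3 * s2)
             + s2 * (s1 * t2 - t1 * s2))
    det_c = (n * (s2 * t2 - t1 * s3) - s1 * (s1 * t2 - t1 * s2)
             + t0 * (s1 * s3 - s2 * s2))
    b, c = det_b / det, det_c / det
    return x0 - b / (2.0 * c), math.sqrt(-1.0 / (2.0 * c))


# Toy histogram: exact Gaussian values at bin centres from 150 to 190 GeV
# (made-up peak position and width).
xs = [150.0 + 5.0 * k for k in range(9)]
ys = [1000.0 * math.exp(-((x - 172.5) ** 2) / (2.0 * 15.0 ** 2)) for x in xs]
mean, sigma = gaussian_peak(xs, ys)
print(round(mean, 3), round(sigma, 3))
```

Repeating such a fit in each BDT bin, separately for data and simulation, gives the peak positions that are compared as a function of the BDT score.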
[Figure 6.24: three panels (CMS Preliminary, 13 TeV) showing Events versus $p_T^{j2}$ (GeV) from 400 to 2200 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.24: The distribution of the $p_T$ of the AK8 Jet 2 (the jet with the second highest $T_{Xbb}$ score) for events in the hadronic $t\bar{t}$ control region after applying the ramp-model recoil correction, for the 2016 (left), 2017 (middle), and 2018 (right) datasets separately.
[Figure 6.25: one panel (CMS Preliminary, 137 fb$^{-1}$, 13 TeV) showing Events versus $m_{jj}$ (GeV) from 0 to 2500 GeV, with a data/MC ratio subpanel. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.25: The distribution of the dijet mass for events in the top control region using the full Run 2 dataset corresponding to 137 fb$^{-1}$.
Modeling of the second jet regressed mass in the $t\bar{t}$ control region
As mentioned previously, the variable used for the final signal extraction is the regressed mass of the second jet. We would like to ensure that this variable is well modeled by the simulation in the hadronic $t\bar{t}$ control region. We apply the following corrections and check the distribution of the Jet 2 regressed mass:
• corrections for the $T_{Xbb}$ score shape,
• corrections for the recoil of the $t\bar{t}$ system, and
• corrections to the regressed mass scale and mass resolution, as recommended centrally for all CMS data analyses [139].
We observe good agreement between the pre-fit predicted shape and the data, as shown on the left of Figure 6.29. Furthermore, we verified that the residual disagreement is covered by the jet mass resolution uncertainty by performing a fit in which the corresponding nuisance parameter was allowed to float within its constraint. The post-fit distribution of the Jet 2 regressed mass is shown on the right of Figure 6.29, where we observe improved shape agreement with the data.
Figure 6.26: The Jet 1 $m_{SD}$ in the $t\bar{t}$ hadronic control region. The 80–120 GeV region shows that a factor of two increase or decrease of the QCD background would be incompatible with the observed data.
BDT shape uncertainty
Lastly, we also check the event BDT score distribution in the $t\bar{t}$ all-hadronic control region. For this, we tighten the selection on the $\tau_{32}$ of both AK8 jets to $\tau_{32} < 0.39$ in order to achieve very high $t\bar{t}$ purity in all BDT bins. Figure 6.30 shows the event BDT shape in the control region after applying the $T_{Xbb}$ shape corrections and the $t\bar{t}$ recoil corrections derived in the previous sections. The left plot shows the BDT distribution in the $t\bar{t}$ all-hadronic control region with the nominal binning. The right plot shows the same BDT distribution, but with the last bin split into three bins, to confirm that variations in bin size do not alter the agreement between data and simulation.
From the left plot of Figure 6.30, we subtract all non-$t\bar{t}$ backgrounds from the data and measure a small difference between the residual data and the $t\bar{t}$ simulation prediction. This small difference is propagated as a shape systematic uncertainty on the $t\bar{t}$ background prediction in the final signal extraction fit.
6.4.2.2 QCD
[Figure 6.27: four panels (CMS Preliminary, 137 fb$^{-1}$, 13 TeV) showing Events versus the Jet 2 regressed mass (GeV) from 50 to 500 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.27: The top mass in different BDT bins (BDT ranges: top left, 0–0.00008; top right, 0.00008–0.0002; bottom left, 0.0002–0.0004; bottom right, 0.0004–1.0) for the full Run 2 dataset. The top recoil correction and the $T_{Xbb}$ shape correction are already applied.
[Figure 6.28: fitted top mass [GeV] (axis range 150 to 190) versus event BDT score (0 to 0.001), for MC and Data.]
Figure 6.28: The fitted top mass (from a Gaussian fit to Figure 6.27) as a function of the event BDT score. The top recoil correction and the $T_{Xbb}$ shape corrections are already applied.
[Figure 6.29: two panels (CMS Preliminary, 137 fb$^{-1}$, 13 TeV) showing Events versus the Jet 2 regressed mass (GeV) from 60 to 220 GeV, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, others, Total unc.]
Figure 6.29: The distribution of the Jet 2 regressed mass for events in the top all-hadronic control region, for the full Run 2 dataset corresponding to 137 fb$^{-1}$. The pre-fit result is shown on the left, and on the right we show a post-fit result where the jet mass resolution uncertainty nuisance parameter was allowed to float within its constraints.
Since QCD multijet production is relatively difficult to model accurately with simulation, we estimate it using a data-driven approach from a signal-depleted region. For this purpose, we use the Fail region, introduced in Sec. 6.4.1.3 and defined as the phase space of events where Jet 2 $T_{Xbb} < 0.95$ and BDT score > 0.03 (see Fig. 6.14). The Fail region is enriched in backgrounds and can be used to extrapolate the shape of the QCD background into the signal region as:
$$M^{\mathrm{pass},i}_{\mathrm{QCD}}(\text{Jet 2 } m_{reg}) = TF_i(\text{Jet 2 } m_{reg}) \cdot M^{\mathrm{fail}}_{\mathrm{QCD}}(\text{Jet 2 } m_{reg}) \qquad (6.9)$$
where $M^{\mathrm{pass},i}_{\mathrm{QCD}}$(Jet 2 $m_{reg}$) and $M^{\mathrm{fail}}_{\mathrm{QCD}}$(Jet 2 $m_{reg}$) are the QCD regressed mass distributions of Jet 2 in the $i$-th pass region (signal event categories 1, 2, and 3) and in the fail region, respectively. $TF_i$(Jet 2 $m_{reg}$) is an $n$-th order polynomial transfer factor for the $i$-th pass region, as a function of the Jet 2 $m_{reg}$. This is known as the parametric alphabet fit method. The gluon fusion and VBF single Higgs boson production backgrounds exhibit the same shape as the QCD multijet background, because the second jet in those events originates from initial state radiation (ISR), as does the second jet in QCD multijet production. Therefore, the QCD background prediction from this method also accounts for the contribution from the gluon fusion and VBF single Higgs boson backgrounds.
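The structure of Eq. 6.9 can be illustrated with a short sketch in which a polynomial transfer factor multiplies the fail-region shape bin by bin. Several things here are assumptions made for illustration: the function names, the use of plain monomials as the polynomial basis, and the fixed coefficients and yields (in the actual method, the transfer factor parameters are determined in the fit).

```python
def transfer_factor(m_reg, coeffs):
    """Polynomial TF(m) = sum_k coeffs[k] * m**k, via Horner's rule."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * m_reg + c
    return value


def predict_pass(bin_centers, fail_yields, coeffs):
    """Pass-region QCD shape = TF(m_reg) * fail-region QCD shape."""
    return [
        transfer_factor(m, coeffs) * n
        for m, n in zip(bin_centers, fail_yields)
    ]


# Toy inputs (made-up numbers): three Jet 2 m_reg bins and a
# first-order transfer factor TF(m) = 0.01 + 0.00005 * m.
bin_centers = [60.0, 100.0, 140.0]
fail_yields = [10000.0, 8000.0, 5000.0]
coeffs = [0.01, 0.00005]

print([round(n, 2) for n in predict_pass(bin_centers, fail_yields, coeffs)])
```

A zeroth-order transfer factor (`coeffs` of length one) corresponds to a pass-region shape identical to the fail-region shape up to normalization; higher orders allow the pass-to-fail ratio to vary with the mass.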
[Figure 6.30: two panels (CMS Preliminary, 137 fb$^{-1}$, 13 TeV) showing Events (logarithmic scale) versus the event BDT score from 0 to 0.01, with data/MC ratio subpanels. Legend: Data, $t\bar{t}$+jets, QCD, V+jets and VV.]
Figure 6.30: The event BDT distribution in the $t\bar{t}$ all-hadronic control region for the data and the simulation prediction, for the full Run 2 dataset. The bin edges are [0.000–0.00008], [0.00008–0.0002], [0.0002–0.0004], and [>0.0004] for the left plot, and [0.000–0.00008], [0.00008–0.0002], [0.0002–0.0004], [0.0004–0.0005], [0.0005–0.0007], and [>0.0007] for the right plot. The left plot is used to derive uncertainties for the BDT shape modeling in the $t\bar{t}$ simulation.
The mass shape of the QCD multijet background in the fail region ($M^{\mathrm{fail}}_{\mathrm{QCD}}$(Jet 2 $m_{reg}$)) is defined as the data minus all non-QCD backgrounds as predicted by the simulation (the $t\bar{t}$ background prediction used here includes all the corrections described in Section 6.4.2.1). The Jet 2 regressed mass shape of the QCD multijet background in the different event categories ($M^{\mathrm{pass},i}_{\mathrm{QCD}}$(Jet 2 $m_{reg}$)) is then estimated from a simultaneous likelihood fit to the pass and fail regions. The order of the transfer factor for each individual event category is determined by performing an F-test and a goodness-of-fit (GOF) test. We first describe these tests and then explain the procedure used to derive the polynomial order of the transfer factors.
F-Test and goodness-of-fit (GOF) Test
We perform the Fisher test (F-test) to determine the minimum number of parameters that can describe the transfer factors well. For two fit models, 1 and 2, where model 1 with $p_1$ parameters is "nested" within model 2 with $p_2$ parameters ($p_2 > p_1$), the F-test determines whether model 2 gives a significantly better fit to the data. For example, a zeroth-order polynomial is nested within a first-order polynomial. A random variable $X$ follows the F-distribution with parameters ($d_1$, $d_2$) if
$$X = \frac{U_1/d_1}{U_2/d_2} \qquad (6.10)$$
where $U_1$ and $U_2$ are independent random variables that follow chi-squared distributions with $d_1$ and $d_2$ degrees of freedom, respectively.
For Gaussian-distributed data, the chi-squared statistic is defined as
$$\chi^2 = \sum_i \frac{(d_i - f_i)^2}{\sigma_i^2} \qquad (6.11)$$
where $d_i \pm \sigma_i$ is the $i$-th measured data point with rms deviation $\sigma_i$, and $f_i$ is the model prediction. For the same data and model, the likelihood is given by:
$$\mathcal{L} = \prod_i \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-(d_i - f_i)^2/2\sigma_i^2}. \qquad (6.12)$$
The goodness-of-fit (GOF) test is a test of the null hypothesis in the absence of an alternative hypothesis. In a situation where we only have the data and a model to fit them with, one can always construct an alternative model in which $f_i = d_i$ at every measured value. Such a model is called a saturated model. Setting $f_i = d_i$ in Eqn. 6.12, the likelihood of the saturated model [79, 223, 224] becomes:
$$\mathcal{L}_{saturated} = \prod_i \frac{1}{\sqrt{2\pi\sigma_i^2}}. \qquad (6.13)$$
The ratio of the two likelihoods (Eqns. 6.12 and 6.13), $\lambda$, is then
$$\lambda = \prod_i e^{-(d_i - f_i)^2/2\sigma_i^2}, \qquad (6.14)$$
and more importantly,
$$\chi^2 = -2\ln\lambda. \qquad (6.15)$$
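The identity in Eq. 6.15 can be checked numerically: the sketch below evaluates the Gaussian log-likelihood for a model and for the saturated model (model equal to data) and compares $-2\ln\lambda$ with the chi-squared of Eq. 6.11. The function names and the toy data points are illustrative assumptions.

```python
import math


def chi2(data, model, sigma):
    """Chi-squared of Eq. 6.11."""
    return sum(((d - f) / s) ** 2 for d, f, s in zip(data, model, sigma))


def gaussian_log_likelihood(data, model, sigma):
    """Logarithm of the Gaussian likelihood of Eq. 6.12."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (d - f) ** 2 / (2.0 * s * s)
        for d, f, s in zip(data, model, sigma)
    )


# Toy measurements with uncertainties and a constant model (made up).
d = [10.2, 9.1, 11.5]
sigma = [1.0, 0.9, 1.2]
f = [10.0, 10.0, 10.0]

# ln(lambda) = ln L(model) - ln L(saturated), where saturated means f = d.
log_lambda = gaussian_log_likelihood(d, f, sigma) - gaussian_log_likelihood(d, d, sigma)
print(-2.0 * log_lambda, chi2(d, f, sigma))
```

The normalization terms cancel in the ratio, which is why the two printed numbers agree up to floating-point rounding.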
Thus, $-2\ln\lambda$ asymptotically follows a chi-squared distribution and is a suitable test statistic for a GOF test. For Poisson-distributed data, the GOF test statistic can be written as
$$-2\ln\lambda = 2\sum_{i}^{n_{bins}} \left(f_i - d_i + d_i \ln(d_i/f_i)\right). \qquad (6.16)$$
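The statistic in Eqn. 6.16 asymptotically follows a chi-squared distribution with $n_{bins} - p_i$ degrees of freedom, where $p_i$ is the number of parameters of model $i$, set by its polynomial order. Therefore, we can construct the F-statistic (from Eqn. 6.10) as
$$F = \frac{\left(-2\log\lambda_1 + 2\log\lambda_2\right)/\left(p_2 - p_1\right)}{-2\log\lambda_2/\left(n_{bins} - p_2\right)}. \qquad (6.17)$$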
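Eqs. 6.16 and 6.17 translate directly into code. The sketch below is illustrative only: the function names are ours, the observed counts and the two model predictions are made-up numbers rather than fit results, and the convention that an empty bin ($d_i = 0$) contributes only $f_i$ is the usual limit of the logarithmic term.

```python
import math


def poisson_saturated_gof(data, model):
    """-2 ln(lambda) of Eq. 6.16 for Poisson-distributed bin contents."""
    total = 0.0
    for d, f in zip(data, model):
        total += f - d
        if d > 0:                       # d * ln(d / f) -> 0 as d -> 0
            total += d * math.log(d / f)
    return 2.0 * total


def f_statistic(gof_1, gof_2, p_1, p_2, n_bins):
    """Eq. 6.17, with gof_k = -2 ln(lambda_k) and p_2 > p_1.

    Undefined if gof_2 == 0 (a perfect fit of the larger model).
    """
    return ((gof_1 - gof_2) / (p_2 - p_1)) / (gof_2 / (n_bins - p_2))


# Toy example: six bins, a constant model (1 parameter) and a linear
# model (2 parameters); the predictions are illustrative, not fitted.
data = [12.0, 9.0, 11.0, 8.0, 10.0, 7.0]
model_1 = [9.5] * 6
model_2 = [11.5, 10.7, 9.9, 9.1, 8.3, 7.5]

gof_1 = poisson_saturated_gof(data, model_1)
gof_2 = poisson_saturated_gof(data, model_2)
print(gof_1, gof_2, f_statistic(gof_1, gof_2, 1, 2, len(data)))
```

The observed value of $F$ would then be compared with the expected F-distribution to decide whether the additional parameter of model 2 is warranted.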