Population size as a function of magnitude threshold


4.5 Population size as a function of magnitude threshold

[Figure 4.6, two panels. Panel (a): Nc10(I ≤ Ithr) against Ithr (magnitude). Panel (b): Nc10(Ithr < I ≤ Ithr + 0.5) against Ithr (magnitude).]

Figure 4.6: (a) Cumulative distribution function of the number of captured individuals as a function of the brightness threshold. (b) Number of individuals captured per magnitude bin. The shape may be indicative of a Gaussian distribution.

sensitive to these few individuals, who comprise a small fraction of the dataset. The low count statistics as the threshold is increased to fainter magnitudes are reflected in the larger standard error of the estimate. In Figure 4.7, for both (a) and (b), the efficiency of the Chao estimates is particularly affected at thresholds Ithr = 14.1, 14.6, 15.4, 16.2, and 16.9 magnitude. These thresholds coincide with a sudden increase in the rate of captures. At Ithr = 15.7 and 15.8 magnitude, a similar small spike is seen in the Chao and Darroch estimators for both Mh and Mth. These thresholds coincide with a decrease in the rate of captures with respect to the threshold (cf. Figure 4.6). The bias-corrected estimators appear more robust overall, both in the estimate and in the associated standard error, and they overcome the non-convergence issues that can be seen for Mth Darroch at Ithr = 16.5 magnitude. In summary, the heterogeneous set of models, Mh Chao, Darroch, and Poisson, is applicable for handling the unequal capture probability that arises from an imposed brightness threshold. Bias correction plays an important role in offsetting this heterogeneity.
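To illustrate why estimates of this kind react strongly to a handful of individuals, the sketch below evaluates the classical Chao (1987) moment lower bound from the frequency counts f1 and f2. It is a minimal illustration under stated assumptions: the function names are hypothetical, the bias-corrected form is the standard textbook variant, and neither is necessarily the fit or the correction used to produce Figures 4.7 and 4.8.

```python
import math


def chao_lower_bound(f1, f2, n_captured):
    """Chao (1987) moment lower bound on population size.

    f1, f2     : number of units captured exactly once / exactly twice
    n_captured : number of distinct units captured at least once
    Returns (estimate, standard_error).
    """
    if f2 <= 0:
        raise ValueError("f2 must be positive for the uncorrected form")
    ratio = f1 / f2
    estimate = n_captured + f1 * f1 / (2.0 * f2)
    # Asymptotic variance of the estimator as given by Chao (1987).
    variance = f2 * (0.5 * ratio**2 + ratio**3 + 0.25 * ratio**4)
    return estimate, math.sqrt(variance)


def chao_lower_bound_bc(f1, f2, n_captured):
    """Textbook bias-corrected variant; remains finite when f2 == 0."""
    return n_captured + f1 * (f1 - 1) / (2.0 * (f2 + 1))


# Illustrative counts only (not taken from the OGLE-IV data).
est_a, se_a = chao_lower_bound(f1=10, f2=5, n_captured=30)
est_b, se_b = chao_lower_bound(f1=10, f2=1, n_captured=30)
print(est_a, round(se_a, 2))
print(est_b, round(se_b, 2))
```

With these toy counts, reducing f2 from 5 to 1 doubles the estimate (from 40 to 80) and inflates the standard error considerably, which is the kind of behaviour expected when a threshold change moves only a few sources between frequency classes.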


[Figure 4.7, two panels, each plotted against Threshold (magnitude). Panel (a) Mh: Mh Chao (LB), Mh Darroch, Mh Poisson2. Panel (b) Mth: Mth Chao (LB), Mth Darroch, Mth Poisson2.]

Figure 4.7: Closed population estimates as a function of threshold.

[Figure 4.8, two panels, each showing the bias-corrected estimate N̂bc against Threshold (magnitude). Panel (a) Bias corrected M_h: Mh Chao (LB) BC, Mh Darroch BC, Mh Poisson2 BC. Panel (b) Bias corrected M_th: Mth Chao (LB) BC, Mth Darroch BC, Mth Poisson2 BC.]

Figure 4.8: Bias corrected closed population estimates as a function of threshold.


Decreasing the time resolution of the OGLE data was a tactic to avoid imputation or interpolation. It did, however, change the underlying variability structure of the sources in the population. Whilst this is, admittedly, a far poorer time resolution than we explored in the simulated populations (using X-ray lightcurves), the lower time resolution does not affect the estimation here. The temporal binning is a minor concern compared to the heterogeneity caused by the distribution of sources in brightness space. No matter how the capture occasions are binned, u1 is always tens of times larger than ui (i ≥ 2) and, similarly, the frequencies fi are dominated by the large number of sources captured exactly t times (where t is the number of capture occasions in the study). Because of this heterogeneity, the capture data are no longer Poissonian.
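The two descriptive statistics referred to above can be read directly off a capture-history matrix. The sketch below assumes the usual capture-recapture definitions (ui is the number of units first captured on occasion i, and fi is the number of units captured exactly i times); the function names and the toy histories are hypothetical and only mimic a population dominated by 'always on' sources.

```python
def first_capture_counts(histories):
    """u_i: number of units captured for the first time on occasion i."""
    t = len(histories[0])
    u = [0] * t
    for row in histories:
        for i, caught in enumerate(row):
            if caught:
                u[i] += 1
                break  # only the first capture counts towards u_i
    return u


def capture_frequency_counts(histories):
    """f_i: number of units captured exactly i times, for i = 1..t."""
    t = len(histories[0])
    f = [0] * t
    for row in histories:
        k = sum(row)
        if k > 0:
            f[k - 1] += 1
    return f


# Toy data: 6 'always on' sources and 3 that are captured only occasionally.
histories = [[1, 1, 1, 1]] * 6 + [[0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]

u = first_capture_counts(histories)
f = capture_frequency_counts(histories)
print("u_i:", u)  # u_1 dominates
print("f_i:", f)  # f_t dominates
```

Even in this small example the pattern described in the text appears: almost all first captures fall on the first occasion, and the frequency counts pile up at i = t.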

On page 88, I listed several reasons that may explain why the Chao and Poisson models do not estimate any better than the lower bound of the number of units captured during the study. We know that the optical outbursts of HMXBs do not always coincide with the X-ray outbursts and that an optical outburst can reach up to a few magnitudes.

However, the largest variability displayed by a source in this dataset is ∼1 magnitude, whilst most of the sources have ∆Imax < 0.3 magnitude, as plotted in Figure 4.9. For this reason, we require a relatively deep threshold, i.e. one that is faint enough, to obtain estimates within the range of the currently known size of the HMXB population.

[Figure 4.9: histogram of the number of sources against ∆Imax (magnitude).]

Figure 4.9: Distribution of sources binned according to maximum variability ∆Imax displayed in OGLE-IV.
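The interplay between ∆Imax and the depth of the threshold can be made concrete with a small sketch. It assumes that each source's lightcurve has already been binned into capture occasions and that a source counts as captured in an occasion if its brightest point is at or brighter than Ithr; this capture rule, the function names and the magnitudes are illustrative assumptions, not the reduction actually applied to the OGLE-IV data.

```python
def delta_i_max(occasions):
    """Maximum variability: faintest minus brightest magnitude of a source."""
    mags = [m for occ in occasions for m in occ]
    return max(mags) - min(mags)


def capture_history(occasions, i_thr):
    """1 where the source reaches i_thr or brighter within an occasion.

    Magnitudes are inverted (smaller is brighter), hence the use of min().
    """
    return [1 if min(occ) <= i_thr else 0 for occ in occasions]


def n_captured(sources, i_thr):
    """Number of sources captured at least once for a given threshold."""
    return sum(
        1 for occasions in sources.values()
        if any(capture_history(occasions, i_thr))
    )


# Hypothetical I-band magnitudes, three occasions per source.
sources = {
    "bright_quiet": [[14.00, 14.05], [14.02, 14.00], [14.05, 14.01]],
    "outbursting": [[15.9, 16.0], [15.2, 15.6], [16.0, 16.1]],
    "faint_quiet": [[16.80, 16.82], [16.81, 16.80], [16.83, 16.80]],
}

for i_thr in (14.5, 15.5, 17.0):
    print(i_thr, n_captured(sources, i_thr))
print(capture_history(sources["outbursting"], 15.5))
```

In this toy setup only the outbursting source produces an informative capture history; the quiet sources are either always captured or never captured, depending solely on where the threshold sits relative to their mean brightness.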

There are numerous other ways that one could attempt the data reduction, for instance:

• One approach could be to ‘offset’ all of the sources to the same quiescent, mean, or median magnitude. It would significantly alter the heterogeneous effect of the ‘always on’ sources. The choice of threshold is, in this case, still not arbitrary because the bulk of the sources show variability ∆I < 0.2 magnitude. It would be inadvisable to lower the threshold to such an extent that, for some sources, it captures variability that is not associated with an HMXB outburst, whilst for others that variability is the defining characteristic.

• Another option is to augment the dataset. Augmentation may involve interpolating each source’s lightcurve on short timescales (e.g. a few weeks at most) using Gaussian processes. Stratified sampling may subsequently be implemented with a Monte Carlo approach, similar to the steps applied in §3. The capture histories are then created and, finally, the population sizes are estimated with associated confidence intervals.
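The first of these alternatives can be sketched in a few lines. The example assumes that each source is offset to its own median magnitude and that a capture is a brightening of at least a chosen amount relative to that median; the function name, the threshold value and the magnitudes are hypothetical, and this reduction was not carried out in this work.

```python
import statistics


def capture_history_relative(occasions, delta_thr):
    """Capture history after offsetting a source to its median magnitude.

    occasions : list of lists of magnitudes, one inner list per occasion
    delta_thr : required brightening (in magnitudes) relative to the median
    """
    all_mags = [m for occ in occasions for m in occ]
    median_mag = statistics.median(all_mags)
    # Smaller magnitude is brighter, so brightening = median - minimum.
    return [1 if median_mag - min(occ) >= delta_thr else 0 for occ in occasions]


# Hypothetical sources: one bright and quiet, one with a single outburst.
bright_quiet = [[14.00, 14.05], [14.02, 14.00], [14.05, 14.01]]
outbursting = [[15.9, 16.0], [15.2, 15.6], [16.0, 16.1]]

print(capture_history_relative(bright_quiet, delta_thr=0.3))
print(capture_history_relative(outbursting, delta_thr=0.3))
```

Under this scheme the bright, quiet source is no longer 'always on', because its absolute brightness is removed before the threshold is applied. The threshold, however, loses the direct physical meaning it has when the sources are left undisturbed in brightness space.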

One of the benefits of taking the route that we have, i.e. leaving the sources undisturbed in brightness space, is that the imposed threshold attains physical meaning.

Design-based methods, which were not implemented here, may also help address the heterogeneity within the sample; see the ‘Snowshoe hare example’ in Baillargeon and Rivest (2007, §2.1). In that example, they design the matrix, Xi, used in the log-linear regression (cf. §2.2.2) to offset the heterogeneity caused by a fraction of hares that were captured on every occasion. Heterogeneous effects and the covariates explaining the unequal catchability require further scrutiny to characterise the underlying population size fully.

A blatant issue within this dataset prevents us from drawing astronomically valid inferences: the sources in this survey were selected for monitoring and are not encountered at random within the populated survey area. We therefore cannot comment on the completeness of the HMXB population within the SMC, even for flux-limited cases. An authentic dataset will need to be generated by returning to the OGLE-IV survey to extract and classify HMXBs. However, since the focus of this analysis was on methodology and the interpretation of the statistical results, we opted to treat this dataset as if it were authentic. Future work on this particular population should attempt classification, and relevant astrophysical parameters that influence the present-day population size may be incorporated into the problem. Furthermore, the next chapter presents a more realistic approach using survey data by implementing a combination of closed and open population techniques, called ‘Robust Design’.

Chapter 5 Application to astronomical population datasets: Part II

In the document Capturing transients: an application of biostatistics to astronomy (pages 110-116)