4.2 The need for vetoes
4.2.1 Vetoes/flags for the high-mass search
4.2.1.5 Veto safety
Veto safety is extremely important because we do not want to accidentally veto any true GW signals. After the creation of all the data-quality flags for a given analysis time, we calculate each flag’s safety probability using hardware injections (see Section 3.3.1) as follows:
$$\text{safety probability} \equiv 1 - F\left(\text{\# of hardware injections vetoed} - 1;\ \text{\# expected to be vetoed}\right), \tag{4.14}$$
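where F is the Poisson cumulative distribution function, evaluated at its first argument with mean equal to its second. Equivalently, the safety probability is the Poisson probability of vetoing at least the observed number of hardware injections by chance, given the expected number. If the safety probability is less than $10^{-5}$, the flag is not used.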
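As a minimal sketch of Equation (4.14), the calculation can be written in a few lines of Python. The function names are hypothetical, and the expected number of vetoed injections is taken as an input because how it is estimated is not specified here.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); zero when k is negative."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mu / i  # P(X = i) obtained from P(X = i - 1)
        total += term
    return min(total, 1.0)  # guard against rounding slightly above 1


def safety_probability(n_vetoed, n_expected):
    """Equation (4.14): probability of vetoing at least n_vetoed by chance."""
    return 1.0 - poisson_cdf(n_vetoed - 1, n_expected)


def flag_is_safe(n_vetoed, n_expected, threshold=1e-5):
    """A flag is kept only if its safety probability is not below threshold."""
    return safety_probability(n_vetoed, n_expected) >= threshold


if __name__ == "__main__":
    # Illustrative numbers only (not taken from any analysis).
    print(safety_probability(2, 1.0), flag_is_safe(2, 1.0))
    print(safety_probability(30, 1.0), flag_is_safe(30, 1.0))
```

Subtracting the cumulative sum from 1 loses precision for extremely small tail probabilities, but that should not matter when the only question is whether the result falls below $10^{-5}$.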
Chapter 5
Review of Multivariate Statistical Classification/Machine Learning
Machine learning is a gigantic field whose tools overlap with multivariate statistical classifiers. Multivariate statistical classification is the process of using multi-dimensional information to assign events into two or more categories, or classes. Many algorithms for multivariate classification are supervised machine learning techniques, in which a set of events of known class (the training set) are used to train the classifier. Perhaps the most famous of these techniques is the artificial neural network (ANN), but there are a wide variety of techniques that offer better performance for particular problems. Other popular methods are support vector machines (SVMs) and decision trees. These algorithms are extremely useful when the dimensionality of the problem is too large for an analytical analysis or even a numerical regression analysis; they are also able to extract heretofore hidden correlations between the input dimensions. It is not obvious which machine learning algorithm will be the best for a given problem; thus, it is often necessary to try several and pick one based on the results [92, 93].
This chapter presents a review of the three algorithms that are used in the analysis described in Chapter 6 and the analyses in Section 8.3 and Section 9.1. Let us define several terms and ideas which are common to multivariate classification problems:
• feature space: the n-dimensional space used to characterize the events, where each event is described by an n-dimensional feature vector;
• training set: a set of events of known class that are used by the training algorithm to create a trained classifier that is then used to guess the class of unknown events in an entirely deterministic way based on their feature vectors;
• validation set: in some algorithms, a separate set of events, also of known class, is used during the training process to test against or actively suppress overtraining;
• overtraining: overtraining occurs when a classifier correctly classifies all or most of the events in the training set, but does poorly at classifying events not in the training set but drawn from the same distribution;
• generalization error: the difference between the error on the training set and the error on the testing set; an overtrained classifier has a large generalization error;
• testing/evaluation set: a third set of events of known class, sharing no events with the training or validation sets, which are ranked by a trained classifier in order to evaluate the performance of the classifier;
• robustness: robustness is an over-used descriptor that can mean: 1) classifiers are unlikely to get overtrained, even without using a validation set during training; 2) noise or missing data in the training set can still yield a strong classifier, or noise in the evaluation set does not prohibit good classification of the evaluation events; 3) classifiers can be used for a wide variety of problems.
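The relationship between the training, validation, and testing sets and the generalization error can be illustrated with a short Python sketch. The split fractions, the function names, and the representation of an event as a (feature vector, class) pair are assumptions made for illustration; the generalization error is taken here as an absolute difference of misclassification rates.

```python
import random


def split_events(events, frac_train=0.5, frac_valid=0.25, seed=0):
    """Shuffle events of known class and cut them into three disjoint sets."""
    rng = random.Random(seed)
    shuffled = list(events)
    rng.shuffle(shuffled)
    n_train = int(frac_train * len(shuffled))
    n_valid = int(frac_valid * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


def error_rate(classifier, events):
    """Fraction of (features, label) events that the classifier gets wrong."""
    wrong = sum(1 for features, label in events if classifier(features) != label)
    return wrong / len(events)


def generalization_error(classifier, train, test):
    """Absolute difference between testing-set and training-set error."""
    return abs(error_rate(classifier, test) - error_rate(classifier, train))


if __name__ == "__main__":
    # Toy events: one-dimensional feature vectors with alternating classes.
    events = [((i,), i % 2) for i in range(20)]
    train, valid, test = split_events(events)

    # A classifier that memorizes the training set and guesses Class 0
    # otherwise: a caricature of overtraining.
    memory = dict(train)
    memorizer = lambda features: memory.get(features, 0)
    print(error_rate(memorizer, train), generalization_error(memorizer, train, test))
```

The memorizing classifier has zero error on the training set by construction, so any error it makes on the testing set shows up directly as generalization error.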
The following sections review three machine learning algorithms used in this thesis. After training, each of these algorithms will, given an event, (deterministically) return a rank between 0 and 1 that describes how similar the event is to Class 0 versus Class 1 training events. This thesis describes three applications of machine learning:
• the separation of clean times (Class 0) from glitchy times (Class 1), see Chapter 6;
• the separation of accidental coincidences of instrumental/environmental noise triggers in the high-mass search (Class 0: high-mass background) from truly coincident signal-like triggers as found by the high-mass search (Class 1: high-mass signal), see Section 8.3;
• the separation of accidental coincidences of instrumental/environmental noise triggers in the ringdown search (Class 0: ringdown background) from truly coincident signal-like triggers as found by the ringdown-only search (Class 1: ringdown signal), see Section 9.1.
Setting a threshold on the rank between these classes allows us to classify unknown events into either Class 0 or Class 1. However, it is often useful to use the continuous rank rather than thresholding.
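A minimal Python sketch of thresholding a rank follows. It assumes that ranks near 1 indicate Class 1-like events, and the function names are hypothetical. Sweeping the threshold rather than fixing it shows what the continuous rank retains: every threshold gives a different trade-off between Class 0 and Class 1 events labelled as Class 1.

```python
def classify(rank, threshold):
    """Return 1 if the rank is at or above the threshold, otherwise 0."""
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must lie in [0, 1]")
    return 1 if rank >= threshold else 0


def fraction_above(ranks, threshold):
    """Fraction of the given events that would be labelled Class 1."""
    return sum(classify(r, threshold) for r in ranks) / len(ranks)


def threshold_sweep(class0_ranks, class1_ranks, thresholds):
    """For each threshold, report the fraction of Class 0 and of Class 1
    events labelled Class 1."""
    return [
        (t, fraction_above(class0_ranks, t), fraction_above(class1_ranks, t))
        for t in thresholds
    ]


if __name__ == "__main__":
    # Made-up ranks for illustration only.
    class0 = [0.1, 0.2, 0.6, 0.3]
    class1 = [0.9, 0.7, 0.4, 0.8]
    for row in threshold_sweep(class0, class1, [0.25, 0.5, 0.75]):
        print(row)
```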
5.1 Artificial neural networks
The ANN is a machine learning technique based on the way in which data are processed in human brains [94, 95]. In the human brain, which is composed of a tremendous number of interconnected neurons, each cell performs only the simple task of responding to an input stimulus. However, when a large number of neurons form a complicated network structure, they can perform complex tasks such as speech recognition and decision-making.
A single neuron is composed of dendrites, a cell body, and an axon. When dendrites receive an external stimulus from other neurons, the cell body computes the signal. When the total strength of the stimulus is greater than the synapse threshold, the neuron fires and sends an electrochemical signal to other neurons through the axon. This process can be implemented with a simple mathematical model including nodes (analogous to the cell body), a network topology, and learning rules adapted to a specific data processing task. Nodes are characterized by their number of inputs and connecting weights (analogous to dendrites) and outputs (analogous to axons and synapses) [96]. The network topology (analogous to brain structure) is defined by the connections between the nodes. The learning rules prescribe how the connecting weights are initialized and evolve.
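The node model described above can be sketched in Python as a weighted sum compared against a threshold. This is the most literal reading of the firing rule, with hypothetical function names and hand-picked weights; practical networks typically replace the hard threshold with a smooth activation function and set the weights through the learning rules rather than by hand.

```python
def node_output(inputs, weights, threshold):
    """Fire (return 1.0) when the weighted sum of inputs exceeds the threshold."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one connecting weight")
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if total > threshold else 0.0


def layer_output(inputs, weight_rows, thresholds):
    """Feed the same inputs to several nodes, one weight row per node."""
    return [node_output(inputs, w, t) for w, t in zip(weight_rows, thresholds)]


def xor_network(x1, x2):
    """Two hidden nodes (AND-like and OR-like) feeding one output node.

    The topology and weights are chosen by hand for illustration; no
    learning rule is applied here.
    """
    hidden = layer_output([x1, x2], [[0.6, 0.6], [1.0, 1.0]], [1.0, 0.5])
    # Output fires when the OR-like node fires and the AND-like node does not.
    return node_output(hidden, [-1.0, 1.0], 0.5)


if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            print(a, b, xor_network(a, b))
```

A single threshold node cannot reproduce the exclusive-or of its inputs, while the small two-layer topology above can, which illustrates how connecting simple nodes yields more complex behavior.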