
Determining Changes in the Covariance Structure of Gaussian Processes


Academic year: 2023


In the rightmost plot (Env3), the environment changes from 'Bumpy ground 2' to 'Plane ground'. When sequential data are stationary, that is, when the parameters of the underlying distribution do not change, prediction of future events is feasible. Trend filtering also determines change points, under the assumption of piecewise linearity in the sequential data [7].

Another work has investigated detecting changes in the mean function of GPs by proposing likelihood ratio tests [15]. First, we propose new likelihood ratio tests that detect structural breaks in the covariance of a GP.

CPD in the Mean of Gaussian Processes

Gaussian Processes

Hypothesis Tests for Structural Break

We want to determine whether they come from the same regression model as the previously observed samples. The sum of squared residuals between the true data and the estimated values under the null hypothesis H0 (β1 = β2 = β) becomes a sum of quadratic terms. The first quadratic term has rank n − p and the second quadratic term has rank m − p, respectively.

Since the rank of Q1 is less than or equal to the sum of the ranks of Q2 and Q3, the rank of Q3 must be greater than or equal to n + m − p − (n + m − 2p) = p. Combined with the condition that the rank of Q3 cannot exceed p, we can say that the rank of Q3 is equal to p. CUSUM, which stands for cumulative sum, tests whether a shift exists in a parameter of the probability distribution.
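The rank argument above can be made explicit as a short worked derivation. This is a sketch under the standard Chow-test reading of the text, where Q1 is the pooled residual sum of squares under H0, Q2 the sum of the two per-segment residual sums, and Q3 = Q1 − Q2:

```latex
\begin{align*}
\operatorname{rank}(Q_1) &= n + m - p,\\
\operatorname{rank}(Q_2) &= (n-p) + (m-p) = n + m - 2p,\\
\operatorname{rank}(Q_3) &\ge \operatorname{rank}(Q_1) - \operatorname{rank}(Q_2)
  = (n + m - p) - (n + m - 2p) = p,
\end{align*}
% and since rank(Q_3) <= p, equality holds. The standard Chow statistic is then
\[
F \;=\; \frac{Q_3 / p}{\,Q_2 / (n + m - 2p)\,} \;\sim\; F_{p,\; n+m-2p}
\quad \text{under } H_0 .
\]
```

The F form is the usual Chow-test statistic implied by these ranks, not a formula quoted from this thesis.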

The CUSUM test accumulates the residuals of a target parameter value and detects a shift when the sum exceeds a specified threshold. If there is no shift in the mean, S±n keeps drifting back toward zero by about K/2 per step. When there is a shift in the mean, the cumulative residual from the original mean begins to increase in the case of a positive jump, and the difference from the minimum cumulative sum eventually exceeds the threshold.
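The tabular form of this procedure can be sketched as follows. The slack `K` and threshold `H` are the usual CUSUM design parameters; the target mean, jump size, and chosen values are illustrative assumptions, not settings from this thesis:

```python
import numpy as np

def cusum(x, target, K, H):
    """Tabular CUSUM: return the index of the first alarm, or None.

    K is the slack (often half the shift size one wants to detect),
    H is the decision threshold.
    """
    s_pos = s_neg = 0.0
    for i, xi in enumerate(x):
        s_pos = max(0.0, s_pos + (xi - target) - K)  # accumulates upward shifts
        s_neg = max(0.0, s_neg - (xi - target) - K)  # accumulates downward shifts
        if s_pos > H or s_neg > H:
            return i
    return None

rng = np.random.default_rng(0)
# Mean shifts from 0 to 2 at t = 100.
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
alarm = cusum(x, target=0.0, K=1.0, H=5.0)
```

With no shift, both statistics are repeatedly pulled toward zero by the slack; after the jump, `s_pos` gains roughly one unit per step and crosses `H` shortly after the change.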

CPD in the Mean of Gaussian Processes

In the mean change detection problem in a GP, the null hypothesis assumes that the samples follow a GP with zero mean, which can be written as below. The alternative hypothesis corresponding to a specific time t, on the other hand, assumes that there exists a change point of jump size b in the mean function at time t. Taking the union over the set of change point candidates, the alternative hypothesis states that there exists at least one change point with a jump size b, as written below.
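Under one plausible reading of [15] (the step indicator u_t and the candidate set T are notational assumptions here, not symbols quoted from the thesis), the hypotheses above take the form:

```latex
\begin{align*}
H_0 &: \; X \sim \mathcal{N}(\mathbf{0}, \Sigma), \\
H_{1,t} &: \; X \sim \mathcal{N}(b\, u_t, \Sigma),
  \qquad (u_t)_s = \mathbb{1}\{s \ge t\}, \; b \neq 0, \\
H_1 &: \; \bigcup_{t \in \mathcal{T}} H_{1,t},
\end{align*}
% where \Sigma is the kernel matrix of the GP and \mathcal{T} the set of
% change point candidates.
```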

From equation (2.2) to equation (2.3), we take the derivative with respect to b to obtain the maximum value. When we find a correct threshold Rn,δ, we can bound the conditional error probability by δ, i.e., φn(TGLRT) ≤ δ, under a sufficient condition on b [15]. For every pair of rows, the top row shows the likelihood ratio for each possible change point, and the bottom row shows the corresponding synthetic data.

The synthetic data are generated with a change point in the middle of the sequence. The proposed threshold is plotted as a red horizontal line in each likelihood ratio plot (the top row of every pair). We can see that in all cases the likelihood ratio is maximized at the actual change point, and the bound works correctly for testing changes.
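This scan can be sketched as follows. The closed form for the likelihood ratio comes from maximizing over the jump size b as described above; the kernel choice, jitter, and jump size here are illustrative assumptions, not the exact setup of [15]:

```python
import numpy as np

def se_kernel(t, variance=1.0, lengthscale=5.0):
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def glrt_mean_change(x, Sigma):
    """Likelihood ratio for a mean shift at each candidate change point.

    For the step indicator u_t = 1{s >= t}, maximizing the log-likelihood
    ratio over the jump size b gives b_hat = (u^T S^-1 x) / (u^T S^-1 u), so
    LR_t = (u^T S^-1 x)^2 / (2 u^T S^-1 u), with S^-1 the precision matrix.
    """
    n = len(x)
    Sinv = np.linalg.inv(Sigma)
    lr = np.full(n, -np.inf)
    for t in range(1, n):
        u = np.zeros(n)
        u[t:] = 1.0
        num = u @ Sinv @ x
        den = u @ Sinv @ u
        lr[t] = num * num / (2.0 * den)
    return lr

n = 100
time_idx = np.arange(n, dtype=float)
Sigma = se_kernel(time_idx) + 1e-6 * np.eye(n)   # jitter for invertibility
rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(n), Sigma)
x[n // 2:] += 3.0                # inject a jump of size b = 3 at the midpoint
lr = glrt_mean_change(x, Sigma)
t_hat = int(np.argmax(lr))       # maximized at (or very near) the true change
```

Comparing `max(lr)` against a threshold Rn,δ then turns the scan into the hypothesis test described in the text.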

However, the proposed bound is too general: it depends only on the total number of data points and the desired detection error, not on the scale of the data or of the covariance values, which can affect the scale of the test statistics.

Figure 2.1: Synthetic data with a change in the mean function of a GP at the middle of 100 points of sequential data and its likelihood ratio, for varying variance and lengthscale hyperparameters

Bayesian Online CPD

In the following chapters, we propose an improved BOCPD algorithm that overcomes the above drawbacks. This section presents our new hypothesis tests to detect change points in the covariance structure of a GP. Although there have been many works that have concentrated on the mean change in the GPs, changes in the covariance structure of GPs have not yet been fully explored.

In this subsection, we discuss motivating examples of covariance changes in time series analysis. As we can see in Figure 3.1, the covariance matrix can represent different kinds of changes in synthetic data. Σbreak represents a structural break in the covariance matrix, where the processes before and after the change point are independent.
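A minimal sketch of such a Σbreak: start from a stationary kernel matrix and zero out the cross-covariance between the two segments, so that samples before and after the change point are independent. The SE kernel, break location, and hyperparameters are assumptions for illustration:

```python
import numpy as np

def se_kernel(t, variance=1.0, lengthscale=3.0):
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

n, cp = 100, 50                      # sequence length and break location
t = np.arange(n, dtype=float)
K = se_kernel(t)

# Sigma_break: zero the cross-covariance blocks so that the process before
# the change point is independent of the process after it.
Sigma_break = K.copy()
Sigma_break[:cp, cp:] = 0.0
Sigma_break[cp:, :cp] = 0.0

rng = np.random.default_rng(0)
sample = rng.multivariate_normal(np.zeros(n), Sigma_break + 1e-8 * np.eye(n))
```

Draws from `Sigma_break` typically show a visible discontinuity in behavior at the break, even though each segment alone looks like an ordinary stationary GP sample.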

These types of changes are not just theoretical examples; they also occur in the real world. We can see that the lengthscale, i.e., the smoothness of the time series, changes after September 1987. The leftmost graph in Figure 3.3 presents different samples generated from a GP with an embedded covariance change at time step 5.

Figure 3.2a shows the Apple share price from July 1985 to July 1989 representing length scale change.

Figure 3.1: Synthetic data with changes in the covariance structure. Figure 3.1a shows samples generated from the covariance matrix with a structural break at the middle

Problem Setting

In the remainder of this section, we first recall the notation for constructing a likelihood ratio test for a general change in covariance, and then turn to the interest of this thesis: formulating a likelihood ratio test for a structural break in covariance. The rightmost graph shows the GP posterior from GP regression with two consecutive kernels, representing the structural break in covariance.
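One way to realize "two consecutive kernels" is to zero the covariance across the break, so the GP posterior on each side depends only on that side's data. This is a sketch under assumed kernel choices, break location, and noise level, not the thesis's exact construction:

```python
import numpy as np

def se_kernel(a, b, variance=1.0, lengthscale=3.0):
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def break_kernel(a, b, cp, **kw):
    """Two consecutive SE kernels: covariance is zero across the break at cp."""
    K = se_kernel(a, b, **kw)
    same_side = (a[:, None] < cp) == (b[None, :] < cp)
    return K * same_side

cp, noise = 5.0, 0.1
X = np.linspace(0, 10, 50)
rng = np.random.default_rng(2)
# Toy data whose behavior changes at the break.
y = np.where(X < cp, np.sin(X), 2.0 + np.cos(3 * X)) + noise * rng.standard_normal(50)

K = break_kernel(X, X, cp) + noise**2 * np.eye(50)
Xs = np.linspace(0, 10, 200)
Ks = break_kernel(Xs, X, cp)
post_mean = Ks @ np.linalg.solve(K, y)   # GP posterior mean under the break kernel
```

Because the cross-covariance blocks are exactly zero, the posterior mean left of the break is completely unaffected by observations right of it, which is the defining property of the structural break.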

Tests for the Covariance Structural Break

Then, when the n consecutive data points follow the null hypothesis as defined in Section 3.3 and are bounded by [−V, V], the rate at which the likelihood ratio test falsely detects a change (the Type I error) is bounded as follows. When the n consecutive data points follow the alternative hypothesis as defined in Section 3.3 and are bounded by [−V, V], the rate at which the likelihood ratio test fails to detect the change (the Type II error) is bounded as follows. When we set the threshold to Rδ ≥ Rn,δ,H0, we can ensure that the Type I error of the test is bounded by Lemma 3.3.2.

Similarly, if we set the threshold to Rδ ≤ Rn,δ,H1, the Type II error of the test is guaranteed to be bounded by Lemma 3.3.3. Using Theorem 3.3.2, the statistical error probability of the proposed test can be bounded in either case for a structural covariance break with arbitrary kernels. Based on BOCPD, CBOCPD compensates the change point distribution by relaxing the assumption that the run length depends only on the run length at the previous time step and not on the observations.

In Algorithm 1, the first two lines initialize the parameters m and δ, where m denotes half the window size and δ denotes the specified error margin of the likelihood ratio test. Conversely, if the result of the second test is 0, we reject the alternative hypothesis and fail to reject the null. If the likelihood ratio is maximized in the middle of the window, and if the likelihood ratio at that point passes the tests, we declare the time point a change point and set P(rt = 0 | rt−1, X(r)t−1) = 1 − δ, which reinforces the possibility of a change in the BOCPD framework.

There is another condition, τ∗ = t, requiring that the likelihood ratio be maximized at the middle of the window; it is set to avoid situations where the same time point is detected in several consecutive windows.
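The windowed test and its effect on the change-point prior can be sketched as follows. This is a simplified reading of Algorithm 1: the break statistic reuses one kernel for both halves instead of maximizing over alternative kernels, the threshold is a placeholder for Rn,δ, and the symmetric "confident no-change" branch (which would lower the prior) is omitted:

```python
import numpy as np

def gauss_loglik(x, Sigma):
    """Log density of x under N(0, Sigma)."""
    n = len(x)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (x @ np.linalg.solve(Sigma, x) + logdet + n * np.log(2 * np.pi))

def break_lr(window, kernel):
    """Log-likelihood ratio of a structural break at the window centre
    (independent halves) versus a single stationary GP over the window."""
    n = len(window)
    half = n // 2
    K = kernel(np.arange(n, dtype=float)) + 1e-6 * np.eye(n)
    full = gauss_loglik(window, K)
    split = gauss_loglik(window[:half], K[:half, :half]) \
        + gauss_loglik(window[half:], K[half:, half:])
    return split - full

def change_prior(x, kernel, m=10, delta=0.05, hazard=0.01):
    """Per-step prior P(r_t = 0): raised to 1 - delta where the windowed test
    fires at its own centre, and left at the constant hazard elsewhere."""
    T = len(x)
    prior = np.full(T, hazard)
    lrs = np.full(T, -np.inf)
    for t in range(m, T - m):
        lrs[t] = break_lr(x[t - m:t + m], kernel)
    threshold = 2.0 * np.log(1.0 / delta)      # placeholder for R_{n,delta}
    for t in range(m, T - m):
        # Fire only if the statistic is maximal at the window centre, to avoid
        # flagging the same change in several consecutive windows.
        if lrs[t] > threshold and lrs[t] == lrs[max(0, t - m):t + m].max():
            prior[t] = 1.0 - delta             # reinforce the detected change
    return prior

se = lambda t: np.exp(-0.5 * ((t[:, None] - t[None, :]) / 3.0) ** 2)
rng = np.random.default_rng(0)
x = rng.standard_normal(80)
prior = change_prior(x, se, m=5)
```

A BOCPD recursion would then consume `prior[t]` in place of its constant hazard, which is how the tests modulate the run-length distribution.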

Figure 3.4: A horizontal line representing the range of thresholds guaranteeing the bounded Type I error with the right-pointing arrow and the range of thresholds guaranteeing the bounded Type II error under the alternative hypothesis

Theoretical Analysis of CBOCPD

On the contrary, when both tests fail, that is, T∗GLRT = 0, we assume that there is no change and reduce the prior probability of a change in the BOCPD algorithm. Theorem 4.2.1 mainly states that the performance of CBOCPD is no worse than that of BOCPD under various conditions. This follows from the assumption that the absolute prediction error under an incorrect run length must be greater than 0.

The second condition states that one conditional prediction error is greater than the weighted sum of the other conditional prediction errors over the possible run lengths. This condition ensures that the lower bound of the BOCPD prediction error is strictly greater than the CBOCPD prediction error. Finally, if the ratio between the upper and lower bounds of the absolute prediction error in the first condition is bounded from above, then the expected absolute error of CBOCPD is less than or equal to that of BOCPD.

Here, the difference from the condition of Theorem 4.2.1 is that the prediction error is computed with respect to the predictive mean given all previously observed data. By the last condition, if the ratio between the upper and lower bounds of the absolute prediction error in the first condition is bounded from above, then the expected absolute error of CBOCPD is less than or equal to that of BOCPD. In the experiments, we set the total size of the sequential data to T = 200, with two randomly generated change points.

Variance indicates the amplitude of variation of the regression function, and length scale indicates the smoothness of the function.
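The roles of the two hyperparameters can be seen directly by sampling from an SE kernel while varying one at a time; the specific values here are chosen only for illustration:

```python
import numpy as np

def se_kernel(t, variance, lengthscale):
    """Squared Exponential kernel matrix over time points t."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

t = np.arange(100, dtype=float)
rng = np.random.default_rng(0)
samples = {}
for var, ls in [(1.0, 2.0), (4.0, 2.0), (1.0, 20.0)]:
    K = se_kernel(t, var, ls) + 1e-8 * np.eye(100)   # jitter for stability
    samples[(var, ls)] = rng.multivariate_normal(np.zeros(100), K)

# The variance sets the marginal amplitude (the diagonal of K equals the
# variance), while the lengthscale sets how slowly correlation decays with
# distance, i.e. how smooth the sampled function looks.
```

Increasing the variance stretches the samples vertically; increasing the lengthscale makes successive points more strongly correlated and hence the curve smoother.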

Table 5.1: Comparison of BOCPD and CBOCPD over NLL and MSE on synthetic datasets.

Robot Simulation Data

The conventional BOCPD algorithm is vulnerable to this parameter setting, as we can see in the figure. We present a new likelihood ratio test for detecting structural breaks in the covariance of a GP. In this thesis, we focused on abrupt changes in the covariance structure of univariate time series.

"Direct importance estimation for covariate shift adaptation," Annals of the Institute of Statistical Mathematics, pp. Harchaoui, "Testing for homogeneity with kernel Fisher discriminant analysis," in Proceedings of the Neural Information Processing Systems, 2008, pp. Song, "M-statistic for kernel change-point detection," in Proceedings of the Neural Information Processing Systems, 2015, pp.

Zacks, "Estimating the Current Mean of a Normal Distribution Subject to Changes with Time," The Annals of Mathematical Statistics, pp. Nguyen, "Optimal change point detection in Gaussian processes," Journal of Statistical Planning and Inference, pp. Murphy, "Modelling changing dependency structures in multivariate time series," in Proceedings of the International Conference on Machine Learning, 2007, pp.

Roberts, "Sequential Bayesian Prediction in the Presence of Change Points and Errors," The Computer Journal, p. Rasmussen, "Gaussian Process Change Point Models," in Proceedings of the International Conference on Machine Learning, 2010, p. Choi, "Document reading for Bayesian online change point detection," in Proceedings of the Empirical Methods in Natural Language Processing, 2015, p.

Figure 5.1: The run length distribution from the CBOCPD and BOCPD algorithms on the synthetic datasets with two changes in hyperparameters

Table 2.1: Various kernels and their formulas.
Figure 2.1 shows various synthetic data of the mean change of GPs sampled from a Squared Exponential kernel with varying hyperparameters

References