3.5 Application to the SIR model
3.5.4 Sensitivity to the initial point
The final aspect we investigate for the SIR example is the convergence of the search and its sensitivity to the dispersion of the starting points. To assess convergence we use the search parameter values
θ0, Np = 100, δ = (0.5, 0.5), s = ςL/2, B = 1000, M = 10000, = 0.4.
The choices of δ and s are motivated by the previous discussions. We use 10000 iterations to allow the search a reasonable number of steps to converge regardless of where the initial point lies in the parameter space. To assess convergence we sample 1000 starting points from the prior, each a distance 0.5 from the nearest boundary; these are shown in Figure 3.7.
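The SPSA iteration underlying these searches can be sketched as follows. This is a minimal illustration assuming the standard gain sequences a_k = a/(k+1+A)^α and c_k = c/(k+1)^γ; the function `noisy_loss` is a stand-in (a noisy quadratic) for the particle-filter estimate of the loss, and all numeric constants below are illustrative, not the tuned values quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_loss(theta):
    # Stand-in for the particle-filter loss estimate: a noisy quadratic
    # with its minimum at (1.5, 1.2).
    target = np.array([1.5, 1.2])
    return np.sum((theta - target) ** 2) + 0.05 * rng.normal()

def spsa(theta0, n_iter=2000, a=0.1, c=0.1, A=100, alpha=0.602, gamma=0.101):
    """Plain SPSA: two loss evaluations per iteration, in any dimension."""
    theta = np.asarray(theta0, dtype=float)
    best_theta, best_loss = theta.copy(), np.inf
    for k in range(n_iter):
        ak = a / (k + 1 + A) ** alpha                      # step-size gain
        ck = c / (k + 1) ** gamma                          # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.size)   # Rademacher directions
        g_hat = (noisy_loss(theta + ck * delta)
                 - noisy_loss(theta - ck * delta)) / (2 * ck * delta)
        theta = theta - ak * g_hat
        loss = noisy_loss(theta)
        if loss < best_loss:                               # store the running best
            best_theta, best_loss = theta.copy(), loss
    return best_theta

print(spsa([2.5, 3.0]))  # should land near (1.5, 1.2)
```

Note that the cost per iteration is two loss evaluations whatever the dimension of θ, which is what makes SPSA attractive when each evaluation requires a full particle-filter run.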
[Figure: five scatter panels, Np = 5, 10, 25, 50, 100; axes R0 and 1/γ]
Figure 3.6: Scatter plot of MAP estimates, indicated by the blue points, for different numbers of particles. The contour plot shows the posterior distribution. Dashed black lines indicate the estimate of the MAP from the AKDE approach. The number of particles Np is shown in the title of each subplot.
[Figure: scatter over the (R0, 1/γ) plane]
Figure 3.7: Scatter plot of the starting points, indicated in blue, for the 1000 searches.
The posterior density is indicated by the contour plot and the estimate of the MAP from the AKDE approach is indicated by the black dashed lines.
[Figure: scatter over the (R0, 1/γ) plane]
Figure 3.8: Scatter plot of MAP estimates, indicated by the blue points, for the 1000 searches. The contour plot shows the posterior distribution. Dashed black lines indicate the estimate of the MAP from the AKDE approach.
[Figure: two panels plotting R0 and 1/γ against iteration (0–2500)]
Figure 3.9: Plot of the sample path behaviour of 1000 independent searches. The grey band indicates the 95% confidence interval of the search estimates and the black line denotes the average estimate across the searches. The red line indicates the estimate of the MAP from the AKDE approach.
Figure 3.8 shows a contour plot of the posterior distribution of (R0, 1/γ). The AKDE estimate of the MAP is indicated by the black dashed lines and the blue dots show the MAP estimates from the SPSA search. We see relatively strong convergence to the region of highest posterior density. At first the spread across this region may appear slightly concerning, but it is simply the product of the dispersed set of starting points and the stochastic nature of the search. Improved estimates can be obtained by rerunning the search with a smaller δ; however, for tuning purposes these points suffice. The sample path behaviour of the search is shown in Figure 3.9. We see that the mean over the searches (indicated by the black line) converges to the MAP estimate obtained through the AKDE method (indicated by the red line). Convergence occurs after around 1000 iterations, which agrees with the results of the study of the choices of δ and s.
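The mean path and 95% band of the kind plotted in Figure 3.9 can be computed directly from the stored trajectories. The sketch below assumes the iterates of one parameter are held in an array of shape (searches, iterations), and uses synthetic exponentially decaying paths in place of real SPSA output.

```python
import numpy as np

def summarise_paths(paths):
    """Mean and pointwise 95% band across independent search trajectories.

    paths: array of shape (n_searches, n_iterations) holding one
    parameter's iterates (e.g. R0 at every step) for each search.
    """
    mean = paths.mean(axis=0)
    lo, hi = np.percentile(paths, [2.5, 97.5], axis=0)
    return mean, lo, hi

# Synthetic trajectories: exponential decay towards 1.533 from scattered starts.
rng = np.random.default_rng(1)
starts = rng.uniform(0.5, 3.0, size=1000)
iters = np.arange(2500)
paths = 1.533 + (starts[:, None] - 1.533) * np.exp(-iters / 400)
mean, lo, hi = summarise_paths(paths)
print(round(mean[-1], 3))  # → 1.533
```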
Method   R0              1/γ
AKDE     1.533           1.232
SPSA     1.557 (0.048)   1.243 (0.061)
Table 3.1: MAP estimates for the SIR model. The estimates for the SPSA method are averaged across the 1000 independent searches. Each row of the table corresponds to the method used and each column corresponds to a parameter. Standard deviations are given in parentheses and are only provided for the search algorithm as AKDE returns the same result for the same bandwidth and choice of kernel.
Method   Loss (from search)   Loss (post search)
AKDE     −                    26.807 (0.255)
SPSA     26.080 (0.286)       26.801 (0.220)
Table 3.2: Loss values for each method. Each row of the table corresponds to the method used and each column to the measured loss. Standard deviations for the evaluated loss are given in parentheses. The “Loss (from search)” column gives the best estimate of the loss found during the SPSA algorithm. The “Loss (post search)” column gives the estimated loss at the average estimate obtained through each method, evaluated using 25000 evaluations of the loss function.
In Table 3.1 we see that the average result of the SPSA algorithm and the AKDE estimate are in agreement. In Table 3.2 we show the average loss estimated during the searches (from search) and following the search (post search). To obtain the post-search loss estimates we run the particle filter at the average MAP estimate and
average the loss function estimates. The post-search loss values show that the search algorithm returns a better average estimate of the MAP compared to the AKDE method.
An interesting result is that the average loss value found during the search is much lower than the estimate found by the AKDE approach. This can be understood better by looking at the MAP estimates that yielded the best outcomes. The point with the best value of the loss function (during the search) was θ1 = (1.64, 1.36), with an estimated loss function value of f(θ1) = 25.78.
[Figure: density of loss function values f(θ)]
Figure 3.10: Kernel density estimate of the loss values at the best estimate of the MAP.
The average loss value at the AKDE MAP is indicated by the solid black line. The estimated value of the loss function found during the search is indicated by the dashed red line.
Figure 3.10 shows the distribution of the loss function values evaluated at the best estimate of the MAP. The loss function was evaluated 50000 times at θ1 and a kernel density estimate of the resulting values was formed; this is shown by the blue curve.
The dashed red line is the estimated value of the loss function obtained during the search algorithm, and the black line indicates the value of the loss function at the point obtained through the AKDE method. It is clear that the loss function was underestimated at this particular point; the search therefore stored it as the best estimate and became stuck there. The point is indeed slightly better than the AKDE estimate, but its loss was grossly underestimated.
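The underestimation effect is easy to reproduce: the running minimum over many noisy evaluations of a loss is biased below its true value. The numbers below (true loss, noise level, evaluation count) are illustrative, not the experimental values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative numbers, not the experimental values: the true loss at a
# fixed point and the noise level of a single particle-filter evaluation.
true_loss = 26.8
noise_sd = 0.25

# Over a long search the same neighbourhood is evaluated many times; keeping
# the *minimum* observed value gives an estimate biased below the truth.
n_evals = 5000
observed = true_loss + noise_sd * rng.normal(size=n_evals)

print(f"mean of evaluations:    {observed.mean():.2f}")  # close to 26.8
print(f"minimum kept by search: {observed.min():.2f}")   # well below 26.8
```

This is an order-statistics effect: the more often a region is visited, the more extreme the luckiest noise draw becomes, so the stored best loss drifts downwards even when the underlying point does not improve.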
Restarting the search from this point would re-evaluate the MAP estimate and begin the search afresh. Another approach is to increase the number of particles, which reduces the variance of the likelihood estimates and hence makes the search less likely to over- or under-estimate the loss function by a severe amount. Instead of increasing the number of particles we could parallelise the filters, which would likewise reduce the variance of the likelihood estimates and further improve the searching capabilities near the optimum.
Another workaround is to re-evaluate the loss function at the current minimum. This drastically reduces the chance of the search becoming stuck in this fashion, as the loss function is unlikely to be severely over- or under-estimated over several independent evaluations.
However, this approach requires twice the number of loss evaluations and hence roughly doubles the runtime, since the overall runtime of the search is effectively that of running the particle filter M (the number of iterations) times. We have observed re-evaluation to be a more important choice when a precise estimate of the MAP is required, and it is the approach we use for the SEIAR model in the next example.
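One way to implement this safeguard is to draw fresh loss estimates for both the incumbent best point and the candidate before comparing them. The sketch below uses a hypothetical noisy quadratic loss in place of the particle filter; the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_loss(theta, sd=0.25):
    # Hypothetical noisy quadratic standing in for the particle filter,
    # with its minimum at (1.5, 1.5).
    return float(np.sum((np.asarray(theta) - 1.5) ** 2)) + sd * rng.normal()

def update_best(best_theta, candidate):
    """Compare fresh loss draws for both points before updating the best.

    Re-evaluating the incumbent means a single lucky noise draw can no
    longer lock in a spurious best loss, at the cost of one extra loss
    evaluation per comparison.
    """
    if noisy_loss(candidate) < noisy_loss(best_theta):
        return candidate
    return best_theta

best = np.array([1.5, 1.5])
for _ in range(100):
    best = update_best(best, np.array([3.0, 3.0]))
print(best)  # the clearly worse candidate never displaces the incumbent
```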
Another way to improve the search is to parallelise the search itself, by running multiple independent searches from the same initial point on different cores. We store the best estimate obtained by each search and then average across the searches to obtain an improved estimate. In this way we obtain multiple MAP estimates per search at the wall-clock cost of running the algorithm once sequentially.
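A sketch of this parallel variant follows, with `one_search` standing in for a full SPSA run (here it simply returns a noisy estimate around the MAP). A thread pool is used to keep the example self-contained; in practice the searches would run as separate processes on different cores.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def one_search(seed):
    """Stand-in for a full SPSA run: a noisy estimate around the MAP."""
    rng = np.random.default_rng(seed)
    return np.array([1.533, 1.232]) + 0.05 * rng.normal(size=2)

def parallel_map_estimate(n_searches=4):
    # In practice each search runs on its own core (separate processes);
    # a thread pool keeps this sketch simple and avoids pickling concerns.
    with ThreadPoolExecutor(max_workers=n_searches) as pool:
        estimates = list(pool.map(one_search, range(n_searches)))
    # Averaging n independent estimates scales the variance of the mean
    # by 1/n (and so the standard deviation by 1/sqrt(n)).
    return np.mean(estimates, axis=0)

print(parallel_map_estimate())
```

The 1/sqrt(n) scaling of the standard deviation is consistent with the roughly halved spread observed when averaging four searches.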
This version of the algorithm was tested on the same example using 4 cores, and the standard deviation of the MAP estimates roughly halved. While parallelising the implementation is not trivial in most cases, it offers a drastic improvement. We used 80 particles in the filter and the optimal parameter values for the searches.
[Figure: scatter over the (R0, 1/γ) plane]
Figure 3.11: Contour plot of the posterior density. Estimates of the MAP from the SPSA algorithm are indicated in blue and the estimate of the MAP from the AKDE approach is indicated by the black dashed lines.
Method            R0              1/γ
AKDE              1.533           1.232
SPSA (serial)     1.557 (0.048)   1.243 (0.061)
SPSA (parallel)   1.546 (0.024)   1.240 (0.031)
Table 3.3: MAP estimates for the SIR model. The estimates for the two versions of the SPSA method are averaged across the 1000 independent searches. See the caption of Table 3.1 for details.
Method            Loss (from search)   Loss (post search)
AKDE              −                    26.807 (0.255)
SPSA (serial)     26.080 (0.287)       26.801 (0.220)
SPSA (parallel)   26.698 (0.088)       26.799 (0.221)
Table 3.4: Loss values for each method. Each row of the table corresponds to the method used and each column corresponds to the measured loss. See caption of Table 3.2 for details.
Figure 3.11 shows that running multiple searches from the same initial point in parallel tightened the spread about the MAP. This is reflected in the standard deviations, which have halved relative to the serial algorithm (Table 3.3). This suggests that considerable improvement arises from running multiple searches in parallel from the same starting point and averaging the result.
Notably, there remains some minor difference between the MAP estimates from the AKDE approach and the SPSA approach. Table 3.4 shows that the average estimated value of the loss function at the MAP estimate is lowest for both SPSA approaches (which are comparable to one another). Regardless of these minor differences, the estimates clearly converge and lie well within the highest posterior density region.