An adaptive econometric system for statistical arbitrage

Academic year: 2023

The proposed system is compared with classical pairs trading for each of the markets examined. In this study, sensitivity analysis of the system is also conducted to investigate the robustness of the proposed system.

  • Introduction to financial trading
    • Trading of securities
    • Statistical arbitrage
  • Problem statement
  • Research objectives
    • Objective 1: Classification of securities from price data
    • Objective 2: Modelling of mean-reversion characteristic
    • Objective 3: Trade signal generation and risk management
    • Objective 4: Sensitivity analysis of proposed system
  • Beneficiaries
  • Research limitations
    • Security universe
    • Data granularity
  • Research methodology
    • Data acquisition and verification
    • Statistical tests implementation
    • Backtesting of system
    • Verification of results
  • Document conventions
  • Reading suggestions and document layout
    • Reading suggestions
    • Document layout

A model should be created that looks for statistical arbitrage opportunities by forming linear combinations of securities that have been divided into subgroups. Active investment managers and traders can also potentially benefit from the proposed research findings.

Figure 1-1: Entry/Exit signals of mean-reverting strategy
  • Overview of background study
  • High frequency trading
  • Arbitrage
  • Statistical arbitrage
  • Stationary processes
    • Augmented Dickey-Fuller test
  • Mean-reversion strategies
  • Correlation and cointegration
    • Correlation and dependence
    • Testing for unit roots and cointegration
  • Hedging positions
  • Volatility of security prices
  • Modelling volatility
    • Overview of ARCH models
    • ARCH(q) model specification
    • GARCH(p,q) model specification
  • Cluster analysis
    • K-means clustering
    • Affinity propagation clustering
  • Background review

As can be seen from equation (2.2), the general objective of the ADF test is to determine whether the hypothesis 𝜆 = 0 can be rejected. The critical values of the test are valid only asymptotically, which is a known weakness of the test.
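The test reduces to computing the t-statistic of the estimate of 𝜆 and comparing it with Dickey-Fuller critical values. The following numpy sketch illustrates the simplest special case, the plain Dickey-Fuller regression with no lagged difference terms; it is an illustrative simplification, not the thesis's implementation:

```python
import numpy as np

def dickey_fuller_tstat(y):
    """t-statistic for lambda = 0 in the regression
    dy_t = c + lambda * y_{t-1} + e_t (no lagged difference
    terms: the simplest special case of the ADF regression)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta = np.linalg.lstsq(X, dy, rcond=None)[0]
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
e = rng.standard_normal(1000)
walk = np.cumsum(e)                  # unit root: lambda is zero
ar1 = np.zeros(1000)                 # stationary AR(1), phi = 0.5
for t in range(1, 1000):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]

print(round(dickey_fuller_tstat(walk), 2))  # near zero: cannot reject unit root
print(round(dickey_fuller_tstat(ar1), 2))   # strongly negative: reject unit root
```

Because the t-statistic does not follow a standard t-distribution under the null, it must be compared against the (asymptotic) Dickey-Fuller critical values, which is precisely the weakness noted above.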

Figure 2-1: Example of a stationary process

Overview of literature review

The efficient market hypothesis

Arguments for quantitative trading and active investing

Established models for statistical arbitrage

  • Minimum distance method
  • Arbitrage pricing theory model
  • Cointegration for statistical arbitrage

It can be assumed that these factors must be determined by the user of the model. If the two variables are integrated of order 1 (𝐼(1)), the error term of the cointegrating regression must be weakly stationary (𝐼(0)).

Statistical arbitrage in different markets

This would indicate that the spread price series moves around the equilibrium of 𝛼, since 𝜖𝑡 is assumed to be weakly stationary, but not necessarily identically and independently distributed (i.i.d.). It need only be mean-reverting for an effective trade rule to be implemented.

Identifying related securities

Predicting market volatility

The study concludes that the GJR-GARCH model most effectively modelled the daily volatility of five South African indices on the Johannesburg Stock Exchange (JSE). It can be concluded that if a deterministic trend is present in the price series, as is the case with most stock indices, an asymmetric GARCH model should outperform a standard GARCH model in modelling volatility.

Literature review

Although his GARCH model differs from that of Alberg, Shalit and Yosef [50], he reaches a similar conclusion: volatility can be modelled more accurately when allowing for positively or negatively skewed returns. It can be expected that asymmetric GARCH models will not offer an edge over standard GARCH models when modelling the volatility of (weakly) stationary price series in which no deterministic trend is present.

General overview of methodology

  • Chapter layout
  • Implementation approach

This section provides an overview of the proposed model and discusses the standard pairs trading method that will be used for comparison as a benchmark. This section describes the data (type and granularity) to be used for testing the models.

Model descriptions

  • Implementation of the standard model
    • Formation Period
    • Trading Period
  • Implementation of adaptive model
    • Learning period
    • Trading period

For each of the pairs with the smallest SE values, a spread over the formation period is calculated as 𝑥𝑡 = 𝑦𝑡𝑖 − 𝑦𝑡𝑗. The moving average (𝜇𝑥) and standard deviation (𝜎𝑥) of this difference are calculated and stored for use in the trading period. The weights can be used to construct a stationary series over the study period.
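The formation-period computation can be sketched as follows; the pair inputs, the rolling window length, and the use of a simple price difference are illustrative assumptions rather than the thesis's exact configuration:

```python
import numpy as np

def formation_stats(y_i, y_j, window=20):
    """Spread x_t = y_i - y_j with a rolling mean, rolling
    standard deviation, and the resulting z-score. The window
    length of 20 periods is an illustrative assumption."""
    x = np.asarray(y_i, float) - np.asarray(y_j, float)
    mu = np.array([x[max(0, t - window + 1):t + 1].mean()
                   for t in range(len(x))])
    sd = np.array([x[max(0, t - window + 1):t + 1].std(ddof=1)
                   if t > 0 else np.nan for t in range(len(x))])
    z = (x - mu) / sd        # standardised spread for signal generation
    return x, mu, sd, z

# two series that are cointegrated by construction (shared random walk)
rng = np.random.default_rng(1)
y_j = np.cumsum(rng.standard_normal(300))
y_i = y_j + rng.standard_normal(300)
x, mu, sd, z = formation_stats(y_i, y_j)
```

The stored 𝜇𝑥 and 𝜎𝑥 are then carried forward into the trading period, where the z-score drives entry and exit decisions.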

When the volatility of the weakly stationary spread increases, the market entry thresholds increase accordingly. If the value of 𝑧 is lower than the negative of the GARCH-updated threshold value, the stationary portfolio is bought.

Securities universe and sampling frequency

The z-score will be compared with the GARCH-updated volatility forecast to time the entry of market positions. If the value of 𝑧 is higher than the GARCH-updated threshold, a market position is taken such that the stationary portfolio is effectively sold short. When in a market position and the value of 𝑧 crosses zero, the long or short position in the weighted (stationary) series is closed.
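The entry/exit logic described above can be sketched as a small state machine over the z-score series; a fixed entry threshold stands in here for the GARCH-updated one:

```python
def generate_positions(z, entry=2.0):
    """State machine over the z-score: sell the stationary
    portfolio short when z > entry, buy it when z < -entry, and
    close the position when z crosses zero. A fixed `entry`
    threshold is an illustrative stand-in for the GARCH-updated
    threshold described in the text."""
    pos, current, prev = [], 0, 0.0
    for zt in z:
        if current == 0:
            if zt > entry:
                current = -1      # short the stationary portfolio
            elif zt < -entry:
                current = 1       # buy the stationary portfolio
        elif prev * zt <= 0:      # z crossed (or touched) zero: exit
            current = 0
        pos.append(current)
        prev = zt
    return pos

print(generate_positions([0.5, 2.5, 1.0, -0.1, -2.4, -1.0, 0.2]))
# [0, -1, -1, 0, 1, 1, 0]
```

Replacing the constant `entry` with a per-period threshold derived from the GARCH volatility forecast yields the adaptive variant: thresholds widen when forecast volatility rises, as described above.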

The default values are chosen to be comparable to those of the standard model (see [6]). Daily data is available for the entire period that the securities are listed on their respective exchanges.

Implementation of clustering techniques

  • Implementation of the k-means clustering algorithm
  • Implementation of the affinity propagation clustering algorithm
    • Building of initial graph
    • Clustering of data points
    • Responsibility updates
    • Availability updates
    • Considerations of AP clustering

The term "exemplars" is used for existing observations that are chosen as centres of the data, as opposed to "centroids", which do not have to be actual observations and can be created (as with k-means clustering). When 𝑘 = 𝑖 in equation (4.3), the responsibility reflects the input preference for point 𝑘 to be chosen as an exemplar. Setting a damping factor reduces oscillations that would "overshoot" the solution (see [32] and [54] for details).

Negative values of availability decrease the effective values of some input similarities 𝑠(𝑖, 𝑘′) in equation (4.3), removing the corresponding candidate exemplars from competition. As can be seen from equation (4.5), the availability 𝑎(𝑖, 𝑘) is set to the sum of the self-responsibility 𝑟(𝑘, 𝑘) and the positive responsibilities that candidate exemplar 𝑘 receives from other points.
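The two update rules can be iterated directly. The sketch below is a bare-bones affinity propagation loop in numpy, with damping as discussed; the preference value of -20 on the diagonal of the similarity matrix and the toy data are assumptions for illustration:

```python
import numpy as np

def affinity_propagation(S, damping=0.7, iters=200):
    """Minimal affinity propagation: alternate the responsibility
    update (equation (4.3)) and availability update (equation (4.5))
    with damping to suppress oscillations."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [ a(i,k') + s(i,k') ]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        Rnew = S - first[:, None]
        Rnew[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * Rnew
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())   # keep self-responsibility as-is
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew.diagonal().copy()        # a(k,k) is not clipped at zero
        Anew = np.minimum(Anew, 0)
        np.fill_diagonal(Anew, diag)
        A = damping * A + (1 - damping) * Anew
    return R, A

# two well-separated 2-D clusters; preference -20 is an assumption
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = -((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(S, -20.0)
R, A = affinity_propagation(S)
exemplars = np.flatnonzero(np.diag(A + R) > 0)
print(exemplars)   # one exemplar per cluster
```

Points whose combined self-responsibility and self-availability are positive are the exemplars; every other point is assigned to the exemplar with the highest similarity, which is how the cluster memberships in Chapter 5 are obtained.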

Implementation of the Johansen cointegration test

  • Overview of the Johansen test
  • Review of the Johansen method
  • Johansen method: Input
  • Johansen method: Step 1
  • Johansen method: Step 2
  • Johansen method: Step 3
    • Hypothesis testing

In the first step of the Johansen cointegration test, a VAR(𝑝 − 1) model is estimated for the differenced series ∆𝑦𝑡. This is done by regressing ∆𝑦𝑖𝑡 on a constant and all elements of the vectors ∆𝑦𝑡−1, … , ∆𝑦𝑡−𝑝+1. In the third step of the Johansen cointegration test, the maximum likelihood estimates of the parameters are calculated.

In this case, the maximum value of the log-likelihood function is given by equation (4.20). Here, there are no restrictions on the constant term when estimating the regressions in Section 4.5.4.
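The steps above can be sketched in numpy as follows (statsmodels' `coint_johansen` provides a complete implementation); the lag order, the use of 𝑦𝑡−1 as the lagged-level regressor, and the simulated data are illustrative assumptions:

```python
import numpy as np

def johansen_trace(y, p=2):
    """Sketch of the Johansen steps for a T x n system y:
    step 1 regresses dy_t on a constant and lagged differences,
    step 2 does the same regression for the lagged levels, and
    step 3 solves the eigenvalue problem of the residual
    product-moment matrices to form the trace statistics."""
    dy = np.diff(y, axis=0)
    lags = p - 1
    T = dy.shape[0]
    # regressors: constant plus dy_{t-1}, ..., dy_{t-p+1}
    Z = np.column_stack([np.ones(T - lags)] +
                        [dy[lags - j:T - j] for j in range(1, lags + 1)])
    Y0 = dy[lags:]       # dy_t
    Y1 = y[lags:-1]      # lagged levels
    def resid(Y):
        return Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
    R0, R1 = resid(Y0), resid(Y1)
    Teff = R0.shape[0]
    S00 = R0.T @ R0 / Teff
    S01 = R0.T @ R1 / Teff
    S11 = R1.T @ R1 / Teff
    M = np.linalg.solve(S11, S01.T) @ np.linalg.solve(S00, S01)
    eig = np.sort(np.linalg.eigvals(M).real)[::-1]
    # trace statistic for each hypothesised cointegration rank r
    return [-Teff * np.log(1 - eig[r:]).sum() for r in range(len(eig))]

# cointegrated pair by construction (shared random-walk factor)
rng = np.random.default_rng(2)
w = np.cumsum(rng.standard_normal(400))
y = np.column_stack([w + 0.1 * rng.standard_normal(400),
                     w + 0.1 * rng.standard_normal(400)])
trace = johansen_trace(y)
```

For this constructed pair, the trace statistic for rank 0 greatly exceeds its critical value while the statistic for rank 1 does not, indicating exactly one cointegrating relation, which is the behaviour exploited by the adaptive system.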

Implementation of GARCH model

  • Overview of GARCH models
  • Parameter estimation in the GARCH model
  • Gaussian quasi maximum-likelihood estimation
  • Fat-tailed maximum-likelihood estimation
  • Implementing the Nelder-Mead algorithm
    • Description of Nelder-Mead algorithm
    • Implementation overview of Nelder-Mead algorithm
    • Initial simplex construction
    • Simplex transformation algorithm

Maximum likelihood functions will be used with the Nelder-Mead method to find the optimal values of the GARCH model parameters given historical time-series data. For the general GARCH(p,q) process, the likelihood function is maximised as a function of the parameters 𝛼𝑖 and 𝛽𝑗. The resulting value in the parameter space is the Gaussian quasi-maximum-likelihood estimator of the parameters of the GARCH(p,q) process.

In this study, the Nelder-Mead algorithm will be used extensively to estimate the parameters of the GARCH model in conjunction with the chosen log-likelihood function (see Sections 4.6.3 and 4.6.4). As an illustrative example, the steps of a simple Nelder-Mead search will be shown for the case where the minimum of a function (𝑓) is sought.
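A sketch of this estimation procedure, using scipy's Nelder-Mead implementation on a simulated GARCH(1,1) series; the penalty value, the starting point, and the simulated parameters are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, r):
    """Negated Gaussian quasi log-likelihood of a GARCH(1,1):
    sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return 1e10   # penalty in place of explicit constraints
    s2 = np.empty_like(r)
    s2[0] = r.var()   # initialise with the sample variance
    for t in range(1, len(r)):
        s2[t] = omega + alpha * r[t - 1] ** 2 + beta * s2[t - 1]
    return 0.5 * np.sum(np.log(s2) + r ** 2 / s2)

# simulate a GARCH(1,1) path with omega=0.05, alpha=0.1, beta=0.8
rng = np.random.default_rng(3)
T, r = 3000, np.empty(3000)
s2, prev = 0.5, 0.0
for t in range(T):
    s2 = 0.05 + 0.1 * prev ** 2 + 0.8 * s2
    prev = r[t] = np.sqrt(s2) * rng.standard_normal()

fit = minimize(garch11_neg_loglik, x0=[0.1, 0.05, 0.7],
               args=(r,), method='Nelder-Mead')
omega_hat, alpha_hat, beta_hat = fit.x
print(np.round(fit.x, 3))
```

Since Nelder-Mead is a derivative-free simplex search, the stationarity and positivity constraints are handled here with a simple penalty; swapping the Gaussian likelihood for the Student's t version changes only the objective function.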

Figure 4-2: Centroid and initial simplex

Performance evaluation metrics

  • Compound annual growth rate
  • Sharpe ratio
  • Sortino ratio
  • Benchmark comparison metrics
    • Alpha
    • Beta
    • Information ratio

It is an indicator of downside risk, measuring the largest peak-to-trough loss the portfolio experienced over the evaluation period. Together with the maximum drawdown, the drawdown duration is calculated as the time from the start of the maximum drawdown to the point at which the portfolio regains its pre-drawdown value. Alpha can be expressed as the CAGR of the portfolio minus the CAGR of the benchmark.

The information ratio (or valuation ratio) is a measure of a portfolio's risk-adjusted return. Active return is the difference between the portfolio's return and the return of a given benchmark.
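The metrics above can be computed directly from an equity curve; the sampling frequency and the toy input values are assumptions for illustration:

```python
import numpy as np

def performance_metrics(equity, bench, periods_per_year=252):
    """CAGR, maximum drawdown, and information ratio for an equity
    curve against a benchmark, following the definitions above."""
    eq, bm = np.asarray(equity, float), np.asarray(bench, float)
    years = (len(eq) - 1) / periods_per_year
    cagr = (eq[-1] / eq[0]) ** (1 / years) - 1
    peak = np.maximum.accumulate(eq)          # running high-water mark
    max_dd = ((peak - eq) / peak).max()       # largest peak-to-trough loss
    r_p = np.diff(eq) / eq[:-1]
    r_b = np.diff(bm) / bm[:-1]
    active = r_p - r_b                        # active return vs benchmark
    ir = active.mean() / active.std(ddof=1) * np.sqrt(periods_per_year)
    return {"cagr": cagr, "max_drawdown": max_dd, "information_ratio": ir}

# toy three-period example (periods_per_year=3 makes the span one year)
m = performance_metrics([100, 110, 99, 121], [100, 105, 103, 110],
                        periods_per_year=3)
```

On this toy curve the CAGR is 21% and the maximum drawdown is 10% (the fall from 110 to 99); the information ratio is the annualised mean active return divided by the tracking error, as defined above.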

Methodology review

In general, a beta value of less than one indicates that a portfolio's equity curve is less volatile than the market. The information ratio can be calculated as the expected active return divided by the tracking error. The implementation details of the GARCH volatility model were discussed, along with three likelihood function derivations.

This study will use the Student's t likelihood function because the results of the literature review show that it generally provides the best fit when applied to financial data. To search for the parameter values of the GARCH model, the Nelder-Mead simplex search algorithm will be used.

Overview of evaluation

Backtesting system setup

Verification of system components

  • Affinity propagation clustering
  • K-means clustering
  • Comparison of clustering techniques
  • Johansen cointegration test
    • Johansen test applied to two index funds
    • Johansen test applied to two US stocks during the 2008 crash
    • Johansen test applied to US stocks in the same industry
    • Johansen test applied to three country index ETFs
    • Johansen test applied to clustered German stocks
  • GARCH volatility model
    • Applying GARCH(1,1) models to the S&P 500 sectors ETFs
    • Applying GARCH(1,1) models to the MSCI country index ETFs
    • GARCH-updated entry threshold versus fixed entry thresholds

The results of the k-means clustering on shares listed on the Deutsche Börse Xetra over the period January 2004 – January 2005 can be seen in Table 5-3. The results of the two hypothesis tests of the Johansen test are contained in Table 5-8 and Table 5-9 respectively. The results of the Johansen test applied to the time series of these securities are shown in Table 5-10 and Table 5-11.

The results of the two hypothesis tests of the Johansen test are contained in Table 5-12 and Table 5-13 respectively. As discussed in section 4.6, a persistence value can be calculated by summing the parameters 𝛼 and 𝛽.
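The persistence computation is a one-liner; as a small illustration, the implied half-life of a volatility shock (a standard derived quantity, not necessarily a metric reported in the thesis) follows from the same sum:

```python
import math

def garch_persistence(alpha, beta):
    """Persistence of a GARCH(1,1) as the sum alpha + beta, plus
    the implied half-life (in periods) of a volatility shock: the
    lag at which a shock's effect on the variance forecast has
    decayed by half."""
    p = alpha + beta
    half_life = math.log(0.5) / math.log(p)
    return p, half_life

p, hl = garch_persistence(0.1, 0.85)
print(round(p, 2), round(hl, 1))   # 0.95, ~13.5 periods
```

A persistence close to one means volatility shocks decay slowly, which is why the GARCH-updated entry thresholds remain widened for some time after a volatility spike.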

Table 5-2: Clustering of German stocks (2004-2005) - AP

Validation of system

  • Evaluation on Deutsche Börse Xetra
  • Evaluation on TSE ETFs
  • Evaluation on TSE stocks
  • Evaluation on JSE stocks
  • Evaluation on US ETFs
  • Evaluation on US stocks
  • Evaluation over different market regimes
  • Sensitivity analysis
    • Deutsche Börse Xetra
    • TSE ETFs
    • TSE Stocks
    • JSE Stocks
    • US ETFs
    • US Stocks
    • Transaction cost sensitivity

For comparison, another version of the adaptive system (which trades up to eight baskets of securities) was also implemented. Two variations of the adaptive system, the pairs trading system and the Bollinger Bands strategy were able to outperform the S&P 500 during the non-trending period, with positive alphas (e.g. 2.72%). The version of the adaptive system that trades up to eight baskets had lower volatility than the index and an alpha of -0.78%.

A sensitivity analysis of the adaptive system will be performed for all financial markets examined in this study. As discussed in Section 5.2, the default transaction fee was chosen to be 0.4% of the transaction value.
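The one-at-a-time procedure can be sketched generically; `toy_backtest`, its parameters, and the grids below are hypothetical stand-ins for the full system backtest and its default configuration:

```python
def oat_sensitivity(backtest, defaults, grids):
    """One-at-a-time (OAT) sensitivity analysis: vary a single
    parameter over its grid while all other parameters stay at
    their default values, and record the resulting metric.
    `backtest` is a stand-in for the full system backtest."""
    results = {}
    for name, grid in grids.items():
        scores = []
        for value in grid:
            params = dict(defaults, **{name: value})
            scores.append(backtest(**params))
        results[name] = list(zip(grid, scores))
    return results

# illustrative stand-in objective, not the thesis backtest
def toy_backtest(entry_z=2.0, fee=0.004):
    return -(entry_z - 2.0) ** 2 - 50 * fee

table = oat_sensitivity(toy_backtest,
                        {"entry_z": 2.0, "fee": 0.004},
                        {"entry_z": [1.0, 1.5, 2.0, 2.5],
                         "fee": [0.0, 0.002, 0.004]})
```

Each entry of `table` maps a parameter name to (value, metric) pairs, which is exactly the shape of result needed to identify the parameter regions recommended in the conclusions.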

Figure 5-24: System performance on DAX stocks
Table 5-18: Performance metrics summary (DAX stocks)

Review of evaluation

Study overview

Concluding remarks

  • Adaptive system overview
  • Performance discussion

The third objective was to generate trade signals to profit from temporary mispricings in the mean-reverting constructed series. The final objective of this study was to perform a sensitivity analysis of the statistical arbitrage system. This was completed with a one-at-a-time (OAT/OFAT) sensitivity analysis across all the different markets examined in this study.

The results show that the adaptive system was able to generate positive alpha for five of the six security universes on which the system was tested over the investigated period. The results of the sensitivity analysis gave an indication of the regions in which parameter values should be chosen if the system is to be applied practically.

Recommendations for future research

Closure

C. Alexander and A. Dimitriu, "The Cointegration Alpha: Enhanced Index Tracking and Long-Short Equity Market Neutral Strategies," ISMA Centre Discussion Papers in Finance, August 5, 2002.
E. F. Fama, "Efficient Capital Markets: A Review of Theory and Empirical Work," The Journal of Finance, vol. 25, no. 2, pp. 383-417, 1970.
N. Jegadeesh and S. Titman, "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency," The Journal of Finance, vol. 48, no. 1, pp. 65-91, 1993.

D. Arthur and S. Vassilvitskii, "k-means++: The Advantages of Careful Seeding," in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2007, pp. 1027-1035.
J. C. Lagarias, J. A. Reeds, M. H. Wright and P. E. Wright, "Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions," SIAM Journal on Optimization, vol. 9, no. 1, pp. 112-147, 1998.

JOHANSEN METHOD

  • Overview of the cointegration approach
  • Maximum likelihood estimation of cointegration vectors
  • Maximum likelihood estimator of the cointegration space

First, the likelihood ratio test for the hypothesis given by equation (6.4) is derived, together with the maximum likelihood estimator of the cointegration space. Second, a likelihood ratio test is derived for the hypothesis that the cointegration space is constrained to lie in a particular subspace, representing a linear constraint one might want to impose on the cointegration vectors. The parameters 𝛼 and 𝛽 cannot be estimated uniquely because they form an over-parameterisation of the model.

The impact matrix П is found as the coefficient of the lagged levels in a nonlinear least-squares regression of ∆𝑋𝑡 on lagged differences and lagged levels. Let 𝐷 denote the diagonal matrix of ordered eigenvalues 𝜆̂1 > ⋯ > 𝜆̂𝑝.

RE-PARAMETERIZING A VAR MODEL

COMPARISON OF CLUSTERING METHODS ON SYSTEM PERFORMANCE

COMPARISON OF FIXED AND DYNAMICALLY UPDATED MARKET ENTRY

SENSITIVITY ANALYSIS OF TRANSACTION COSTS ON THE DEUTSCHE BÖRSE

ANALYSIS OF DIFFERENT GARCH-UPDATED MODELS ON THE ADAPTIVE

Figures

Figure 1-1: Entry/Exit signals of mean-reverting strategy
Figure 2-1: Example of a stationary process
Figure 2-2: Pearson correlation coefficient for different data sets [18]
Figure 5-1: Price series from DE cluster 1
