Almost all of the examples in this book use datasets available in R so readers can reproduce the results. For readers who want to use R, the bibliographic notes at the end of each chapter list books that cover R programming, and the book's website contains examples of the R and WinBUGS code used to produce this book.
Notation
If A is some statement, then I{A} is called the indicator function of A and equals 1 if A is true and equals 0 if A is false.
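In display form,

$$I\{A\} = \begin{cases} 1, & \text{if } A \text{ is true}, \\ 0, & \text{if } A \text{ is false}. \end{cases}$$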
Introduction
Bibliographic Notes
The dictum that "All models are false, but some models are useful" is from Box (1976).
Returns
- Introduction
 - Net Returns
 - Log Returns
 - Adjustment for Dividends
 - The Random Walk Model
 - Random Walks
 - Geometric Random Walks
 - Are Log Prices a Lognormal Geometric Random Walk?
 - Bibliographic Notes
 - R Lab
 - Data Analysis
 - Simulations
 - Simulating a Geometric Random Walk
 - Let’s Look at McDonald’s Stock
 - Exercises
 
The quick answer is "no." The lognormal geometric random walk model makes two assumptions: (1) the log returns are normally distributed and (2) the log returns are independent of each other. Suppose that a stock's daily log returns are independent and normally distributed with mean 0.001 and standard deviation 0.015.
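As an illustration of these two assumptions, here is a minimal R sketch that simulates one price path from this model. The mean 0.001 and standard deviation 0.015 come from the text above; the 253 trading days, the initial price of 100, and the seed are illustrative choices, not values from the text.

```r
# Simulate a lognormal geometric random walk: i.i.d. normal daily log returns.
set.seed(2015)
n <- 253                                    # one year of trading days (assumed)
logr <- rnorm(n, mean = 0.001, sd = 0.015)  # daily log returns
price <- 100 * exp(cumsum(logr))            # P_t = P_0 * exp(r_1 + ... + r_t)
plot(price, type = "l", xlab = "day", ylab = "price")
```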
Fixed Income Securities
Introduction
Zero-Coupon Bonds
- Price and Returns Fluctuate with the Interest Rate
 
If the interest rate remained unchanged at 6%, the price of the bond could be computed directly from the present-value formula. If the interest rate changes, however, the annual return of 6% is guaranteed only if you keep the bond until maturity.
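The present-value computation behind this statement can be sketched in one line of R, assuming annual compounding; PAR = 1000, r = 6%, and T = 20 years are illustrative inputs, not values from the text.

```r
# Zero-coupon bond price: present value of PAR received T years from now.
zero_price <- function(par, r, T) par / (1 + r)^T
zero_price(par = 1000, r = 0.06, T = 20)   # about 311.80
```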
Coupon Bonds
- A General Formula
 
Yield to Maturity
- General Method for Yield to Maturity
 - Spot Rates
 
The yield to maturity of a zero-coupon bond with a maturity of n years is called the n-year spot rate and is denoted by yn. A coupon bond is a bundle of zero-coupon bonds, so its yield to maturity is a complex "average" of the spot rates of the zeros in this bundle.
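A sketch of the general numerical method: the yield to maturity solves price(y) = market price, which R's uniroot() can do. The cash-flow values below (semiannual coupon 30, 40 half-year periods, PAR 1000, market price 1200) are illustrative, not from the text.

```r
# Price of a coupon bond at per-period yield y: an annuity of coupons C
# plus PAR discounted from the final period.
bond_price <- function(y, C, T2, par) {
  C / y * (1 - (1 + y)^(-T2)) + par * (1 + y)^(-T2)
}
# Yield to maturity: the root of bond_price(y) = observed price.
uniroot(function(y) bond_price(y, C = 30, T2 = 40, par = 1000) - 1200,
        interval = c(1e-6, 1))$root
```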
Term Structure
- Introduction: Interest Rates Depend Upon Maturity
 - Describing the Term Structure
 
In this example, we first find the yields to maturity from the prices derived in Example 3.2 using the interest rates from Table 3.1. Equations (3.12) and (3.13) give the yields to maturity in terms of the bond prices and the forward rates, respectively.
Continuous Compounding
Continuous Forward Rates
The discount function D(T) and the forward rate function r(t) in formula (3.22) depend on the current time, which is equal to zero in this formula. However, we might be interested in how the discount function and the forward interest rate change over time.
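For reference, the standard relationship between the two, with r(t) denoting the forward-rate curve at the current time zero (stated here from the standard definition, consistent with the text's description of formula (3.22)):

$$D(T) = \exp\left\{-\int_0^T r(t)\,dt\right\}.$$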
Sensitivity of Price to Yield
- Duration of a Coupon Bond
 
When this definition is extended to derivatives, the duration has nothing to do with the maturities of the underlying securities. Unfortunately, the underlying assumption behind (3.31) that all yields change by the same amount is unrealistic, so duration analysis is falling out of favor, and value-at-risk is replacing it as a method of assessing interest-rate risk. Value-at-risk measures and other risk measures are covered in Chapter 19.
Bibliographic Notes
Now assume that all yields change by a constant amount δ, that is, yT changes to yT + δ for all T. Because of this assumption, Eq. (3.30) applies to each of these cash flows, and averaging them with these weights shows that, for a coupon bond, the relative change in price is approximately −DUR × δ, where DUR is the bond's duration.
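A short R sketch of this computation; the inputs (semiannual coupon 30, 40 half-year periods, 3% per-period yield, PAR 1000) are illustrative values, not from the text.

```r
# Macaulay duration of a coupon bond: the present-value-weighted average
# time of its cash flows.
duration <- function(C, par, T2, y) {
  t  <- 1:T2                          # payment periods
  cf <- c(rep(C, T2 - 1), C + par)    # coupons, then final coupon + principal
  pv <- cf / (1 + y)^t                # present value of each cash flow
  sum(t * pv) / sum(pv)               # duration, in periods
}
dur <- duration(C = 30, par = 1000, T2 = 40, y = 0.03)
delta <- 0.001
# Approximate relative price change under a parallel shift delta; dividing
# by (1 + y) gives the modified duration needed with discrete compounding
# (with continuous compounding the division is unnecessary).
-dur * delta / (1 + 0.03)
```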
R Lab
- Computing Yield to Maturity
 - Graphing Yield Curves
 
Run the code above; then, to zoom in on the short end of the curves, run it again with the maturities limited to 0 to 3 years by using the xlim argument of the plot() function. The estimated forward rates found by numerically differentiating an interpolating spline are "wobbly." The wobbles can be removed, or at least reduced, by using a penalized spline instead of an interpolating spline.
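The contrast can be sketched in a few lines. The forward rate is recovered as the derivative of T·y(T), once with an interpolating spline (splinefun()) and once with a smoothing (penalized) spline (smooth.spline()). The yield-curve values below are made up for illustration; they are not the lab's data.

```r
# Made-up yield-curve data for illustration.
maturity <- c(0.25, 0.5, 1, 2, 3, 5, 7, 10)
yield    <- c(0.010, 0.012, 0.015, 0.019, 0.022, 0.026, 0.028, 0.030)
grid <- seq(0.25, 3, length.out = 200)          # short end, as in the text
# Forward rate = d/dT [T * y(T)].
f_interp <- splinefun(maturity, maturity * yield)
fwd_wobbly <- f_interp(grid, deriv = 1)         # interpolating spline: wobbly
fit <- smooth.spline(maturity, maturity * yield, df = 5)
fwd_smooth <- predict(fit, grid, deriv = 1)$y   # penalized spline: smoother
plot(grid, fwd_wobbly, type = "l", xlim = c(0, 3),
     xlab = "maturity (years)", ylab = "forward rate")
lines(grid, fwd_smooth, lty = 2)
```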
Exercises
If you bought the bond for the original price of $828 and sold it a year later for the price calculated in part (b), what is the net yield? a) Suppose the yield for this bond is 4% per annum, compounded semiannually. The investor plans to sell the bond at the end of one year and wants the highest yield for the year.
Exploratory Data Analysis
Introduction
We also see volatility clustering, as there are periods of higher, and of lower, variation within each range. Volatility clustering does not indicate a lack of stationarity, but rather can be viewed as a type of dependence in the conditional variance of each series.
Histograms and Kernel Density Estimation
Figure: kernel density estimates (solid) of the daily log returns on the S&P 500 compared to normal densities (dashed); in panel (a) the normal density uses the sample mean and standard deviation.

We have just seen a problem with using a KDE to suggest a good model for the distribution of the data in a sample: the parameters in the model must be estimated correctly.
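A sketch of such a comparison, using the DAX series from R's EuStockMarkets data as a stand-in for the S&P 500 returns used in the book.

```r
# KDE of daily log returns (solid) versus a normal density with the
# sample mean and standard deviation (dashed).
logret <- as.numeric(diff(log(EuStockMarkets[, "DAX"])))
plot(density(logret), main = "")
x <- seq(min(logret), max(logret), length.out = 400)
lines(x, dnorm(x, mean = mean(logret), sd = sd(logret)), lty = 2)
```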
Order Statistics, the Sample CDF, and Sample Quantiles
- The Central Limit Theorem for Sample Quantiles
 - Normal Probability Plots
 - Half-Normal Plots
 - Quantile–Quantile Plots
 
A half-normal plot is a variant of the normal plot used to detect unusual data rather than to check for a normal distribution. More specifically, a half-normal plot is a scatterplot of the order statistics of the absolute values of the data against Φ−1{(n+i)/(2n+1)}, i = 1, …, n.
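A minimal sketch of this construction on simulated data with one planted outlier; the quantile formula is the one given above.

```r
set.seed(1)
x <- c(rnorm(100), 6)                # simulated data plus one outlier
n <- length(x)
q <- qnorm((n + 1:n) / (2 * n + 1))  # Phi^{-1}{(n+i)/(2n+1)}, i = 1,...,n
plot(q, sort(abs(x)), xlab = "half-normal quantiles",
     ylab = "ordered |data|")        # the outlier stands out at the top right
```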
Tests of Normality
For the S&P 500 returns, the Shapiro–Wilk test rejects the null hypothesis of normality with a p-value less than 2.2×10−16. The Shapiro–Wilk test also strongly rejects normality for the changes in the DM/dollar exchange rate and for the changes in the risk-free return.
Boxplots
With large sample sizes (e.g., 515 for the changes in the risk-free rate), it is quite likely that normality will be rejected, since any real population deviates to some degree from normality, and any deviation, no matter how small, will be detected with a sufficiently large sample. Of course, one must be aware of differences in scale, so it is worth looking at boxplots of the variables both without and with standardization.
Data Transformation
The transformed data satisfy the assumptions of the t-test, namely that the two populations are normally distributed with equal variance, but of course the original data do not. A second approach is to transform the data so that the transformed data meet the assumptions of the original test or estimator.
The Geometry of Transformations
All statistical estimators and tests make certain assumptions about the distribution of the data. We see that the correlations decrease as α decreases from 1, that is, as the concavity of the transformation increases.
Transformation Kernel Density Estimation
The red dashed curve in Fig. 4.27 is a plot of the TKDE of the earnings data using the square-root transformation. For positive, right-skewed variables such as the earnings data, a concave transformation is required.
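A sketch of the TKDE recipe with the square-root transformation, using simulated lognormal data in place of the earnings data. The back-transformation uses the change-of-variables formula: if Y = sqrt(X), then f_X(x) = f_Y(sqrt(x)) / (2·sqrt(x)).

```r
# TKDE sketch: estimate the density of sqrt(x) with an ordinary KDE, then
# back-transform to the density of x.
set.seed(1)
x <- rlnorm(500)                              # right-skewed stand-in data
kde_y <- density(sqrt(x))
grid <- seq(0.05, quantile(x, 0.99), length.out = 400)
f_y <- approx(kde_y$x, kde_y$y, xout = sqrt(grid))$y
f_x <- f_y / (2 * sqrt(grid))                 # TKDE of the original data
plot(grid, f_x, type = "l", xlab = "x", ylab = "density")
```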
Bibliographic Notes
R Lab
- European Stock Indices
 - McDonald’s Prices and Returns
 
Run the following code to generate normal plots of the four indices and to test the normality of each using the Shapiro–Wilk test. In lines 5–6, a robust estimator of the standard deviation of the t-distribution is computed using the mad() function.
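The lab's verbatim code is not reproduced here; the following sketch performs the analogous steps on the EuStockMarkets data, with the robust scale estimate reduced to a one-line mad() call.

```r
data(EuStockMarkets)
logR <- diff(log(EuStockMarkets))    # log returns of the four indices
par(mfrow = c(2, 2))
for (i in 1:4) {
  y <- as.numeric(logR[, i])
  qqnorm(y, datax = TRUE, main = colnames(logR)[i])  # normal plot
  qqline(y, datax = TRUE)
  print(shapiro.test(y))             # Shapiro-Wilk test of normality
}
mad(as.numeric(logR[, 1]))           # robust estimate of scale
```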
Exercises
Create a second set of six normal plots using n simulated N(0,1) random variables, where n is the number of bp changes plotted in the first figure. Use the following fact about the standard normal cumulative distribution function Φ(·). b) What is the 0.975-quantile of a normal distribution with mean −1 and variance 2?
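For part (b), the answer follows from the fact that the q-quantile of N(μ, σ²) is μ + σΦ−1(q); it can be checked in one line of R.

```r
# 0.975-quantile of a normal with mean -1 and variance 2 (sd = sqrt(2)).
qnorm(0.975, mean = -1, sd = sqrt(2))   # -1 + sqrt(2) * 1.96, about 1.77
```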
Modeling Univariate Distributions
Introduction
Parametric Models and Parsimony
A model should only have as many parameters as are necessary to capture the important features of the data. On the other hand, a statistical model must have enough parameters to adequately describe the behavior of the data.
Location, Scale, and Shape Parameters
A model with too few parameters can introduce bias because the model does not fit the data well. A statistical model with small bias but no redundant parameters is called parsimonious and achieves a good compromise between bias and variance.
Skewness, Kurtosis, and Moments
- The Jarque–Bera Test
 - Moments
 
Estimating the skewness and kurtosis of a distribution is relatively simple if we have a sample, Y1, …, Yn. Deviations of the sample skewness and kurtosis from the values expected under normality (0 and 3) are indicative of nonnormality.
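A sketch of the sample versions by their moment definitions; for normal data the values are near 0 and 3.

```r
skew <- function(y) mean((y - mean(y))^3) / sd(y)^3   # sample skewness
kurt <- function(y) mean((y - mean(y))^4) / sd(y)^4   # sample kurtosis
set.seed(1)
y <- rnorm(1000)
c(skewness = skew(y), kurtosis = kurt(y))             # approximately 0 and 3
```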
Heavy-Tailed Distributions
- Exponential and Polynomial Tails
 - t-Distributions
 - Mixture Models: Discrete Mixtures
 
This is the "outlier" region (along with x < −6). The normal mixture has many more outliers than the normal distribution, and the outliers come from the 10% of the population with a variance of 25. In summary, the normal mixture is much more prone to outliers than a normal distribution with the same mean and standard deviation.
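A quick simulation sketch of this comparison. A 90%/10% mixture of N(0,1) and N(0,25) has overall variance 0.9·1 + 0.1·25 = 3.4, so the matching normal distribution is N(0, 3.4).

```r
set.seed(1)
n <- 1e6
heavy <- runif(n) < 0.1                        # the 10% high-variance component
x_mix  <- rnorm(n, sd = ifelse(heavy, 5, 1))   # normal mixture
x_norm <- rnorm(n, sd = sqrt(3.4))             # normal, same mean and variance
c(mixture = mean(abs(x_mix) > 6),              # fraction in the outlier region
  normal  = mean(abs(x_norm) > 6))             # far smaller for the normal
```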
Generalized Error Distributions
The generalized error distributions can give tail weights between those of the normal and double-exponential distributions by having 1 < ν < 2. Because t-distributions have polynomial tails, any t-distribution is more heavily tailed than any generalized error distribution.
Creating Skewed from Symmetric Distributions
Figure: symmetric (solid) and skewed (dashed) t-densities, both with mean 0, standard deviation 1, and ν = 10; ξ = 2 in the skewed density. Note that the mode of the skewed density lies to the left of its mean, typical behavior for right-skewed densities.
Quantile-Based Location, Scale, and Shape Parameters
The parameters ξ, ω, and α determine location, scale, and skewness and are called the direct parameters, or DP. The parameters ξ and ω are the location and scale of the φ(z) factor, and α determines the amount of skewness induced by the Φ(αz) factor.
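For reference, the density being described is the skew-normal in its direct parameterization, stated here from the standard Azzalini form rather than quoted from the text:

$$f(x) = \frac{2}{\omega}\,\phi\!\left(\frac{x-\xi}{\omega}\right)\Phi\!\left(\alpha\,\frac{x-\xi}{\omega}\right).$$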
Maximum Likelihood Estimation
Fisher Information and the Central Limit Theorem for the MLE
The covariance matrix of the MLE can be estimated from the inverse of the observed Fisher information matrix. If the negative of the log-likelihood is minimized by the R function optim(), then the observed Fisher information matrix is computed numerically and returned if hessian = TRUE.
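A self-contained sketch of this workflow on simulated data; the model (a normal location-scale family) and the parameterization are illustrative.

```r
# Minimize the negative log-likelihood with optim(); the inverse of the
# returned Hessian (the observed Fisher information) estimates the
# covariance matrix of the MLE.
set.seed(1)
y <- rnorm(200, mean = 1, sd = 2)
negloglik <- function(theta) -sum(dnorm(y, mean = theta[1],
                                        sd = exp(theta[2]), log = TRUE))
fit <- optim(c(0, 0), negloglik, hessian = TRUE)
se <- sqrt(diag(solve(fit$hessian)))   # standard errors of (mean, log sd)
rbind(estimate = fit$par, se = se)
```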
Likelihood Ratio Tests
The left-hand side of (5.27) is twice the logarithm of the likelihood ratio L(θ̂ML)/L(θ̂0,ML), hence the name likelihood ratio test. When an exact critical value is unknown, the usual choice is the approximate critical value from the large-sample χ² distribution of the test statistic.
AIC and BIC
In general, from a group of candidate models, one chooses the model that minimizes whichever criterion, AIC or BIC, is used. However, it is common for both criteria to choose the same or almost the same model.
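For reference, with p parameters, sample size n, and maximized likelihood L(θ̂ML), the criteria in their standard forms are

$$\mathrm{AIC} = -2\log L(\widehat{\theta}_{\mathrm{ML}}) + 2p, \qquad \mathrm{BIC} = -2\log L(\widehat{\theta}_{\mathrm{ML}}) + p\log n,$$

so BIC penalizes extra parameters more heavily than AIC whenever log n > 2, that is, for n ≥ 8.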
Validation Data and Cross-Validation
The inappropriate use of the training data for validation would have led to the erroneous conclusion that the separate means estimator is more accurate. With leave-one-out cross-validation, each observation takes a turn to be the validation data set, with the other n−1 observations as training data.
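A compact sketch of the leave-one-out mechanics on a toy problem (estimating a mean); the setup is illustrative, not the book's example.

```r
set.seed(1)
y <- rnorm(50, mean = 1)
pred_err <- sapply(seq_along(y), function(i) {
  muhat <- mean(y[-i])   # train on the other n-1 observations
  (y[i] - muhat)^2       # validate on the held-out observation
})
mean(pred_err)           # leave-one-out cross-validated prediction error
```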
Fitting Distributions by Maximum Likelihood
The flows in pipelines 1 and, to a lesser extent, 2 are fit reasonably well by the A-C skewed normal distribution. The red reference line through the quartiles in the QQ plot is created in lines 20–22.
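For readers without the lab code at hand, here is how such a quartile reference line is typically drawn in R; qqline() does exactly this, and the data below are simulated.

```r
# QQ plot with a reference line through the first and third quartile
# pairs, which is what qqline() draws by default.
set.seed(1)
y <- rexp(200)                 # right-skewed stand-in data
qqnorm(y)
qqline(y, col = "red")         # red line through the quartiles
```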
Profile Likelihood
For each fixed α, the remaining parameters can be estimated by maximizing the likelihood, and these values can be plugged into the log-likelihood to obtain the profile log-likelihood for α. This can be done with the function boxcox() in R's MASS package, which plots the profile log-likelihood with confidence intervals.
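A sketch of the boxcox() call on simulated positive data standing in for a real response.

```r
library(MASS)
set.seed(1)
y <- rlnorm(100, meanlog = 1)
# Plots the profile log-likelihood for the Box-Cox parameter, with a 95%
# confidence interval marked on the plot.
boxcox(y ~ 1, lambda = seq(-1, 1, by = 0.05))
```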
Robust Estimation
Let k = nα rounded to an integer; k is the number of observations removed from each end of the sample when computing the α-trimmed mean. The sample standard deviation is the most common estimate of dispersion but, as just stated, it is not robust.
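A sketch contrasting robust and nonrobust estimates on data with one gross outlier; mean(x, trim = α) drops the fraction α from each end, and mad() is a robust alternative to sd().

```r
set.seed(1)
x <- c(rnorm(100), 50)            # one gross outlier
c(mean = mean(x), trimmed = mean(x, trim = 0.1),
  sd = sd(x), mad = mad(x))       # sd is inflated; the robust estimates are not
```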
Transformation Kernel Density Estimation with a Parametric Transformation
The removal of the heavy tails can be seen in Fig. 5.16, which is a normal plot of the transformed data. Line 8 calculates the KDE for the untransformed data, and line 11 calculates the KDE for the transformed data.
Bibliographic Notes
R Lab
- Earnings Data
 - DAX Returns
 - McDonald’s Returns
 
This section uses log returns for the DAX index in the EuStockMarkets dataset. The $par component of the fitted object contains the MLE, and the $value component contains the minimum value of the objective function.
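A sketch of this kind of fit: a t location-scale model for the DAX log returns, fit by minimizing the negative log-likelihood with optim(). The objective, starting values, and object names here are illustrative, not the lab's code.

```r
data(EuStockMarkets)
dax <- diff(log(EuStockMarkets[, "DAX"]))
# Negative log-likelihood of a t location-scale model; the scale and
# degrees of freedom are optimized on the log scale to keep them positive.
nll <- function(theta) {
  m <- theta[1]; s <- exp(theta[2]); nu <- exp(theta[3])
  -sum(dt((dax - m) / s, df = nu, log = TRUE) - log(s))
}
fit <- optim(c(mean(dax), log(sd(dax)), log(5)), nll, hessian = TRUE)
fit$par    # MLE of (mean, log scale, log nu)
fit$value  # minimum of the objective (negative log-likelihood)
```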
Exercises
Based on this information alone, what would you use as an estimate of ν, the tail-index parameter?
Resampling
- Introduction
 - Bootstrap Estimates of Bias, Standard Deviation, and MSE
 - Bootstrapping the MLE of the t-Distribution
 - Bootstrap Confidence Intervals
 - Normal Approximation Interval
 - Bootstrap-t Intervals
 - Basic Bootstrap Interval
 - Percentile Confidence Intervals
 - Bibliographic Notes
 - R Lab
 - BMW Returns
 - Simulation Study: Bootstrapping the Kurtosis
 - Exercises
 
The bootstrap is based on approximating the population probability distribution by the sample. Most estimators satisfy a CLT, e.g., the CLTs for sample quantiles and for the MLE in Sects. 4.3.1 and 5.10, respectively.
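The resampling idea in a few lines (a toy example; the estimator and data are illustrative): resample the data with replacement and recompute the estimator to approximate its sampling distribution.

```r
set.seed(1)
x <- rexp(100)                     # the observed sample
B <- 2000                          # number of bootstrap resamples
boot_med <- replicate(B, median(sample(x, replace = TRUE)))
sd(boot_med)                       # bootstrap standard error of the median
```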
Multivariate Statistical Models
- Introduction
 - Covariance and Correlation Matrices
 - Linear Functions of Random Variables
 - Independence and Variances of Sums
 - Scatterplot Matrices
 - The Multivariate Normal Distribution
 - The Multivariate t-Distribution
 
The covariance matrix of the standardized variables is equal to the correlation matrix of the original variables, which is also the correlation matrix of the standardized variables. This is because the return on a portfolio is the weighted average of the returns on the assets.
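The first statement can be checked in one line; the data below are simulated for illustration.

```r
# Standardizing the columns with scale() makes their covariance matrix
# equal to the correlation matrix of the original variables.
set.seed(1)
X <- matrix(rnorm(300), ncol = 3)
X[, 2] <- X[, 1] + X[, 2]          # induce some correlation
all.equal(cov(scale(X)), cor(X), check.attributes = FALSE)  # TRUE
```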
- Using the t-Distribution in Portfolio Analysis
 - Fitting the Multivariate t-Distribution by Maximum Likelihood
 - Elliptically Contoured Densities
 - The Multivariate Skewed t-Distributions
 - The Fisher Information Matrix
 - Bootstrapping Multivariate Data
 
In that figure, it can be seen that the MLE of ν is 5.94, and there is relatively little uncertainty about the value of this parameter: the 95% profile likelihood confidence interval is narrow. When the data are t-distributed, maximum likelihood estimates are superior to the sample mean and covariance matrix in several respects: the MLE is less variable and is less sensitive to outliers.