Paul I. Louangrath
3. METHODOLOGY
3.3 Statistical Tests Used
The following procedures were used for preliminary data analysis: sample size determination, a data distribution test, a randomness test, and a trend test.
The minimum sample size is determined using the confidence interval estimation method suggested by Gou et al. (2013). According to Gou et al., the minimum sample size for the Weibull distribution function may be obtained by:
$$1 - CI = R^{n} \tag{27}$$
where CI = confidence interval; R = Weibull reliability; and n = sample size. The confidence level used for sample size determination is 99%. Since $\alpha = 1 - CI$, equation (27) may be written as:
$$\alpha = R^{n} \tag{28}$$
The mean Weibull reliability for the study period is R = 0.51. Using 0.99 as the confidence level, the value of alpha is 0.01, which gives a minimum sample size of n ≈ 7. The sample used in this research comprises 8 operating quarters for 10 industries. The use of Gou's method for calculating sample size is supported by the fact that eight out of ten industries manifested a Weibull distribution in their NPL rates. After the minimum sample size was determined, the data was tested for distribution type.
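As a minimal sketch of equation (28), the minimum sample size can be found by solving $\alpha = R^{n}$ for n and rounding up to the next integer. The function name `min_weibull_sample_size` is hypothetical; the inputs are the values reported above (α = 0.01, R = 0.51).

```python
import math


def min_weibull_sample_size(alpha: float, reliability: float) -> int:
    """Smallest integer n with reliability**n <= alpha, from alpha = R**n (eq. 28)."""
    if not (0 < alpha < 1 and 0 < reliability < 1):
        raise ValueError("alpha and reliability must lie strictly between 0 and 1")
    # Taking logs of alpha = R**n gives n = ln(alpha) / ln(R); both logs are negative.
    n_exact = math.log(alpha) / math.log(reliability)
    return math.ceil(n_exact)


if __name__ == "__main__":
    alpha = 1 - 0.99  # 99% confidence level
    print(min_weibull_sample_size(alpha, 0.51))  # prints 7 (n_exact is about 6.84)
```

Rounding up rather than to the nearest integer keeps $R^{n} \le \alpha$, so the computed n never falls short of the stated confidence.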
The minimum sample size of n = 7 was confirmed by a Central Limit Theorem (CLT) test.
This confirmation is computed with a modified form of the Lyapunov CLT.
The Lyapunov equation is given by:
$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E\left[\left|X_i - \mu_i\right|^{2+\delta}\right] = 0 \tag{29}$$
Since we are dealing with a small sample size, the above condition is modified to:
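$$\lim_{n \to \infty} \frac{1}{S_n^{2+\delta}} \sum_{i=1}^{n} E\left[\left|X_i - \mu_i\right|^{2+\delta}\right] \lesssim 0 \tag{30}$$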
The limit is set near to, or less than, zero. The original term in the Lyapunov equation represents a moment; in this modified form, the estimated standard deviation is used instead.
The modification is made to accommodate small-sample studies.
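To make the ratio in equations (29) and (30) concrete, the sketch below computes a sample analogue of the Lyapunov ratio: expectations are replaced by observed absolute deviations from the sample mean, and $S_n^{2}$ is taken as the sum of squared deviations. The default δ = 1, the function name `lyapunov_ratio`, and this particular sample analogue are assumptions for illustration; the sketch is not claimed to reproduce the Limit column of Table 3.

```python
from statistics import mean


def lyapunov_ratio(values, delta=1.0):
    """Sample analogue of the Lyapunov ratio (assumption: delta = 1 by default).

    Numerator: sum of |x_i - xbar| ** (2 + delta).
    Denominator: S_n ** (2 + delta), with S_n ** 2 taken as the sum of squared deviations.
    """
    xbar = mean(values)
    dev = [x - xbar for x in values]
    s_n_sq = sum(d * d for d in dev)
    if s_n_sq == 0:
        raise ValueError("all observations are equal; the ratio is undefined")
    numerator = sum(abs(d) ** (2 + delta) for d in dev)
    return numerator / s_n_sq ** (1 + delta / 2)


if __name__ == "__main__":
    # Item 9 of Table 3, first eight columns (n = 8), used only as example input
    item9 = [23.52, 23.93, 25.68, 26.89, 28.18, 28.81, 30.66, 30.03]
    print(round(lyapunov_ratio(item9), 3))
```

For well-behaved data the ratio shrinks as n grows (for i.i.d. data roughly like $n^{-\delta/2}$), which is the behaviour the limit condition asks for; for example, 100 alternating values ±1 give a ratio of 0.1.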
Table 3: Sample Size Calculation under Central Limit Theorem

                            TYPE OF INDUSTRIES
Item      1      2      3      4      5      6      7      8      9     10
1      1.50   1.46   1.59   1.52   1.52   1.45   1.35   1.33   1.50   1.46
2      0.20   0.18   0.21   0.17   0.16   0.18   0.17   0.17   0.20   0.18
3     34.25  32.32  30.70  29.42  29.02  27.46  26.69  27.26  34.25  32.32
4      3.97   3.83   3.87   3.84   3.93   3.79   3.51   3.42   3.97   3.83
5     14.25  14.48  14.69  15.34  14.70  15.37  16.34  17.58  14.25  14.48
6      0.34   0.42   0.30   0.26   0.25   0.24   0.25   0.27   0.34   0.42
7      9.75   9.12   9.34   9.15   9.09  10.17   9.69   9.06   9.75   9.12
8      2.15   3.92   3.83   4.21   4.30   4.22   3.55   2.62   2.15   3.92
9     23.52  23.93  25.68  26.89  28.18  28.81  30.66  30.03  23.52  23.93

Item   n    Mean      S                  Limit  CLT
1      8    1.47   0.09    1.41   0.10    0.00  Yes
2      8    0.18   0.02    0.17   0.02    0.00  Yes
3      8   29.64   2.65   27.90   2.99   15.91   No
4      8    3.77   0.20    3.64   0.22    0.01  Yes
5      8   15.34   1.12   14.61   1.26    0.37   No
6      8    0.29   0.06    0.25   0.07    0.00  Yes
7      8    9.42   0.41    9.15   0.46    0.04  Yes
8      8    3.60   0.80    3.07   0.90    0.15   No
9      8   27.21   2.68   25.45   3.02   17.09   No
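The n, Mean, and S columns of Table 3 can be checked from the first eight values in each row (n = 8). The sketch below assumes S is the sample standard deviation with an n − 1 divisor and reproduces Item 9 (Mean 27.21, S 2.68). The two unlabeled columns and the Limit column depend on calculations the text does not spell out, so they are not reproduced here; `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Item 9 of Table 3: the first eight columns are taken as the n = 8
# observations behind the Mean and S columns (an assumption the output supports).
ITEM_9 = [23.52, 23.93, 25.68, 26.89, 28.18, 28.81, 30.66, 30.03]


def summarize(values):
    """Return (n, mean, sample standard deviation with an n - 1 divisor)."""
    return len(values), mean(values), stdev(values)


if __name__ == "__main__":
    n, m, s = summarize(ITEM_9)
    print(n, f"{m:.2f}", f"{s:.2f}")  # prints: 8 27.21 2.68
```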
Whether the data distribution falls within the confines of the normal distribution curve under the CLT assumption is determined by binomial frequency counts and tested with the DeMoivre-Laplace Central Limit Theorem for the binomial distribution. The DeMoivre-Laplace equation is given by:
$$\lim_{n \to \infty} \Pr\left(\frac{X - np}{\sqrt{npq}} \le z\right) = \Phi(z) \tag{31}$$
Using a 0.95 confidence level, the one-tailed critical value of Z is 1.65. The calculation of Z produced an observed value of 1.30. This value was verified by a second binomial Z equation:
$$Z = \frac{X/n - p}{\sqrt{pq/n}} \tag{32}$$
The result of this calculation shows that Z(obs) is 1.32. The decision rule was: H0: Z(obs) < 1.65 means that the data distribution falls within the normal curve; HA: Z(obs) > 1.65 means that the data distribution falls outside of the confidence interval. In the present case, 1.30 < 1.65 and 1.32 < 1.65, so in both cases the null hypothesis could not be rejected; both binomial methods confirm that the data set satisfies the requirement of the CLT. The sample size used for this research is n = 8, which the CLT test indicates is adequate.
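Equations (31) and (32) standardize the same binomial count, once as a count and once as a proportion, so for identical inputs they return the same Z. The sketch below shows this with hypothetical values (X = 46 successes out of n = 80 with p = 0.5), which are not the study's data, and compares the result against the one-tailed 0.95 critical value of the standard normal distribution. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_count(x, n, p):
    """Eq. (31) standardization: (X - np) / sqrt(npq)."""
    q = 1 - p
    return (x - n * p) / sqrt(n * p * q)


def z_proportion(x, n, p):
    """Eq. (32): (X/n - p) / sqrt(pq/n)."""
    q = 1 - p
    return (x / n - p) / sqrt(p * q / n)


if __name__ == "__main__":
    x, n, p = 46, 80, 0.5                  # hypothetical inputs, not the study's counts
    z_crit = NormalDist().inv_cdf(0.95)    # about 1.645, reported as 1.65 in the text
    for z in (z_count(x, n, p), z_proportion(x, n, p)):
        decision = "fail to reject H0" if z < z_crit else "reject H0"
        print(f"Z = {z:.3f}: {decision}")
```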
The data distribution is tested with the Anderson-Darling (AD) test (Anderson & Darling, 1952). The rationale is to verify whether the data is normally distributed, because many parametric statistical tests require normally distributed data; once the distribution type is known, appropriate statistical tests can be chosen. The AD test consists of two steps: first, determine the observed value of AD; second, compare the observed AD value with the theoretical value AD*. The observed value of AD is obtained through:
$$AD = -n - S \tag{33}$$
where S is defined as:
$$S = \sum_{i=1}^{n} \frac{2i - 1}{n}\left[\ln F(Z_i) + \ln\left(1 - F(Z_{n+1-i})\right)\right] \tag{34}$$
The theoretical value for AD* is given by:
$$AD^{*} = AD\left(1 + \frac{0.75}{n} + \frac{2.25}{n^{2}}\right) \tag{35}$$
The decision rule is: H0: AD < AD*, the data is assumed to be normally distributed; HA: AD > AD*, the data is assumed to be non-normally distributed. The data from the 10 industries are normally distributed because AD < AD*. After the data distribution was verified, the data was tested for randomness.
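A minimal sketch of equations (33) to (35), assuming F is the standard normal CDF applied to observations standardized with the sample mean and sample standard deviation, and that the $Z_i$ are sorted in ascending order. The function name `anderson_darling` is hypothetical, and the Item 9 row of Table 3 is used only as example input. The sketch returns both AD and AD* so that the decision rule above can be applied to them.

```python
from math import log
from statistics import NormalDist, mean, stdev


def anderson_darling(values):
    """Return (AD, AD*) following eqs. (33) to (35).

    Assumes F is the standard normal CDF applied to observations standardized
    with the sample mean and sample standard deviation, sorted ascending.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    m, s = mean(values), stdev(values)
    if s == 0:
        raise ValueError("all observations are equal")
    z = sorted((x - m) / s for x in values)
    cdf = NormalDist().cdf
    # Eq. (34): z[i - 1] is Z_i and z[n - i] is Z_(n+1-i) in 1-based notation.
    S = sum(
        (2 * i - 1) / n * (log(cdf(z[i - 1])) + log(1 - cdf(z[n - i])))
        for i in range(1, n + 1)
    )
    ad = -n - S                                   # eq. (33)
    ad_star = ad * (1 + 0.75 / n + 2.25 / n**2)   # eq. (35)
    return ad, ad_star


if __name__ == "__main__":
    sample = [23.52, 23.93, 25.68, 26.89, 28.18, 28.81, 30.66, 30.03]  # Item 9, Table 3
    ad, ad_star = anderson_darling(sample)
    print(f"AD = {ad:.4f}, AD* = {ad_star:.4f}")
```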
Data randomness is verified by the adjacent test. The rationale for the randomness test is that most statistical tests require randomness in the data set. For n ≤ 25, the adjacent test statistic is given by:
$$L = 1 - \frac{\sum_{i=1}^{n-1}\left(x_{i+1} - x_i\right)^{2}}{2\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}} \tag{36}$$
The null hypothesis assumes randomness, and this assumption may be rejected if the test statistic lies outside the lower and upper critical values. The hypothesis statement is $H_0: L_{lower} \le L_{obs} \le L_{upper}$; the data is non-random if $L_{obs}$ lies outside the lower and upper boundaries. Where the data set falls outside the L bounds, it is necessary to test whether the data manifest a significant trend.
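The sketch below computes the adjacent statistic, assuming equation (36) takes the form shown (squared successive differences against twice the sum of squared deviations). Under that form L is near 0 for random data, approaches 1 for a smooth, trending series, and turns negative for a rapidly alternating one. The critical bounds $L_{lower}$ and $L_{upper}$ come from tables that are not reproduced here, and `adjacent_statistic` is a hypothetical name.

```python
from statistics import mean


def adjacent_statistic(values):
    """Eq. (36): L = 1 - sum of squared successive differences / (2 * sum of squared deviations)."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    xbar = mean(values)
    ss_dev = sum((x - xbar) ** 2 for x in values)
    if ss_dev == 0:
        raise ValueError("all observations are equal")
    ss_adj = sum((b - a) ** 2 for a, b in zip(values, values[1:]))
    return 1 - ss_adj / (2 * ss_dev)


if __name__ == "__main__":
    print(adjacent_statistic([1, 2, 3, 4]))      # smooth, trending series: L close to 1
    print(adjacent_statistic([1, -1, 1, -1]))    # alternating series: L negative
```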
A trend test was used to confirm whether the NPL rates manifest any trends, i.e., recognizable patterns with magnitude and direction. A trend may be classified as improving or deteriorating. Three trend tests are commonly in use: (i) the Military Handbook Trend Test (MHB); (ii) the Laplace Trend Test; and (iii) the Reverse Arrangement Test (NIST, Engineering Handbook, 2013). The Military Handbook approach is given by:
$$\chi^{2} = 2\sum_{i=1}^{r} \ln\left(\frac{T_{end}}{T_i}\right) \tag{37}$$
The MHB approach requires that the data come from a system that follows a power law, i.e., one quantity varies as a power of another. In the present case, the NPL data set does not qualify because, across the eight quarters for the ten industries, the quarter-to-quarter change does not follow a power law.
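For illustration only (the text concludes that the MHB test does not apply to the NPL data), the sketch below evaluates equation (37) for hypothetical cumulative event times $T_i$ observed up to $T_{end}$. As we understand the NIST handbook, the statistic is compared against a chi-square distribution with 2r degrees of freedom; `mhb_chi_square` is a hypothetical name.

```python
from math import log


def mhb_chi_square(event_times, t_end):
    """Eq. (37): chi-square = 2 * sum(ln(T_end / T_i)) over the r event times."""
    if not event_times:
        raise ValueError("need at least one event time")
    if any(t <= 0 or t > t_end for t in event_times):
        raise ValueError("event times must lie in (0, T_end]")
    return 2 * sum(log(t_end / t) for t in event_times)


if __name__ == "__main__":
    times = [1.0, 2.0, 4.0]  # hypothetical cumulative event times
    chi2 = mhb_chi_square(times, t_end=4.0)
    print(f"chi-square = {chi2:.4f} on {2 * len(times)} degrees of freedom")
```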
The second approach is the Laplace Trend Test, which is based on an assumed normal distribution of the data set and uses a Z-test:
$$Z_{LP} = \frac{\sum_{i=1}^{r} T_i - \frac{r\,T_{end}}{2}}{T_{end}\sqrt{\frac{r}{12}}} \tag{38}$$
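A minimal sketch of equation (38) with hypothetical event times: when events are spread evenly over $(0, T_{end}]$ the statistic is near zero, and when they bunch toward the end of the period (an increasing rate) it becomes positive. The function name `laplace_z` and the inputs are illustrative assumptions, not the study's data.

```python
from math import sqrt


def laplace_z(event_times, t_end):
    """Eq. (38): Z = (sum(T_i) - r * T_end / 2) / (T_end * sqrt(r / 12))."""
    r = len(event_times)
    if r == 0:
        raise ValueError("need at least one event time")
    return (sum(event_times) - r * t_end / 2) / (t_end * sqrt(r / 12))


if __name__ == "__main__":
    print(laplace_z([1.0, 2.0, 3.0], t_end=4.0))   # evenly spread events: Z = 0
    print(laplace_z([3.0, 3.5, 3.9], t_end=4.0))   # events bunched late: Z > 0
```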
Under the Laplace approach, the data must follow an exponential model. In the present case, the NPL rates for the study period manifested a normal distribution but did not change exponentially. The third trend test is the Reverse Arrangement (RA) Test, which is the trend test applicable to the NPL data in the present case. The RA test is given by:
$$Z_{RA} = \frac{R - \frac{r(r-1)}{4} + 0.50}{\sqrt{\frac{(2r+5)(r-1)\,r}{72}}} \tag{39}$$
The NPL data is tested with the RA test to verify whether a significant trend exists in loan failures among the 10 industries.
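The sketch below counts reverse arrangements and evaluates equation (39). It assumes a reversal is counted whenever a later value exceeds an earlier one (pairs i < j with $x_j > x_i$), and it uses the Item 9 row of Table 3 purely as an example series, not as a claim about which series the study tested. For a series this short, tabulated critical values of R are usually preferred over the normal approximation; the function names are hypothetical.

```python
from math import sqrt


def reverse_arrangements(values):
    """Count pairs i < j with values[j] > values[i] (assumed definition of a reversal)."""
    n = len(values)
    return sum(1 for i in range(n) for j in range(i + 1, n) if values[j] > values[i])


def ra_z(values):
    """Eq. (39): normal approximation for the reverse arrangement count R."""
    r = len(values)
    if r < 2:
        raise ValueError("need at least two observations")
    R = reverse_arrangements(values)
    return (R - r * (r - 1) / 4 + 0.5) / sqrt((2 * r + 5) * (r - 1) * r / 72)


if __name__ == "__main__":
    item9 = [23.52, 23.93, 25.68, 26.89, 28.18, 28.81, 30.66, 30.03]  # Table 3, Item 9
    print(reverse_arrangements(item9), round(ra_z(item9), 2))  # prints: 27 3.34
```

Here 27 of the 28 possible pairs are increases, so R sits far above its expected value r(r − 1)/4 = 14, consistent with an upward trend in this example series.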