International Journal of Social Science Research (IJSSR) eISSN: 2710-6276 | Vol. 4 No. 2 [June 2022]

Journal website: http://myjms.mohe.gov.my/index.php/ijssr

A CLUSTERING ANALYSIS USING FINITE MIXTURE MODEL

Seuk Yen Phoong1* and Seuk Wai Phoong2

1 Department of Mathematics, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris, Tanjung Malim, MALAYSIA

2 Department of Management, Faculty of Business and Economics, Universiti Malaya, Kuala Lumpur, MALAYSIA

*Corresponding author: [email protected]

Article Information:

Article history:

Received date: 17 March 2022
Revised date: 20 May 2022
Accepted date: 21 June 2022
Published date: 25 June 2022

To cite this document:

Phoong, S. Y., & Phoong, S. W. (2022). A clustering analysis using finite mixture model. International Journal of Social Science Research, 4(2), 71-78.

Abstract: The finite mixture model is a model-based, probabilistic clustering approach. Features of the data such as trends, crises, or seasonal effects may correspond to its k components.

This model is widely applied in analyzing mixtures of probability distributions. Maximum likelihood estimation and the Bayesian method are two statistical methods applied to fit the finite mixture model. Maximum likelihood estimation maximizes the likelihood function, while the Bayesian method allows prior information to be integrated with the likelihood function to produce a new model. Thus, the objectives of this study are to (1) formulate the mixing probability using maximum likelihood estimation and the Bayesian method and (2) identify the most plausible statistical method for modelling the exchange rate and gold price in Malaysia.

Findings show that maximum likelihood estimation and the Bayesian method produce roughly similar distributions. However, the Bayesian method is found to be more suitable for analyzing time series data because of its lower Akaike information criterion and Bayesian information criterion values.

Keywords: Bayesian method, Exchange Rate, Finite Mixture Model, Gold Price, Maximum Likelihood Estimation.


1. Introduction

The finite mixture model is a clustering-based model that is widely used for large heterogeneous data. Such models make it possible to examine data drawn from different distributions, such as Gaussian, Dirichlet, and Gamma mixtures (Lai et al., 2019; McLachlan et al., 2019). The flexibility of the finite mixture model allows it to be used to investigate the presence of unobserved situations and to examine the distinct parameters or distributions (Gensler, 2017). The finite mixture model is also used to identify the number of components or the weights of the distribution.
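For reference, the general form of a K-component finite mixture density (a standard formulation from the mixture-modelling literature, added here for clarity rather than reproduced from the original article) is

f(x \mid \Theta) = \sum_{k=1}^{K} \pi_k \, f_k(x \mid \theta_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,

where \pi_k are the mixing probabilities (weights) and f_k are the component densities, for example normal densities.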

There are two methods widely applied to fit the finite mixture model: the Bayesian method and maximum likelihood estimation. These two methods are used to identify the number of classes and to reveal the relationship between the response and predictor variables after classification (Shan and Yang, 2021; Grimm et al., 2021).

The objectives of this study are to (1) formulate the mixture distribution for the exchange rate and gold price in Malaysia using MLE and the Bayesian method and (2) compare MLE and the Bayesian method to find the more plausible method for analyzing the time series data.

2. Literature Review

2.1 Maximum Likelihood Estimation

Maximum likelihood estimation (MLE) is a statistical method that is widely applied in various fields such as finance (Phoong and Phoong, 2016), communication systems (Snijders et al., 2010; Byshkin et al., 2018), biology (Lai et al., 2019; Hauschild and Jentschel, 2001), and others. The main role of MLE is to maximize the likelihood function. Moreover, MLE shows consistency when applied in hidden Markov models (Holzmann and Schwaiger, 2016).

MLE is effective because it can be applied to most problems to produce a reasonable estimator, and an excellent estimator can be produced when the sample size is large. Furthermore, as the sample size increases, MLE provides estimates that are asymptotically normal, efficient, and of low variance compared with other statistical methods (Byshkin et al., 2018; Yoo, 2012). Inferences about the population can be made using MLE. To maximize the likelihood function, the equation used to obtain the parameter values is:

L_n(\hat{\theta}; x) = \max_{\theta \in \Theta} L_n(\theta; x)    (1)

where \hat{\theta} = \hat{\theta}_n(x) and \hat{\theta}_n : \mathcal{X}^n \to \Theta, given that \Theta is the parameter space.

In general, the maximization is more conveniently carried out with the log-likelihood, the logarithm of the likelihood function (McLachlan and Peel, 2000; Zhu and Wathen, 2018):

\ell_n(\theta; x) = \ln L_n(\theta; x)    (2)

This statistical method is recommended for data with large sample sizes because it is versatile and applicable to most models and distributions.
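As an illustrative sketch of equations (1) and (2) (not the authors' code; the sample and the single-normal model below are hypothetical), the likelihood can be maximized numerically by minimizing the negative log-likelihood, for example with scipy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=500)    # toy data, not the paper's series

def neg_log_likelihood(params, data):
    """Negative of the log-likelihood ln L_n(theta; x) for a normal model."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                         # keeps the scale parameter positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Minimizing -ln L_n is equivalent to the maximization in equation (1).
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(sample,), method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)
```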


2.2 Bayesian Method

The Bayesian method is another statistical method used to obtain parameter estimates and make predictions. It allows prior beliefs or knowledge to be integrated with the likelihood function to formulate posterior information. These features provide precise results and conclusions since evidence from different sources is pooled.

The Bayesian method is widely applied in estimating missing data, making decisions, and predicting future observations. It entails formulating a subjective prior distribution; the datasets with uncertainty are then modeled, a reasonable decision is made, and a utility function is expressed. Moreover, the Bayesian approach provides accurate predictions when the prior information is sufficient and proper, whereas invalid or insufficient prior information will affect the posterior distribution. The parameter estimates obtained are close to the predictive distribution because the Bayesian method always shows consistency (Gelman et al., 2013). Asymptotic properties of the Bayesian method, such as normality and efficiency, also allow remarkable results to be obtained.

According to McLachlan and Peel (2000), there are three important concepts in the Bayesian method: the prior distribution, the likelihood function, and the posterior distribution. The prior distribution is the information about the model before the samples are analyzed, and the likelihood function is the information carried by the current samples. Meanwhile, the posterior distribution is a weighted average of the prior distribution and the likelihood function.

The Bayesian method is based on a conditional probability density function derived from Bayes' theorem. Given n observations x_1, x_2, \ldots, x_n, the prior distribution P(\theta) is combined with the likelihood function P(x \mid \theta) to give the posterior distribution P(\theta \mid x), which can be obtained as follows:

P(\theta \mid x) = \frac{P(x \mid \theta)\, P(\theta)}{P(x)}    (3)

where P(x) denotes the marginal likelihood.
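A minimal numerical illustration of equation (3) (hypothetical values only, not drawn from the paper): over a discretized parameter grid, the posterior is the prior times the likelihood, normalized by the marginal likelihood.

```python
import numpy as np
from scipy.stats import norm

x = np.array([3.9, 4.1, 4.0, 3.8, 4.2])               # toy observations
theta = np.linspace(2.0, 6.0, 401)                     # discretized parameter grid for the mean

prior = norm.pdf(theta, loc=4.0, scale=1.0)            # P(theta)
log_lik = np.array([norm.logpdf(x, loc=t, scale=0.5).sum() for t in theta])   # ln P(x | theta)

unnormalized = prior * np.exp(log_lik - log_lik.max())          # P(x | theta) P(theta), rescaled for stability
normalizer = unnormalized.sum() * (theta[1] - theta[0])         # normalizing constant, proportional to P(x)
posterior = unnormalized / normalizer                            # P(theta | x)
```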

3. Research Methodology

The data used in this study are monthly observations of the exchange rate and gold price in Malaysia from January 2000 until August 2019. The nominal exchange rate is expressed in Ringgit Malaysia per U.S. Dollar and the gold price in Ringgit Malaysia per troy ounce.

There are several important steps in addressing the objectives of this study. First, a unit root test, namely the Augmented Dickey-Fuller (ADF) test, is used to check the stationarity of the data. Non-stationary behaviour can take the form of cycles, random walks, trends, or a combination of the three; the aim of this step is to avoid unreliable and spurious results.

Then, the mixture distribution is estimated using MLE and the Bayesian method. Lastly, MLE and the Bayesian method are compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) to determine the more plausible statistical method for analyzing the financial time series data.
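A sketch of the first step, assuming the two series are held in a CSV file with hypothetical file and column names (the paper does not specify its data source); the ADF test is available in statsmodels, where the BIC option corresponds to lag selection by SIC:

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical file and column names; monthly data from January 2000 to August 2019.
data = pd.read_csv("exchange_rate_gold.csv", parse_dates=["date"], index_col="date")

def adf_report(series, label):
    """Augmented Dickey-Fuller test; the null hypothesis is that the series has a unit root."""
    stat, pvalue, *_ = adfuller(series.dropna(), autolag="BIC")   # lag order selected by SIC/BIC
    print(f"{label}: ADF statistic = {stat:.4f}, p-value = {pvalue:.4f}")

for column in ["exchange_rate", "gold_price"]:
    adf_report(data[column], f"{column} (level)")
    adf_report(data[column].diff(), f"{column} (first difference)")
```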


4. Results and Discussion

4.1 Unit Root Test

The unit root test is a preliminary step in time series analysis, since it is important to identify the stationarity of the data. The stationarity results for the exchange rate and gold price are reported in Table 1 and Table 2.

Table 1: Unit Root Test for Exchange Rate

                                            Before Differencing          First Differencing
                                            t-Statistic     Prob.*       t-Statistic     Prob.*
Augmented Dickey-Fuller test statistic      -1.298024       0.6309       -10.12755       0.0000

*MacKinnon (1996) one-sided p-values; lag order selected by SIC.

Table 1 reveals that the p-value before differencing is 0.6309, which is greater than the significance level. Thus, the null hypothesis is not rejected and the exchange rate is found to have a unit root, so first differencing is needed to transform the non-stationary series into a stationary one. The p-value after first differencing is 0.0000, which is less than the significance level; hence, the exchange rate has no unit root after first differencing.

Table 2: Unit Root Test for Gold Price

                                            Before Differencing          First Differencing
                                            t-Statistic     Prob.*       t-Statistic     Prob.*
Augmented Dickey-Fuller test statistic      -0.006691       0.9562       -13.42045       0.0000

*MacKinnon (1996) one-sided p-values; lag order selected by SIC.

The unit root test for the gold price is shown in Table 2. The p-values before differencing and after first differencing are 0.9562 and 0.0000, respectively. Since the p-value after first differencing is less than the significance level, it can be concluded that the gold price has no unit root after first differencing.

4.2 Maximum Likelihood Estimation

The mixture models are fitted by MLE to determine the number of components, the weight of the probability density function for each component, and the equation obtained for each situation. Based on the findings, a two-component normal mixture distribution is obtained for the variables, namely the exchange rate and gold price in Malaysia. This is supported by Phoong and Phoong (2016), who found that a two-component mixture model is reasonable for financial datasets since it describes the data in both normal situations and crisis times.
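A hedged sketch of fitting such a two-component normal mixture by MLE (written here as a mixture regression of the exchange rate on the gold price, the form implied by Table 4, and run on toy data in place of the actual series; BFGS is a quasi-Newton optimizer, loosely analogous to the dual quasi-Newton technique reported in Table 3):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logsumexp
from scipy.stats import norm

rng = np.random.default_rng(1)
gold = rng.uniform(1000.0, 6000.0, size=236)             # toy gold prices (RM per troy ounce)
crisis = rng.random(236) < 0.33                           # toy component indicator
rate = np.where(crisis,
                3.55 + 0.0001 * gold + rng.normal(0.0, 0.19, 236),
                4.02 - 0.0002 * gold + rng.normal(0.0, 0.10, 236))

def neg_log_likelihood(params, gold, rate):
    """Two-component normal mixture regression: 7 parameters, as in Table 3."""
    b10, b11, b20, b21, log_s1, log_s2, logit_p = params
    p1 = expit(logit_p)                                    # mixing probability of component 1
    log_comp = np.column_stack([
        np.log(p1) + norm.logpdf(rate, b10 + b11 * gold, np.exp(log_s1)),
        np.log1p(-p1) + norm.logpdf(rate, b20 + b21 * gold, np.exp(log_s2)),
    ])
    return -logsumexp(log_comp, axis=1).sum()

# Asymmetric starting values help the optimizer separate the two components.
start = [np.percentile(rate, 25), 0.0, np.percentile(rate, 75), 0.0,
         np.log(rate.std()), np.log(rate.std()), 0.0]
fit = minimize(neg_log_likelihood, x0=start, args=(gold, rate), method="BFGS")
print(fit.x)
```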

Table 3 shows that a dual quasi-Newton technique is used to find the local minima and maxima of the functions in this study; this technique balances the speed and stability of the mixture model estimation. Table 3 also lists the seven parameters in the optimization: four mean function parameters, two scale parameters, and one mixing probability parameter.


Table 3: Optimization Information

Optimization Technique           Dual Quasi-Newton
Parameters in Optimization       7
Mean Function Parameters         4
Scale Parameters                 2
Mixing Probability Parameters    1

Information about the seven parameters obtained in this case is given in Tables 4 and 5. Table 4 reports the parameter estimates and standard errors for six of them: four mean parameters and two scale parameters.

Table 4: Parameter Estimates for Normal Model

Component    Effect       Estimate      Standard Error
1            Intercept    3.5493        0.09471
1            Gold         0.000106      0.000019
2            Intercept    4.0237        0.01842
2            Gold         -0.00019      5.629E-6
1            Variance     0.03716       0.008032
2            Variance     0.009958      0.001265

Meanwhile, the parameter estimates for the mixing probabilities are shown in Table 5. The estimated mixture density is y = 0.3280 f1 + 0.6720 f2, where f1 denotes the probability density function of component 1 (the crisis period) and f2 the probability density function of component 2 (the normal situation).

Table 5: Parameter Estimates for Mixing Probabilities

Component    Mixing Probability    Logit (Prob)    Standard Error
1            0.3280                -0.7173         0.1704
2            0.6720                 0.7173
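Using the estimates reported in Tables 4 and 5, the fitted mixture density of the exchange rate at a given gold price can be evaluated as below (an illustrative sketch; the evaluation point is arbitrary, and the Table 4 variances are converted to standard deviations):

```python
import numpy as np
from scipy.stats import norm

def fitted_mixture_density(rate, gold):
    """y = 0.3280*f1 + 0.6720*f2 with the MLE estimates from Tables 4 and 5."""
    f1 = norm.pdf(rate, loc=3.5493 + 0.000106 * gold, scale=np.sqrt(0.03716))    # crisis period
    f2 = norm.pdf(rate, loc=4.0237 - 0.00019 * gold, scale=np.sqrt(0.009958))    # normal situation
    return 0.3280 * f1 + 0.6720 * f2

print(fitted_mixture_density(rate=3.90, gold=4000.0))   # arbitrary evaluation point
```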

4.3 Bayesian Method

In this study, a two-component normal mixture model is estimated using Markov Chain Monte Carlo (MCMC). MCMC is a statistical method based on sampling algorithms that draw from a probability distribution.
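The paper does not publish its sampler code; as a simplified, hedged sketch of conjugate MCMC sampling, the following fits a two-component univariate normal mixture (without the gold-price covariate) by Gibbs sampling with conjugate priors on toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(3.6, 0.2, 80), rng.normal(4.0, 0.1, 160)])   # toy series
n = y.size

# Conjugate priors: pi ~ Beta(1, 1), mu_k ~ N(m0, s0sq), sigma_k^2 ~ Inverse-Gamma(a0, b0).
m0, s0sq, a0, b0 = y.mean(), 1.0, 2.0, 0.1
pi, mu, s2 = 0.5, np.array([y.min(), y.max()]), np.array([y.var(), y.var()])
draws = []

for it in range(3000):
    # 1. Sample the latent component labels given the current parameters.
    log_w1 = np.log(pi) - 0.5 * np.log(s2[0]) - 0.5 * (y - mu[0]) ** 2 / s2[0]
    log_w2 = np.log(1.0 - pi) - 0.5 * np.log(s2[1]) - 0.5 * (y - mu[1]) ** 2 / s2[1]
    prob1 = 1.0 / (1.0 + np.exp(log_w2 - log_w1))
    z = (rng.random(n) >= prob1).astype(int)          # 0 = component 1, 1 = component 2

    # 2. Sample the mixing probability from its Beta posterior.
    n1 = int(np.sum(z == 0))
    pi = rng.beta(1 + n1, 1 + n - n1)

    # 3. Sample each component's mean and variance from their conjugate posteriors.
    for k in range(2):
        yk = y[z == k]
        post_var = 1.0 / (1.0 / s0sq + yk.size / s2[k])
        post_mean = post_var * (m0 / s0sq + yk.sum() / s2[k])
        mu[k] = rng.normal(post_mean, np.sqrt(post_var))
        s2[k] = 1.0 / rng.gamma(a0 + yk.size / 2.0, 1.0 / (b0 + 0.5 * np.sum((yk - mu[k]) ** 2)))

    if it >= 1000:                                    # keep post-burn-in draws
        draws.append([pi, mu[0], mu[1], s2[0], s2[1]])

print(np.mean(draws, axis=0))                         # posterior means of [pi1, mu1, mu2, var1, var2]
```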

Referring to Table 6, a conjugate sampling algorithm is applied in this case, in which the prior and posterior distributions belong to the same family. The Bayes information also shows seven parameters, namely four mean function parameters, two scale parameters, and one mixing probability parameter, similar to the findings obtained from MLE.


Table 6: Bayes Information

Sampling Algorithm               Conjugate
Parameters in Sampling           7
Mean Function Parameters         4
Scale Parameters                 2
Mixing Probability Parameters    1

The posterior distribution, which integrates the prior distribution and the likelihood function, is displayed in Table 7. From the posterior estimates, the mixture density is y = 0.3350 f1 + 0.6650 f2, where f1 denotes the probability density function of component 1 (the crisis period) and f2 the probability density function of component 2 (the normal situation).

Table 7: Posterior Distribution

Component    Effect         Estimate
1            Intercept      3.5457
1            Gold           0.000108
2            Intercept      4.0215
2            Gold           -0.00019
1            Variance       0.0615
2            Variance       0.0246
1            Probability    0.3350

4.4 Method Comparison

The analyses of the exchange rate and gold price using MLE and the Bayesian method are then compared to determine the more plausible statistical method for analyzing the time series data. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are the fit measures used for the comparison; they are estimators of the relative quality of the fitted models for the sample data.

Table 8: Method Comparison

       Maximum Likelihood Estimation    Bayesian Method
AIC    -106.3                           -49.5913
BIC    -82.0532                         -25.3445

Table 8 shows the AIC and BIC values obtained from the -2 log-likelihood. The AIC values for MLE and the Bayesian method are -106.3 and -49.5913, respectively, while the BIC values are -82.0532 for MLE and -25.3445 for the Bayesian method. From the findings obtained, both the AIC and BIC for the Bayesian method display the lowest values compared with MLE. This illustrates that the Bayesian method is more plausible for examining nonlinear financial datasets, which is supported by Phoong and Ismail (2015), who stated that the Bayesian method is superior to maximum likelihood in modelling time series data, although both statistical methods provided roughly similar results.

This is because the Bayesian method allows prior information to be integrated with the likelihood function, whereas MLE does not have a prior distribution and may lead to model identification problems.
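For reference, AIC and BIC are computed from the maximized log-likelihood as sketched below (a generic illustration with a hypothetical log-likelihood value, not taken from Table 8; 7 parameters as reported in Tables 3 and 6, and 236 monthly observations correspond to January 2000 to August 2019):

```python
import numpy as np

def information_criteria(log_likelihood, n_params, n_obs):
    """AIC = -2 ln L + 2k and BIC = -2 ln L + k ln(n) for a fitted model."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    bic = -2.0 * log_likelihood + n_params * np.log(n_obs)
    return aic, bic

# Hypothetical maximized log-likelihood, for illustration only.
print(information_criteria(log_likelihood=60.0, n_params=7, n_obs=236))
```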

5. Conclusion

This study applied MLE and the Bayesian method to formulate the mixture distribution based on the crisis period and the normal situation for the exchange rate and gold price in Malaysia. The findings show that a two-component mixture model is fitted by both statistical methods and that the mixture distributions obtained are approximately similar. A further analysis to identify the most plausible method using AIC and BIC found that the Bayesian method is superior to MLE for modelling the time series data.

6. Acknowledgement

This research was carried out under the Fundamental Research Grant Scheme (FRGS/1/2019/STG06/UPSI/02/2) provided by the Ministry of Education, Malaysia. The authors would like to extend their gratitude to Universiti Pendidikan Sultan Idris (UPSI), which helped manage the grant.

References

Byshkin, M., Stivala, A., Mira, A., Robins, G., & Lomi, A. (2018). Fast Maximum Likelihood Estimation via Equilibrium Expectation for Large Network Data. Scientific Reports, 8 (11509).

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis. United States: Taylor and Francis.

Gensler, S. (2017). Finite Mixture Models. In C. Homburg, M. Klarmann & A. Vomberg (Eds.). Handbook of Market Research. Cham: Springer.

Grimm, K. J., Houpt, R., & Rodgers, D. (2021). Model Fit and Comparison in Finite Mixture Models: A Review and a Novel Approach. Frontiers in Education, 6, 613645.

Hauschild, T., & Jentschel, M. (2001). Comparison of maximum likelihood estimation and chi-square statistics applied to counting experiments. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 457(1-2), 384-401.

Holzmann, H., & Schwaiger, F. (2016). Testing for the number of states in hidden Markov models. Computational Statistics & Data Analysis, 100, 318-330.

Lai, K., Twine, N., O’Brien, A., Guo, Y., & Bauer, D. (2019). Artificial Intelligence and Machine Learning in Bioinformatics. Encyclopedia of Bioinformatics and Computational Biology, 1, 272-286.

Maria, E. C. J., Salazar, I., Sanz, L., & Gómez-Villegas, M. A. (2020). Using Copula to Model Dependence When Testing Multiple Hypotheses in DNA Microarray Experiments: A Bayesian Approximation. Mathematics, 8, 1514.

McLachlan, G., & Peel, D. A. (2000). Finite Mixture Models. New York: Wiley.

McLachlan, G. J., Lee, S. X., & Rathnayake, S. I. (2019). Finite Mixture Models. Annual Review of Statistics and Its Application, 6 (1), 355-378.

Phoong, S. Y., & Ismail, M. T. (2015). A Comparison between Bayesian and Maximum Likelihood Estimations in Estimating Finite Mixture Model for Financial Data. Sains Malaysiana, 44 (7), 1033–1039.


Phoong, S. Y., & Phoong, S. W. (2016). Modeling the nonlinear financial relationship in selected ASEAN countries. International Journal of Advanced and Applied Sciences, 3 (4), 44-49.

Shan, A., & Yang, F. (2021). Bayesian Inference for Finite Mixture Regression Model Based on Non-Iterative Algorithm. Mathematics, 9, 590.

Snijders, T. A. B., Koskinen, J., & Schweinberger, M. (2010). Maximum likelihood estimation for social network dynamics. Annals of Applied Statistics, 4(2), 567-588.

Yoo, C. (2012). The Bayesian method for causal discovery of latent-variable models from a mixture of experimental and observational data. Computational Statistics & Data Analysis, 56 (7), 2183-2205.

Zhu, S., & Wathen, A. J. (2018). Essential formulae for restricted maximum likelihood and its derivatives associated with the linear mixed models. arXiv:1805.05188.
