6.3.2 Identification of the optimal number of segments
For each dependent variable, multiple runs were performed in order to find the appropriate number of latent segments in the data. Each model was run with two to four segments and evaluated by the Bayesian information criterion (BIC).5 Where four segments emerged as the optimal solution for a given model, two further runs using five and six segments were performed. None of these cases suggested the use of more than five segments, so further runs with more than six segments were omitted. Table 6.10 gives a summary of these GLIMMIX runs.
The number of segments was determined using the BIC criterion. Models 1 and 2, using total F&B revenue (Y1) and total accommodation revenue (Y2) as the dependent variable, reached their optimal BIC value with five latent segments in the GLIMMIX runs, whereas model 3, using occupancy rate (Y3), achieved its optimal number of segments with three classes.
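The search over the number of segments can be summarized in a short sketch. The following Python fragment is purely illustrative: fit_mixture_regression is a hypothetical stand-in for a GLIMMIX-style EM estimation (assumed to return the maximized log-likelihood and the number of free parameters), and the BIC is computed in the 'log-likelihood minus penalty' form described in footnote 5.

```python
import math

def select_number_of_segments(X, y, k_min=2, k_max=6):
    """Fit mixture regressions with k_min..k_max segments and rank them by BIC.

    fit_mixture_regression is a hypothetical helper standing in for a
    GLIMMIX-style EM estimation; it is assumed to return the maximized
    log-likelihood and the number of free parameters of the k-segment model.
    """
    n = len(y)
    bic = {}
    for k in range(k_min, k_max + 1):
        log_lik, n_params = fit_mixture_regression(X, y, n_segments=k)
        # BIC in the 'log-likelihood minus penalty' form of footnote 5
        bic[k] = log_lik - 0.5 * n_params * math.log(n)
    best_k = max(bic, key=bic.get)  # largest criterion value wins
    return best_k, bic
```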
The occurrence of local optima is a serious problem in the EM algorithm. To investigate the presence of local optima, 15 reruns of each of these optimal segment solutions were performed.
Table 6.9. Log-transformation for Y1 and Y2.

Kolmogorov–Smirnov goodness-of-fit testa

Year   Y1      Y2      Y3      ln(Y1)   ln(Y2)
1991   0.030   0.069   0.972   0.773    0.927
1992   0.077   0.022   0.967   0.995    0.966
1993   0.071   0.141   0.925   0.913    0.986
1994   0.059   0.082   0.696   0.886    0.987
1995   0.173   0.248   0.983   0.832    0.797
1996   0.087   0.220   0.844   0.713    0.990
1997   0.085   0.253   0.866   0.406    0.510
Mean   0.083   0.148   0.893   0.788    0.880

a Tested distribution: normal; values are two-tailed significance levels.
5 One of the principal decision-making problems faced by applied statisticians is that of choosing an appropriate model from a number of competing models for a particular data set. The most popular way to solve this problem is to use an information criterion (IC) to make the choice. In general, an IC model-selection procedure is based on choosing the model with the largest maximized log-likelihood function minus a penalty function, which depends on the number of parameters and, in most cases, the sample size. Among the large number of information criteria, Schwarz's (1978) Bayesian information criterion (BIC) and Akaike's (1973) information criterion (AIC) are the most popular.
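In the maximization form described in this footnote, the two criteria can be written as (standard formulations, stated here for clarity; they are not reproduced from the GLIMMIX documentation):

\[
\mathrm{AIC} = \ln L(\hat{\theta}) - p, \qquad
\mathrm{BIC} = \ln L(\hat{\theta}) - \tfrac{p}{2}\,\ln n,
\]

where \(\ln L(\hat{\theta})\) is the maximized log-likelihood, \(p\) is the number of estimated parameters and \(n\) is the sample size; the candidate model with the largest criterion value is chosen.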
If different starting values yield different optima, Wedel suggests selecting the solution with the maximum value of the log-likelihood (Wedel, 1997: 8). The likelihood estimates and the corresponding BIC values of the 15 GLIMMIX reruns for each of the three models are summarized in Table 6.11.
For Y1, the eighth of the 15 runs achieved the highest log-likelihood and was therefore selected for further analysis. For Y2 and Y3 the best results were reached in runs 9 and 10, respectively.
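The rerun procedure amounts to restarting EM from different random starting values and keeping the best solution, as Wedel (1997) recommends. A minimal sketch, again using the hypothetical fit_mixture_regression helper, here assumed to accept a seed that controls the random starting values:

```python
def best_of_reruns(X, y, n_segments, n_reruns=15):
    """Rerun the EM estimation n_reruns times and keep the solution
    with the maximum log-likelihood (Wedel, 1997: 8)."""
    best_seed, best_log_lik = None, float("-inf")
    for seed in range(n_reruns):
        # each seed generates different random starting values for EM
        log_lik, _ = fit_mixture_regression(X, y, n_segments=n_segments, seed=seed)
        if log_lik > best_log_lik:
            best_seed, best_log_lik = seed, log_lik
    return best_seed, best_log_lik
```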
Table 6.10. Summary of mixture regression findings with GLIMMIX.

     Segments   AIC    CAIC   MAIC   BIC    R²
Y1   2         −172   −290   −201   −290   0.84
     3          −11   −190    −55   −190   0.89
     4         −275   −173   −201   −173   0.90
     5          −66    −25     −7    −25   0.92
     6         −226   −274   −185    −87   0.95
Y2   2         −273   −391   −302   −391   0.84
     3          −86   −264   −130   −264   0.92
     4          −22   −261    −81   −261   0.95
     5          −87   −213    −13   −213   0.96
     6          −87   −274     −2   −274   0.96
Y3   2         −629   −511   −600   −511   0.66
     3         −720   −541   −676   −541   0.76
     4         −770   −531   −711   −531   0.81
     5         −799   −499   −725   −499   0.85
     6         −852   −490   −763   −491   0.86

Note: The segmentation solutions derived from the BIC criterion are five segments for Y1 and Y2 and three segments for Y3.
Table 6.11. GLIMMIX test (AHRP 1991–1997): 15 reruns.

Run        1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
Y1  LL    −99   −94  −169   −66   −91  −165   −57   −32  −118  −105  −209  −101  −122   −49  −109
    BIC  −175  −177  −140  −191  −179  −142  −196  −208  −165  −172  −120  −174  −163  −200  −170
Y2  LL   −215  −266  −292  −225  −313  −199  −326  −225  −197  −229  −238  −246  −202  −266  −283
    BIC  −117   −91   −78  −112   −67  −125   −61  −112  −126  −110  −105  −101  −123   −91   −83
Y3  LL   −536  −513  −549  −546  −508  −502  −443  −516  −535  −556  −531  −537  −550  −538  −517
    BIC  −401  −390  −408  −406  −387  −384  −355  −391  −401  −411  −399  −402  −408  −402  −392

Note: The GLIMMIX runs recommended by the BIC criterion are run 8 (Y1), run 9 (Y2) and run 10 (Y3).
Classification problems for non-disjunctive cluster solutions
Agreement between the resulting classifications will lend more weight to the chosen solution. In order to compare the classifications of a mixture regression model estimated with different starting values, one can compare the posterior probabilities of each of these runs. Two problems arise in this context. The first is the problem of varying labels assigned in the GLIMMIX output (the 'label-switching' problem): segment (class) number 1 in iteration 1 is not necessarily class number 1 in the next iteration, which makes comparisons of several GLIMMIX runs (among identical models) difficult.
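A common remedy for this label-switching problem (not part of the original GLIMMIX procedure, but standard practice with mixture models) is to permute the segment labels of each run so that they best match a reference run, for example by applying the Hungarian algorithm to the posterior-probability matrices. A sketch with NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_labels(reference, other):
    """Permute the segment columns of `other` to best match `reference`.

    Both arguments are (n_companies x n_segments) matrices of posterior
    probabilities from two runs of the same model. The Hungarian algorithm
    finds the column permutation that maximizes the total overlap.
    """
    overlap = reference.T @ other              # overlap[i, j]: reference segment i vs. other segment j
    _, perm = linear_sum_assignment(-overlap)  # negate to maximize overlap
    return other[:, perm]

# toy usage: run 2 has segments 1 and 2 swapped relative to run 1
run1 = np.array([[0.8, 0.2], [0.1, 0.9]])
run2 = np.array([[0.3, 0.7], [0.9, 0.1]])
print(align_labels(run1, run2))  # columns reordered to match run1's labels
```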
The second problem is related to the strategy that is applied to the classification problem. This is illustrated by Table 6.12, which shows an ambiguous classification that can arise with all kinds of non-disjunctive cluster solutions.
The example shows seven non-disjunctive cluster solutions for one company, obtained with the same model and cluster methodology but with different starting values. The posterior probabilities are already correctly labelled, and the segment assignments listed in the column 'Classification' are therefore not affected by the problem of changing segment labels. In iteration 1 the posterior probabilities suggest classifying the company into segment 1 (0.8 > 0.2); in iteration 2 it is classified into segment 2 (0.4 < 0.6), and so on.
Nevertheless, the example illustrated in Table 6.12 is still a rather ambiguous clustering solution. The company in question is classified three times into segment 1 (43%) and four times into segment 2 (57%). In order to decide on an exact class, one could consider the number of assignments to each of the two segments. Following this strategy (strategy 1), the company is classified into segment 2, which in the present context means that it will be compared with the other companies in this group.
Table 6.12. Example of an ambiguous classification problem for non-disjunctive cluster solutions.

Iteration     Segment 1a   Segment 2a   Classification   Class 1   Class 2
1             0.80         0.20         1                0.80
2             0.40         0.60         2                          0.60
3             0.70         0.30         1                0.70
4             0.45         0.55         2                          0.55
5             0.90         0.10         1                0.90
6             0.40         0.60         2                          0.60
7             0.40         0.60         2                          0.60
Strategy 1:                             2                0.43      0.57
Strategy 2:                             1                0.80      0.59

a Posterior probabilities.
Another strategy (strategy 2) considers the values of the posterior probabilities: a company is assigned to the group with the highest average posterior probability. The average likelihood of the company being classified into segment 1 is 80%, compared with 59% for segment 2, which clearly favours class 1 over class 2. Although the example in Table 6.12 is an extreme case, it demonstrates the practical problems with repeated non-disjunctive cluster solutions. On the other hand, agreement between the classifications resulting from both strategies lends support to the validity of a solution.
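Both strategies are easy to reproduce on the Table 6.12 example; the following sketch uses the seven pairs of posterior probabilities from the table and recovers the values 0.43/0.57 (strategy 1) and 0.80/0.59 (strategy 2):

```python
# Posterior probabilities for one company over seven reruns (Table 6.12)
posteriors = [(0.80, 0.20), (0.40, 0.60), (0.70, 0.30), (0.45, 0.55),
              (0.90, 0.10), (0.40, 0.60), (0.40, 0.60)]

# Each rerun assigns the company to the segment with the higher posterior
labels = [1 if p1 >= p2 else 2 for p1, p2 in posteriors]

# Strategy 1: majority vote over the assignments -> 3/7 = 0.43 vs. 4/7 = 0.57
share_1 = labels.count(1) / len(labels)
share_2 = labels.count(2) / len(labels)

# Strategy 2: mean posterior within each assigned class -> 0.80 vs. 0.59
mean_1 = sum(p1 for (p1, _), l in zip(posteriors, labels) if l == 1) / labels.count(1)
mean_2 = sum(p2 for (_, p2), l in zip(posteriors, labels) if l == 2) / labels.count(2)

print(share_1, share_2)  # 0.43, 0.57 -> strategy 1 chooses segment 2
print(mean_1, mean_2)    # 0.80, 0.59 -> strategy 2 chooses segment 1
```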
For the present study, the average posterior probabilities for each of the identified segments under both strategies are summarized in Table 6.13. As can be seen from Table 6.13, both strategies lead to the same classification and therefore to the same average posterior probabilities. The high values of the average posterior probabilities indicate that the classification of all companies is very sharply defined.
Mixture regression classification versus AHRP strategic groups
After estimating separate regression models for each latent class in the database, the composition of these groups can be analysed. If the groups are more or less comparable to the traditional AHRP classification system (industry sectors), then the mixture regression model could be replaced by ordinary multiple regression analysis. If the groups are significantly different from the traditional system, this justifies the additional effort.
Each of the attributes which make up the AHRP classification in the hotel group, i.e. location, number of days of operation and category, was tested against the classification derived from the mixture regression models.
Table 6.13. Average posterior probabilities in models Y1, Y2 and Y3.

     Segment   Strategy 1a   Strategy 2a
Y1   1         0.9951        0.9951
     2         0.9916        0.9916
     3         0.9792        0.9792
     4         0.9943        0.9943
     5         0.9992        0.9992
Y2   1         0.9896        0.9896
     2         0.9862        0.9862
     3         0.9916        0.9916
     4         0.9878        0.9878
     5         1.0000        1.0000
Y3   1         0.9706        0.9706
     2         0.9647        0.9647
     3         0.9983        0.9983

a Posterior probabilities.
Only a few relationships could be found. For example, in the occupancy rate model the classes seemed to reflect to some degree differences in the number of days of operation, and hence the industry's adaptation to the seasonal variations in Austria. In this model segment 2 is dominated by companies which are open all year (63%), whereas in groups 1 and 3 the majority of hotels close at least in the pre- or post-season, or open only in summer or winter (76 and 70.7%, respectively).
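The book does not specify the test statistic used for these comparisons; one straightforward possibility is a chi-square test of independence on a cross-tabulation of each AHRP attribute against the latent segments. A sketch with pandas and SciPy, using made-up illustrative data:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical data frame: one row per hotel, with an AHRP attribute
# (e.g. days-of-operation category) and the mixture-model segment
df = pd.DataFrame({
    "days_open": ["all year", "seasonal", "all year", "summer only", "seasonal"],
    "segment":   [2, 1, 2, 3, 1],
})

# Cross-tabulate attribute categories against latent segments
table = pd.crosstab(df["days_open"], df["segment"])

# Chi-square test of independence between attribute and classification
chi2, p_value, dof, _ = chi2_contingency(table)
print(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```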
Service quality also appears to be an important attribute reflected in the mixture regression classification of model 2 (the accommodation revenue model). Here, the majority of companies in segments 1 and 2 are four- and five-star category hotels, and these segments can thus be described as the luxury hotel segment. Segment 3 contains most of the three-star hotels, and segment 4 all the remaining forms of low-budget accommodation.
Although these findings suggest a few similarities between parts of the traditional AHRP classification system and the a posteriori defined classes, they are not convincing enough to reject the conditional mixture approach.