Quantitative Methods for Tourism Industry Analysis
2.4 Quantitative methods
While qualitative approaches are still widely used, it must be emphasized that quantitative approaches are important for advancing tourism and hospitality research as a solid discipline within today’s social sciences.
In the US, one of the most advanced nations in generating quantitative knowledge in hospitality and tourism, many students dislike learning quantitative methods as much as they dislike finance or accounting courses, because of the overwhelming amount of dry numbers. However, without the ability to understand financial statements, it is difficult to serve in senior management positions in business.
In the current US hospitality and tourism academic fields, it is almost impossible for students pursuing a Doctor of Philosophy (PhD) degree to obtain tenure-track positions unless they can demonstrate solid knowledge and mastery of quantitative methods. It is understandable that hospitality and tourism students put off learning to work with numbers until, or unless, they appreciate that quantitative skills matter whether they work in the profit-oriented industry or in a research-oriented noncommercial environment. After all, some students candidly admit that they chose the hospitality and tourism field because they do not like numbers.
In the following sections, only brief overviews of statistical and nonstochastic (deterministic) models are given. Excellent textbooks covering statistical models in depth are already available, so that subject is not covered in great detail here. Nonstochastic (deterministic) models are explained in detail, at an introductory level, in the following chapters.
2.4.1 Statistical models
A model is a simple representation of the complex interactions of variables in the real world.
If you believe that beverage expenditure at your restaurant (the quantity you are interested in, represented by Y, the dependent variable, so called because its value depends on another factor) depends on total expenditure on food (another factor that may affect Y, represented by X, the independent variable), then you can investigate customers’ expenditure patterns on food and beverages at your restaurant using equation (1.1).
\[ Y = \alpha + \beta X + \varepsilon \qquad (1.1) \]
where Y = dependent variable, i.e. the beverage expenditure per person; \(\alpha\) = intercept, the value of Y when X is zero, i.e. how much a customer spends on beverages without eating anything; \(\beta\) = coefficient for the slope of the model; X = independent variable, i.e. the food expenditure per person; and \(\varepsilon\) = error term, which captures the difference when \(Y_i\) deviates from the expected value predicted by \(X_i\).
Suppose the analysis of data on 3000 customers revealed that \(Y = \$4.00 + 0.3X\).
Then if a customer’s total food consumption (X) was $20, you can say the customer’s expected expenditure on beverages would be $4.00 + 0.3 × $20.00 = $4.00 + $6.00 = $10.00. However, in a social science setting where you observe reality, one person who spent $20 on food may spend $20 on beverages ($20 − ($4.00 + 0.3 × $20.00) = $10 deviation from the result predicted by the model), whereas another person who spent $50 on food may spend nothing on beverages ($0 − ($4.00 + 0.3 × $50.00) = −$19 deviation from the result predicted by the model). It is important to note that after analyzing data from 3000 customers, the model \(Y = \$4.00 + 0.3X\) was obtained because these parameter values express the association between \(Y_i\) and \(X_i\) (where i is any observation from 1 to 3000) better than any other values; the model was fitted to the data in such a way that the expected error is zero. The model allows such deviations and may overestimate or underestimate individual cases, but overall the errors are expected to average zero, because they occur both above and below zero in real data.
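The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are illustrative, and the coefficients are the hypothetical $4.00 intercept and 0.3 slope from the example.

```python
# Hypothetical fitted model from the example: Y = 4.00 + 0.3 * X (dollars per person)
INTERCEPT = 4.00  # expected beverage spend when food spend is zero
SLOPE = 0.3       # extra beverage dollars per extra food dollar


def predict_beverage(food: float) -> float:
    """Expected beverage expenditure for a given food expenditure."""
    return INTERCEPT + SLOPE * food


def deviation(food: float, beverage: float) -> float:
    """Observed minus predicted beverage spend, i.e. the realized error."""
    return beverage - predict_beverage(food)


if __name__ == "__main__":
    print(f"{predict_beverage(20.0):.2f}")  # 10.00
    print(f"{deviation(20.0, 20.0):+.2f}")  # +10.00
    print(f"{deviation(50.0, 0.0):+.2f}")   # -19.00
```

A positive deviation means the model underestimated that customer; a negative one means it overestimated.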
When the sample is large, such as 3000 observations, occasional observations deviate substantially from the expected value, but the observed values as a whole tend to regress back toward the mean (expected value), and the positive and negative deviations largely cancel out.
In general, statistical models share a stochastic (random) character. You cannot predict exactly how much the next customer will spend on beverages, though you can make a reasonable guess (the expected value) from their food expenditure. This randomness may be attributed either to true randomness in customers’ expenditure on beverages or to the omission of other important factors (variables) from the model.
It is a matter of your judgment, as the researcher building the model, whether other factors may affect the relationship between food and beverage expenditures. These may include, but are not limited to:
● outside temperature,
● time of eating (e.g. breakfast, lunch, or dinner),
● party size,
● age group of the customer (above legal age for alcoholic beverages or not),
● gender,
● perceived friendliness of the server,
● spiciness of food items,
● price elasticity,
● ambience of the restaurant,
● layout of the restaurant,
I N T R O D U C T I O N T O Q U A N T I TAT I V E M E T H O D S F O R T O U R I S M I N D U S T RY A N A LY S I S 29
● particular seating that customer was assigned,
● existence and results of special social events such as sports (a New York Yankees versus Boston Red Sox game in New York City or Boston; a Six Nations rugby match in England, France, Ireland, Italy, Scotland, or Wales; a Cricket World Cup match in the Caribbean; an Asian Cup soccer game in Iraq, etc.),
● religious events (Easter holidays, Eid-ul Fitr, Hanukkah, New Year in the Chinese calendar, etc.),
● national holidays (Independence Day, Liberation Day, Constitution Day, etc.).
If you do not include them in the model, those effects are captured in the error term.
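To see how an omitted factor ends up in the error term, the sketch below simulates hypothetical data in which beverage spending depends on both food spending and outside temperature (one of the factors listed above), then fits a model with food only. All numbers (coefficients, ranges, noise level, seed) are invented for illustration. The average temperature effect is absorbed into the intercept, and its variation shows up in the residuals, which end up correlated with temperature.

```python
import random
import statistics


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    ss_u = sum((a - mu) ** 2 for a in u)
    ss_v = sum((b - mv) ** 2 for b in v)
    return cov / (ss_u * ss_v) ** 0.5


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 3000
food = [rng.uniform(10, 60) for _ in range(n)]
temperature = [rng.uniform(0, 35) for _ in range(n)]  # hypothetical omitted factor, deg C
# Hypothetical "true" process: temperature matters, but the analyst leaves it out
beverage = [2.0 + 0.3 * f + 0.15 * t + rng.gauss(0, 2) for f, t in zip(food, temperature)]

a, b = fit_simple_ols(food, beverage)  # model with food only
residuals = [yb - (a + b * f) for f, yb in zip(food, beverage)]

print(f"intercept {a:.2f}, slope {b:.3f}")
print(f"mean residual {statistics.fmean(residuals):.1e}")
print(f"corr(residual, temperature) {pearson(residuals, temperature):.2f}")
```

Because temperature was generated independently of food here, the food slope is still close to 0.3; if an omitted factor were correlated with food, the slope itself would be distorted as well.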
Unlike deterministic models, statistical models treat part of the variance in the data as an error term. In other words, a model takes into account the associations between independent variables and a dependent variable, and thus may indicate how the values of the dependent variable relate to given values of the independent variables. So, while the overall model would be written \(Y = \$4.00 + 0.3X\), for an individual observation it may be expressed as \(Y_i = \$4.00 + 0.3X_i + \varepsilon_i\). These models come with some explicit assumptions and tacit reservations.
It may seem that the error term, \(\varepsilon\), is useless because it is expected to be zero anyway; however, it cannot be disregarded. The spread of the errors determines the standard error of the estimated coefficient, and comparing the estimated coefficient (slope) with its standard error yields useful information: for example, how confident we can be that a slope exists at all. Namely, you can verify whether an increase in food expenditure is associated with an increase in expected expenditure on beverages at a 95% or 99% confidence level, or even higher, subject to a series of assumptions in the model.
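The comparison of a coefficient with its standard error can be sketched with ordinary least squares on simulated data. This is a minimal illustration, assuming the classical simple-regression formula \(SE(\hat\beta) = \sqrt{\hat\sigma^2 / \sum (X_i - \bar X)^2}\), with \(\hat\sigma^2\) equal to the sum of squared residuals divided by n − 2; the data are generated from the hypothetical model \(Y = 4 + 0.3X + \varepsilon\), not taken from a real restaurant.

```python
import math
import random
import statistics


def ols_with_se(x, y):
    """Simple OLS fit returning (intercept, slope, standard error of slope)."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = ssr / (n - 2)  # estimated variance of the error term
    return intercept, slope, math.sqrt(sigma2 / sxx)


rng = random.Random(7)  # fixed seed for reproducibility
food = [rng.uniform(5, 60) for _ in range(3000)]
# Simulated beverage spend from the hypothetical model Y = 4 + 0.3X + error
beverage = [4.0 + 0.3 * f + rng.gauss(0, 3) for f in food]

a, b, se_b = ols_with_se(food, beverage)
t_ratio = b / se_b  # how many standard errors the slope lies away from zero
print(f"intercept {a:.2f}, slope {b:.3f}, SE {se_b:.4f}, t {t_ratio:.1f}")
```

A t ratio far above roughly 2 (for large samples) corresponds to the 95% level mentioned in the text; noisier errors would enlarge the standard error and shrink the ratio.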
2.4.2 Statistical analysis
To carry out a statistical analysis, you first need the basic features of the data, whether these are primary data collected from questionnaires or secondary data obtained from another researcher. These include the number of observations, the types of variables (e.g. how many possible answers a question allows; a gender question might offer two choices), the overall mean, and the standard deviation. This initial process is termed descriptive statistics. While the results of descriptive statistics may suffice for many practitioner audiences, more rigorous data analysis, such as hypothesis testing and correlation analysis, is needed to draw inferences.
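A minimal sketch of this descriptive step, using Python’s standard `statistics` module on a small invented dataset (the values and the `meal_period` variable are hypothetical, not survey results from the text):

```python
import statistics
from collections import Counter

# Hypothetical survey data (invented for illustration): spend per customer in dollars
food = [18.5, 22.0, 35.0, 12.0, 27.5, 40.0, 15.0, 30.0]
beverage = [8.0, 11.5, 14.0, 3.0, 12.0, 20.0, 6.5, 9.0]
meal_period = ["lunch", "dinner", "dinner", "lunch", "dinner", "dinner", "lunch", "lunch"]


def describe(values):
    """Basic descriptive statistics for a numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    for name, col in [("food", food), ("beverage", beverage)]:
        print(name, {k: round(v, 2) for k, v in describe(col).items()})
    print("meal_period", Counter(meal_period))  # categories and their frequencies
```

Numeric variables get a mean and standard deviation, while a categorical variable is summarized by counting how often each possible answer occurs.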
If you work in a restaurant environment, you may regard descriptive statistics as the process of checking the quality and quantity of all the ingredients before you start cooking; the cooking processes that follow correspond to inferential statistics. Knowing the condition of the ingredients will surely help you make a better meal. What would happen to the cooked dishes if the ingredients were of inferior quality? Hopefully, before you jump into intensive cooking, you take time to taste the carrots, lettuce, and tomatoes to ensure that they meet the required level of basic quality, and look closely at and smell the meat and fish very carefully. Such practice saves precious cooking time: you can abort before you cook subpar meals, which would force you to throw away the ingredients you have and replace them with better ones. When the meat in your inventory is of inferior quality for steaks but still edible and safe, an experienced chef can still use it as an ingredient for chili sauce, as long as the chef has the basic skills and knowledge to produce satisfactory results.
Descriptive statistics may show you whether it is advisable to throw the data away, to salvage them (as the chef does by cooking with spices), or to prepare a series of dishes that capitalize on the superior quality of fresh ingredients, as long as you have a basic set of skills and knowledge as a researcher.
Researchers who collect primary data are able to design the research and identify sampling issues before they start data collection. Thus, by creating a controlled environment for experiments, a researcher can perform statistical analysis of the samples to verify their hypotheses. In contrast, researchers who deal with observed data from society or with secondary data may not be able to create a controlled environment, and the lack of one poses extra challenges, as will be discussed later.
2.4.2.1 Hypothesis testing
Hypothesis testing is a technique in which you state two hypotheses: one that represents the idea you would like to challenge (the null hypothesis), and another that represents an alternative idea or possible new finding that you are eager to explore (the alternative hypothesis). Data analysis shows quantitatively whether there is enough evidence to reject the null hypothesis.
For example, in our simple case the null hypothesis can be that ‘there is no association between a customer’s expenditure on food and the same customer’s expenditure on beverages’ (\(H_0\), the hypothesis that you as a researcher would like to challenge with statistical evidence), and the alternative hypothesis can be that ‘there is an association between a customer’s expenditure on food and the same customer’s expenditure on beverages’ (\(H_a\), the hypothesis you would like to support, showing that an association exists between the variables of interest by rejecting the null hypothesis with statistical evidence).
The following are some other examples of pairs of null and alternative hypotheses. They are not necessarily derived from existing research; they are used as examples of how you can think about hypothesis testing.
● A null hypothesis can be that ‘there is no association between the degree of employee satisfaction and the financial performance of the employer’, and an alternative hypothesis can be that ‘there is an association between the degree of employee satisfaction and the financial performance of the employer’;
● A null hypothesis can be that ‘there is no association between whether the mortgage borrower is classified as a subprime borrower (or not) and the likelihood of the borrower filing for bankruptcy’, and an alternative hypothesis can be that ‘there is an association between whether the mortgage borrower is classified as a subprime borrower (or not) and the likelihood of the borrower filing for bankruptcy’;
● A null hypothesis can be that ‘there is no association between an increase in marketing expenditures of the regional marketing office (often known as the Convention and Visitors Bureau: CVB) and the change in the number of inbound visitors to the region’, and an alternative hypothesis can be that ‘there is an association between an increase in marketing expenditures of the regional marketing office (CVB) and the change in the number of inbound visitors to the region’ (this is a question of return on investment (ROI) in marketing).
Imagine only two variables, X and Y. If you believe that as X increases Y should increase, then you are assuming a positive slope (the line rises toward the 2 o’clock direction). If X represents household disposable income and Y represents tourism-related expenditures per year and you assume a positive slope, then your null hypothesis would be ‘\(H_0\): there is no association (slope = 0) between the amount of household disposable income and the amount of tourism-related expenditures’ and your alternative hypothesis would be ‘\(H_a\): there is an association (slope ≠ 0) between the amount of household disposable income and the amount of tourism-related expenditures’. If the estimated coefficient is large relative to its standard error, so that the probability of obtaining such an estimate when the true slope is zero is small (for example, less than 5%), you can say ‘we have enough statistical evidence to reject the null hypothesis that there is NO association between the amount of household disposable income and the amount of tourism-related expenditures.’
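The decision rule just described can be sketched as follows. This is an approximation, assuming a large sample so that the ratio of the coefficient to its standard error can be compared with the standard normal distribution instead of Student’s t; the function names and the two sets of estimates are hypothetical.

```python
import math


def two_sided_p_value(coef: float, se: float) -> float:
    """Approximate two-sided p-value for H0: slope = 0.

    Uses the standard normal distribution, a reasonable stand-in for
    Student's t when the sample is large; small samples need the t table.
    """
    z = coef / se
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def reject_null(coef: float, se: float, alpha: float = 0.05) -> bool:
    """True if the evidence is strong enough to reject H0 at level alpha."""
    return two_sided_p_value(coef, se) < alpha


if __name__ == "__main__":
    # Hypothetical estimates: slope of tourism spend on disposable income
    print(reject_null(0.12, 0.04))  # z = 3.0  -> True
    print(reject_null(0.05, 0.04))  # z = 1.25 -> False
```

Note that a small p-value supports rejecting \(H_0\); failing to reject does not prove that the slope is zero.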
2.4.3 Regression models
Regression is a statistical tool for verifying whether relationships exist between variables of interest. The relationship can be, for example, between sales at restaurants and the number of visitors to the region, or between the likelihood of a mortgage borrower’s default and the use of adjustable-rate mortgages.
There are many variations of regression modeling, starting with simple linear regression, where there is only one dependent variable (the data you are interested in) and one independent variable. If you think that the height of children under 18 years of age can be explained by their age, the model is \(Y = \alpha + \beta X + \varepsilon\), where Y = height and X = age.
If there are two or more independent variables, they are incorporated into a multiple regression model (often loosely called a multivariate regression model). Typically, you want to find whether there is a slope, positive or negative, in the relationship between the independent variable(s) and the dependent variable, and you examine whether the data provide evidence, commonly at the 95% confidence level, that the slope differs from zero (i.e. is not flat). This is assessed by comparing each estimated coefficient with its standard error.
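With several independent variables, least-squares coefficients can be obtained from the normal equations \(X^\top X b = X^\top y\). The sketch below solves them with plain Gaussian elimination so that it needs no third-party library; the predictors and the exact relationship \(y = 1 + 2x_1 - 0.5x_2\) are hypothetical. Standard errors would come from the diagonal of \(\hat\sigma^2 (X^\top X)^{-1}\) and are omitted here for brevity.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_ols(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations X'X b = X'y."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    k = len(X[0])
    xtx = [[sum(xi[p] * xi[q] for xi in X) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(X, y)) for p in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical predictors (x1, x2) and a response generated exactly as 1 + 2*x1 - 0.5*x2
    rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
    y = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in rows]
    print([round(b, 6) for b in fit_multiple_ols(rows, y)])  # [1.0, 2.0, -0.5]
```

Because the example response has no error term, the true coefficients are recovered exactly; with real data they would only be estimates.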
Techniques used in regression models by statisticians are similar to those used in econometric models. One difference appears to lie in the mindset of researchers. Econometricians tend to discourage any attempt to measure relationships without reference to relevant theories, such as those in macro- and microeconomics, labor economics, public finance, monetary theory, international trade theory, etc. While verification against existing theories is also important in noneconometric environments, the negative perception of ‘measurement without theory’ may not be as strong as in econometrics.
One example is a technique called data mining. The term data mining may carry negative connotations for some econometricians and economists because it implies a lack of prior grounding in existing theories, but in commercial, business, and financial environments, statisticians are making the best use of the increasing power of personal computers to identify nontrivial, nonapparent, hidden patterns of information from certain combinations of model specifications applied to huge amounts of real data. Such huge datasets often exist in proprietary, commercial environments, for example consumers’ consumption patterns collected by credit card companies, retail banks, online vendors, or hotel chains. Because the datasets are huge in sample size and number of variables (often 200 or more), computers are programmed to try large numbers of models in an attempt to find combinations of variables associated with significant increases in the dependent variable.
If a cosmetics company plans to introduce a new product targeting middle-aged, self-conscious women, identifying the target segments would likely improve the product’s chances of penetrating the segment. In addition, if a company can extract hidden patterns by identifying combinations of key variables, it would be able to define more narrowly focused target segments in which the response rate to a pilot campaign is expected to be significantly higher (e.g. variables such as age group, annual income level, type of occupation, type of car driven, hotels stayed at, magazines subscribed to, travel destinations chosen, number and age of children, and zip code of residence, which indicates the area where customers live and may show spatial autocorrelation with other data). There is no theory behind the particular combination that maximizes the likelihood of a purchase of the promoted product and, as long as the combination maximizes that likelihood, everything else is of secondary importance. This can be a very different attitude from that of many econometricians.
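The brute-force character of such searches can be illustrated with a toy example. The sketch below enumerates every segment defined by one or two attributes, skips segments smaller than a minimum size, and reports the one with the highest response rate. The records, attribute names, and `min_size` threshold are all hypothetical, and real data-mining tools use far more sophisticated methods; the more segments are tried, the more likely some look good purely by chance, which is one reason for econometricians’ caution.

```python
from itertools import combinations, product

# Hypothetical pilot-campaign records (invented for illustration)
customers = [
    {"age": "40-54", "income": "high", "car": "suv", "responded": 1},
    {"age": "40-54", "income": "high", "car": "suv", "responded": 1},
    {"age": "40-54", "income": "high", "car": "sedan", "responded": 1},
    {"age": "40-54", "income": "low", "car": "suv", "responded": 0},
    {"age": "25-39", "income": "high", "car": "suv", "responded": 0},
    {"age": "25-39", "income": "low", "car": "sedan", "responded": 0},
    {"age": "40-54", "income": "high", "car": "suv", "responded": 1},
    {"age": "25-39", "income": "high", "car": "sedan", "responded": 1},
    {"age": "25-39", "income": "low", "car": "suv", "responded": 0},
    {"age": "40-54", "income": "low", "car": "sedan", "responded": 0},
]
ATTRIBUTES = ["age", "income", "car"]


def best_segment(records, attributes, min_size=3, max_vars=2):
    """Brute-force search over segments defined by up to max_vars attributes.

    Returns (response_rate, segment_definition, segment_size) for the segment with
    the highest response rate among those with at least min_size customers.
    """
    best = None
    for k in range(1, max_vars + 1):
        for attrs in combinations(attributes, k):
            levels = [sorted({r[a] for r in records}) for a in attrs]
            for values in product(*levels):
                seg = [r for r in records if all(r[a] == v for a, v in zip(attrs, values))]
                if len(seg) < min_size:  # skip tiny segments that are mostly noise
                    continue
                rate = sum(r["responded"] for r in seg) / len(seg)
                if best is None or rate > best[0]:
                    best = (rate, dict(zip(attrs, values)), len(seg))
    return best


if __name__ == "__main__":
    print(best_segment(customers, ATTRIBUTES))
    # (1.0, {'age': '40-54', 'income': 'high'}, 4)
```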
2.4.4 Econometrics model
Econometrics is the application of statistical methods to economic data in society, from an economic point of view. Economic data are often called secondary data, as econometricians do not collect data by designing experiments but most likely use data on society that were collected and compiled by others. Econometric models are similar in structure to regression and statistical models, but the data they deal with are far from the controlled environment with which statisticians are more familiar. As a result, econometricians tend to face more problems with violations of the various assumptions used in the statistical environment and thus become more familiar with how to deal with them.
2.4.5 Time-series model
Techniques employed in time-series models are similar to other statistical methods, but there is a difference. A time-series model does not depend on other variables in the same time period; instead, it depends on the past behavior of variables, including the past values of the variable itself. We will start with a simple example.
If you are concerned about your performance in a coming examination, you may consider how you have been doing in the examinations up to now; if the result of your last examination was just as good as those of your previous examinations, you can expect to do well in the coming examination (as you seem to know how to study and prepare for examinations). This is very different from other models, in which your performance in the coming examination depends on how many hours you slept, how many classes you skipped, the temperature on the examination day, and how happy you are with your friends. In time-series modeling, your examination result last week may be more relevant than your result 2 months ago when it comes to predicting your performance in tomorrow’s examination. Time-series models can be referred to as extrapolative methods, in contrast with other methods such as regression or econometric models, which can be considered causal methods. The time-series technique is very important in financial fields and in tourism-related forecasting.
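Simple exponential smoothing is one common extrapolative technique that formalizes the idea that last week’s result matters more than one from 2 months ago. The sketch below is an illustration, not a method prescribed by the text; the scores and the smoothing weight `alpha = 0.5` are hypothetical.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    Each new observation gets weight alpha and the previous smoothed level gets
    1 - alpha, so an observation k periods back carries weight alpha * (1 - alpha) ** k
    (the first observation, used to initialize the level, keeps the remaining weight).
    """
    if not history:
        raise ValueError("need at least one observation")
    level = history[0]  # initialize with the first observed value
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


if __name__ == "__main__":
    # Hypothetical examination scores, oldest first
    scores = [70, 75, 80, 85]
    print(exponential_smoothing_forecast(scores, alpha=0.5))  # 80.625
```

Only the variable’s own past values enter the forecast; no other explanatory variables are used, which is what distinguishes this extrapolative approach from the causal models above.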
While time-series data consist of observations on the same entity across time, the contrasting concept is cross-section data, in which observations are collected from different entities in the same time period.
2.4.6 Forecasting
Forecasting is the group of techniques used to predict future values of the variables you are interested in. While there are some nonquantitative methods, such as the Delphi method, many rigorous forecasts use a combination of the quantitative methods mentioned in this chapter. Demand forecasting is critical for hospitality company managers, both for capital budgeting and for taking proactive steps to cope with fluctuations in market demand, so as not to miss opportunities to maximize profit margins when appropriate.
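As a small, hedged illustration of quantitative demand forecasting, the sketch below applies a seasonal naive forecast (each month forecast by the same month one year earlier), a common benchmark, to hypothetical monthly hotel demand, and scores it with the mean absolute percentage error (MAPE). Neither the data nor the choice of method comes from the text.

```python
def seasonal_naive(series, season=12):
    """Forecast each period with the value observed one season earlier.

    Returns forecasts for periods season .. len(series) - 1.
    """
    return [series[t - season] for t in range(season, len(series))]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical monthly room-nights demand (thousands) over two years
year1 = [100, 110, 130, 150, 170, 200, 220, 210, 180, 150, 120, 110]
year2 = [round(v * 1.05, 2) for v in year1]  # 5% growth, same seasonal shape
demand = year1 + year2

forecasts = seasonal_naive(demand, season=12)
print(f"MAPE of seasonal naive forecast for year 2: {mape(year2, forecasts):.2f}%")
```

Because the naive forecast ignores the 5% growth, it misses every month by the same proportion; more elaborate methods are usually judged by whether they beat such a benchmark.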
Forecasting, particularly tourism-demand forecasting, requires some qualitative techniques as well as rigorous use of all the quantitative methods, particularly advanced applications of regression and econometrics. It is a field that has been led and expanded by a small number of tourism researchers trained in economics and econometrics. Their textbook is specific to tourism-demand forecasting and is comprehensive and rigorous, covering the stochastic side of the quantitative methods.
Frechtling (2001) has published a thorough guide to forecasting tools that can be used for tourism-demand forecasting, and the book provides ample examples and actual data to work with. The book is good at suggesting appropriate strategies for a researcher who plans to conduct tourism-demand forecasting. It offers actual monthly data on hotel/motel room demand in the Washington, DC metropolitan area. The book’s chapters clearly show how the author categorizes forecasting into different sets of methods, as follows:
1 Introduction
2 Alternative forecasting methods and evaluation
3 The tourism forecasting process