and Application to Print Advertising Data
Michael Smith
Australian Graduate School of Management, University of New South Wales
Robert Kohn
Australian Graduate School of Management, University of New South Wales
Sharat K. Mathur
Booz, Allen and Hamilton, Chicago, IL, USA
A new regression-based approach is proposed for modeling marketing databases. The approach is Bayesian and provides a number of significant improvements over current methods. Independent variables can enter into the model in either a parametric or nonparametric manner, significant variables can be identified from a large number of potential regressors, and an appropriate transformation of the dependent variable can be automatically selected from a discrete set of prespecified candidate transformations. All these features are estimated simultaneously and automatically using a Bayesian hierarchical model coupled with a Gibbs sampling scheme. Being Bayesian, it is straightforward to introduce subjective information about the relative importance of each variable, or with regard to a suitable data transformation. The methodology is applied to print advertising Starch data collected from 13 issues of an Australian monthly magazine for women. The empirical results highlight the complex and detailed relationships that can be uncovered using the methodology. An Splus compatible package called "br" that implements the semiparametric approach in this article is currently available at the Statlib repository on the world wide web at http://www.stat.cmu.edu/S/. J BUSN RES 2000. 49.229-244. © 2000 Elsevier Science Inc. All rights reserved.

A large proportion of research in marketing attempts to understand the factors affecting consumer behavior. The number of factors is frequently large and often complex and interrelated. With the growing sophistication of computing tools, there is a shift from separated to integrative databases (Curry, 1993) that provide the capability of modeling these complex interrelationships. Researchers modeling such data as a regression are faced with decisions on key issues, such as which variables and interactions to include in the analysis, the functional form of the effect of each independent variable, and the appropriate transformation for the dependent variable.

This article introduces a highly flexible Bayesian regression approach to the marketing literature that addresses such issues and has the following advantages over existing approaches to regression.

1. Independent variables can enter the model either parametrically or nonparametrically. By parametric, we mean that the functional form of the independent variable in the regression is known (for example, an independent variable can enter linearly or as a quadratic), whereas by nonparametric we mean that the functional form is not prescribed, but is estimated from the data. This is particularly important in modeling marketing data because the form of the relationship between the dependent and any particular independent variable is often unknown and is usually chosen subjectively.

2. The procedure identifies regressors that have a high probability of being significant determinants of the dependent variable. It does so by estimating the posterior probability that each of the regression coefficients is non-zero, conditioned only on the information in the dataset. Traditional approaches to identifying significant regressors include all-subsets regression and stepwise regression. However, all-subsets regression is computationally impractical in datasets with a large number of variables. Stepwise regression is also often unsatisfactory because at each stage only the probability that each regressor is non-zero, conditional on full knowledge of the significance (or otherwise) of the other regressors, can be identified. This results in a procedure that considers only a restricted number of subsets and often identifies completely incorrect subsets (Kass and Raftery, 1995).

Address correspondence to Dr. Michael Smith, Econometrics, University of Sydney, New South Wales 2006, Australia.
Journal of Business Research 49, 229–244 (2000)
2000 Elsevier Science Inc. All rights reserved. ISSN 0148-2963/00/$–see front matter
Our approach uses a Bayesian hierarchical regression model that is estimated using a Gibbs sampling scheme, an estimation methodology that involves a stochastic search that traverses most of the likely subsets of variables. The result is a variable selection procedure that does not suffer from the unreliability of stepwise regression, nor the computational problems of all-subsets regression. This is particularly useful with the large and complex datasets that often arise in marketing.

3. The Bayesian hierarchical model we use enables consideration of datasets where there are large numbers of independent variables, including cases where these are collinear. This can even include cases where there are more regressors than observations, but many of which are redundant.

4. An appropriate transformation of the dependent variable is selected from a discrete number of candidate transformations. Traditional regression approaches usually assume that an appropriate transformation of the dependent variable is known before model estimation takes place.

5. The functional forms of those variables that are modeled nonparametrically, the effect of the other independent variables, their probability of significance, and the choice of the transformation of the dependent variable are estimated simultaneously. It is crucial that these aspects of the regression model are not estimated conditionally upon each other. For example, if an inappropriate transformation is chosen, then the wrong variables may be identified as significant and their effects incorrectly estimated. Conversely, if the wrong variables are identified, then there will be so much variability in the regression estimates that it may be difficult to choose the appropriate data transformation. Current approaches to regression either select the variables, given the data transformation, or choose a data transformation from a family of transformations but do not select variables at the same time. To the best of our knowledge, there are currently no other approaches that simultaneously identify significant variables, fit any required regressors nonparametrically, and select a suitable transformation for the data.

6. Because this approach is Bayesian, it allows the researcher (and the manager) to introduce into the analysis prior subjective information on the relative importance of each independent variable. This is in addition to the information in the data and enables the modeler to incorporate "soft" data. It also permits the user to determine the sensitivity of the estimates to various if-then scenarios.

The Bayesian analysis is conducted by using a Bayesian hierarchical regression model that is estimated using a Gibbs sampling scheme. The Gibbs sampler is an estimation methodology with introductions given in Gelfand and Smith (1990) and Casella and George (1992). The hierarchical model in this article was proposed in Smith and Kohn (1996) and improves and extends the work on such models by Mitchell and Beauchamp (1988) and George and McCulloch (1993). It makes it feasible to identify significant regressors from a large number of such variables, whereas this was not practical with previous approaches. Furthermore, previous approaches did not consider data transformations.

The features outlined above make our methodology particularly useful in exploratory analysis of large marketing datasets. We illustrate this by applying it to Starch print advertising data from an Australian monthly magazine for women. Such data contain measurements of advertisement (ad) readership, along with a large number of content and product category variables. One feature of these data is that there is likely to be substantial interaction between these two groups of variables, though exactly which variables is unknown. This gives rise to a dataset with a large number of main effects and interaction terms (in our case over 200) from which those of key importance require identification. In addition to such interactions, there are regressors (such as position of the ad in the magazine issue) that are unlikely to be related to the readership scores in a linear manner, but where the true functional form is unknown a priori. Finally, the readership scores themselves are far from normally distributed, being bounded between zero and one and also skewed, and require an appropriate data transformation. To estimate such a regression model by using any existing alternative methodology would require the researcher to subjectively make the decisions about data transformations and functional form before conducting variable selection by using techniques such as forward or backward selection.

The article is organized as follows. The second section describes the data and the proposed model and relates them to the current literature on print advertising. The third section describes the Bayesian approach to semiparametric regression. It sets out the Bayesian hierarchical model, discusses the prior assumptions, and briefly discusses estimation by the Gibbs sampler. However, for details on the implementation and further discussion of the sampling scheme, see Smith and Kohn (1996, 1997). The fourth section reports the results of the application to the Australian print advertising data, whereas the fifth section summarizes the implications of the methodology for marketing researchers. The appendix lists the variables used in the study.

Modeling the Starch Print Advertising Data

Introduction

Creating ads that attract consumer attention to their products is one of the major tasks facing companies. It is particularly important to understand the factors
affecting readership of print ads (the largest category of ads) because the advertiser has much less control over the consumer's response than in a highly emotive medium like television (Rossiter and Percy, 1987). This knowledge will help advertisers in their creative and media strategies.

Researchers have tried to determine what aspects of print advertising affect readership for over three decades. Previous research investigating print advertising can be classified into two groups. The first investigated the influence of specific aspects of print ads, such as the presence or absence of pictures (Edell and Staelin, 1983), position in the magazine (Frankel and Solov, 1963), type of headline (Soley and Reid, 1983), the presence and use of color (Gardner and Cohen, 1964), and size (Starch, 1966; Trodahl and Jones, 1965). Most of this research used controlled experiments, with a few variables investigated at any one time.

The second group of research is more integrative and examines the collective impact of a number of ad characteristics on measures of ad effectiveness (Twedt, 1952; Diamond, 1968; Hanssens and Weitz, 1980). These studies typically use some measure of recall or recognition as the dependent variable, with a number of ad characteristics (both mechanical and content related) as the independent variables.

Whereas previous research added to our understanding of the key drivers of readership, it had two important methodological limitations. First, all the models in the literature assumed that the functional relationship relating the dependent variable and the independent variables was known (except for a few parameters that were estimated from the data). Most of the research used linear or log-linear regression models (Twedt, 1952; Diamond, 1968) or a fixed nonlinear relationship (Hanssens and Weitz, 1980). In practice, the form of the relationship for some of the regressors (such as position of the advertisement in an issue) is likely to be nonlinear but difficult to predetermine and needs to be estimated from the data. Second, few of the articles explicitly modeled the interaction between the ad content and product category variables, which would result in a large number of interaction terms, but used a multiplicative model to implicitly capture these interactions (Hanssens and Weitz, 1980).

With the use of Bayesian semiparametric regression, such limitations no longer apply, with unknown functional relationships being modeled nonparametrically and the large number of interaction and main effects (even if collinear) entered into the regression and the key determinants identified. As such, the Starch data form an interesting demonstration of the Bayesian methodology. To give our results face validity, we compare the procedure's predictive ability with some common alternatives.

The data investigated in this article are collected for all advertisements in each issue of a leading Australian monthly magazine for women. They consist of three attention-relevant scores. The three attention-relevant scores are "noted" scores (indicating the proportion of respondents who claim to recognize the ad as having been seen by them in that issue), "associated" scores (indicating the proportion of respondents who claim to have noticed the advertiser's brand or company name or logo), and "read-most" scores (indicating the proportion of respondents who claim to have read half or more of the copy). These have the structure where a noted score is at least as large as the corresponding associated score for an ad, which, in turn, is at least as large as the read-most score. Therefore, we normalized these so that y^(1) = (raw noted), y^(2) = (raw associated)/(raw noted), and y^(3) = (raw read-most)/(raw associated), and we use these measures as our dependent variables.

The data used in the study are for the 13 months between March 1992 and March 1993. During this period, 1,030 ads appeared in the magazine, but 25 were dropped because of missing observations on some of the independent variables. The data from the first 12 issues are used to fit the model (n = 943 observations), whereas the data from the thirteenth issue (62 observations) are used as a hold-out sample. The independent variables in the data describe the following four aspects of the ad: (1) the position of the ad in the magazine, including inserts, (2) the major features of the ad, (3) the product category, and (4) the message in the ad. How each of these was incorporated into the model is described below.

Variables Relating to the Position of the Ad

Previous research has found the page number of the ad in an issue to be an important factor influencing readership (Diamond, 1968). The Starch data include the page number of the ad and the total number of pages in all our issues. This was used to construct the continuous variable

P = (page number) / (number of pages in issue),

which takes values between 0 and 1 and is defined as the relative position of the ad in an issue. The front and inside-front cover are coded as the first two pages, whereas the inside-back and the back cover are the last two pages. Figure 1 plots the raw noted score against the position variable, indicating a distinctly nonlinear relationship.

The usual approach to modeling such a nonlinear response is to assume a specific functional form, for example, a linear or a quadratic function. Such an approach is called parametric, as the functional form is specified a priori. This article takes a nonparametric approach instead, in which the shape of a response function f is estimated from the data. We do so by using a cubic regression spline, which is a piecewise cubic polynomial between m so-called "knots" that partition the domain of the independent variable (in this case P in [0, 1]) into m + 1 subintervals. If the location of the m knots is given by p_1 < p_2 < ... < p_m, then the spline can be written as
Figure 1. The scatter plot is of the noted scores y^(1) against position in issue P. The bold line is the Bayesian regression fit to the data, the dotted line is from a local regression with a low smoothing parameter value, and the dashed line is from a local regression with a higher value for the smoothing parameter. No smoothing parameter value would enable a local regression to simultaneously capture the curvature of the relationship, while remaining smooth.
f(P) = b_0 + b_1 P + b_2 P^2 + b_3 P^3 + sum_{i=1}^{m} b_{3+i} (P - p_i)^3_+ , where (x)^3_+ = max(0, x)^3. [1]

It can be readily shown that such an approximation for the unknown response function f is not only continuous, but has continuous first and second derivatives. Such regression splines are used frequently in practice because they form a linear model, where the b_i's are regression parameters and the terms {P, P^2, P^3, (P - p_1)^3_+, ..., (P - p_m)^3_+} are terms in a regression.

An important issue with regression splines concerns m, the number of knots, and their locations. If too few knots are used, the estimate can exhibit "local bias" (that is, where important features of f are missed) because knots will not be located in all the appropriate positions to capture variations in the nonlinear relationship. However, if too many knots are used and the parameters are estimated using least squares, the estimate of f will have high "local variance" (that is, the estimate will not be smooth). Our solution is to introduce a lot of knots (m = 20), but instead of ordinary least squares we use a hierarchical Bayesian model (as discussed in Estimation Methodology below) that explicitly accounts for the possibility that many of them are redundant. This results in an estimation procedure that provides smooth, yet locally adaptive, estimates.
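As a concrete illustration, the truncated power basis of equation [1] is straightforward to construct. The sketch below (our own illustration, not code from the article's Splus package "br") builds the spline design matrix for the position variable P with m = 20 knots; equal knot spacing is an assumption here, as the article does not state the knot locations.

```python
import numpy as np

def spline_basis(P, m=20):
    """Design matrix for the cubic regression spline of equation [1]:
    columns P, P^2, P^3 and the truncated cubics (P - p_i)^3_+."""
    P = np.asarray(P, dtype=float)
    knots = np.linspace(0.0, 1.0, m + 2)[1:-1]   # m interior knots in (0, 1)
    poly = np.column_stack([P, P**2, P**3])      # global cubic part
    trunc = np.maximum(P[:, None] - knots[None, :], 0.0) ** 3  # (P - p_i)^3_+
    return np.column_stack([poly, trunc])        # n x (3 + m); intercept kept separate

# Example: relative positions of five ads in an issue
X = spline_basis(np.array([0.05, 0.25, 0.5, 0.75, 0.95]))
print(X.shape)  # (5, 23)
```

Each truncated cubic column switches on only to the right of its knot, which is what gives the fitted function its local flexibility.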
Alternative nonparametric regression methodologies exist, including the popular local regression and smoothing splines, for which general references are Hardle (1990) and Eubank (1988), respectively. However, the methodology presented here has several distinct advantages that lead us to prefer it in the empirical analysis of our Australian Starch data. First, it is locally adaptive in the sense that it allows the curvature of the regression function to change across its domain. Both local regression and spline smoothing work best when the curvature of the regression function does not vary greatly across the whole domain.

This property is important in identifying the effect of the position variable on readership scores. To demonstrate this, we estimated the univariate nonparametric regression model

y^(1) = f_uni(P) + e, where e ~ iid N(0, sigma^2).

Figure 1 plots the results of the estimation, with the bold line being the estimate f_hat_uni from the Bayesian approach discussed above.(1) This captures the profile of the effect of the position variable on noted scores, while remaining smooth. The dotted line denotes the estimate from a local regression with a smoothing parameter set so that the peak in the noted scores for the last tenth of the issue could be caught. However, an unavoidable by-product of this is that the rest of the curve is nonsmooth. The dashed line corresponds to a local regression where the smoothing parameter is set larger so that the estimate is smooth, but key features of the curve (such as the exposure of the middle fifth and last tenth of the magazine) are bleached out.(2) A further discussion of the comparison of local regression, Bayesian regression, and other approaches can be found in Smith and Kohn (1996).

The second advantage of our approach is that there are no alternative nonparametric regression approaches that can extend to the large integrative regression models of the type developed here.

In addition to ads within the magazine, each issue had a few ads that were inserts and thus had no page number. To deal with these, we created a dummy variable I for the presence (I = 1) or absence (I = 0) of inserts. In our integrative regression model, the combined effect of position in the magazine and inserts is

a I + (1 - I) f(P), [2]

which is obtained by substituting in a regression spline of the type outlined at equation [1] with m = 20 knots. When an ad is an insert, equation [2] is equal to an intercept a; when an ad is within a magazine, it is equal to f(P).

Variables Describing the Major Features of the Ad

Previous research investigating the impact of major features of the ad on magazine readership found significant effects for size of the ad (Trodahl and Jones, 1965; Diamond, 1968), presence of pictures (Edell and Staelin, 1983), and type of headline (Soley and Reid, 1983). There are 33 independent variables in the data that describe the major features of the ad, such as color, size of ad, presence or absence of bleed, headline type, etc. The appendix lists these, including variables X_1, ..., X_31 that are mostly binary and therefore enter the regression linearly as sum_{i=1}^{31} b_i X_i. The other two variables are the size of the ad (S) and its brand prominence (B), which we model nonparametrically. Each assumes a discrete, but small, number of levels (0-3 and 0-4, respectively), and it is inappropriate for such variables to be modeled using a cubic regression spline (Smith, Sheather and Kohn, 1996). Instead, we use a dummy variable basis for each, so that

g(S) = sum_{i=1}^{3} c_i I_i(S) and h(B) = sum_{i=1}^{4} d_i I_i(B), where I_i(x) = 1 if x = i and 0 otherwise.

Here, g and h measure deviation from the base exposure attributable to an ad with S = 0 (size less than a full page) and B = 0 (brand not present). The terms c_i and d_i are regression coefficients that we estimate using the Bayesian hierarchical model. This ensures g and h measure significant deviations from the base case of S = 0 and B = 0, as opposed to those that would be obtained using least squares. Therefore, the model for the main effects of the major features of the ad is

sum_{i=1}^{31} b_i X_i + g(S) + h(B). [3]

Variables Describing the Product Category

It is likely that consumers have different ad readership scores for different product categories. Hanssens and Weitz (1980) addressed this issue by classifying products into three groups (routine, unique, and important products) and modeled each group separately. In contrast, our model allows the simultaneous estimation of product effects by introducing specific interactions between the product type variables and ad content variables. This allows the investigation of product-specific effects.

(1) This was produced using the Splus compatible package "br" available from the Statlib repository.
(2) The local regression was undertaken using the default local regression estimator "loess" in Splus with the smoothing parameter set to 0.18 for the first fit and 0.75 for the second. No single smoothing parameter for such an estimator would enable an estimate that is both smooth and captures the curvature of the relationship.
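The dummy-variable bases for S and B and the combined insert/position term of equation [2] described above can be assembled as follows; this is our own sketch with hypothetical data, not code from the "br" package, and the spline part of f is truncated to its polynomial columns for brevity.

```python
import numpy as np

def dummy_basis(x, levels):
    """Columns I_i(x), i = 1..levels; level 0 is the omitted base case."""
    x = np.asarray(x)
    return np.column_stack([(x == i).astype(float) for i in range(1, levels + 1)])

# Hypothetical data for four ads: size S in 0-3, brand prominence B in 0-4,
# insert indicator I, and relative position P (undefined for inserts, set to 0).
S = np.array([0, 2, 3, 1])
B = np.array([1, 0, 4, 2])
I = np.array([0, 0, 1, 0])
P = np.array([0.10, 0.55, 0.0, 0.90])

GS = dummy_basis(S, 3)   # basis for g(S): n x 3
HB = dummy_basis(B, 4)   # basis for h(B): n x 4

# Combined insert/position effect of equation [2]: a*I + (1 - I)*f(P).
# The intercept column for 'a' is multiplied by I, the spline columns for f
# by (1 - I), before entering the design matrix.
f_cols = np.column_stack([P, P**2, P**3])
position_block = np.column_stack([I.astype(float), (1 - I)[:, None] * f_cols])
print(GS.shape, HB.shape, position_block.shape)  # (4, 3) (4, 4) (4, 4)
```

Because level 0 is omitted from each basis, the coefficients c_i and d_i are interpretable directly as deviations from the base case S = 0 and B = 0.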
There are 23 product types T_1, ..., T_23 recorded in our Starch data, which are described in the appendix. These variables are binary and enter the model as main effects but are thought to be mainly of significance as linear interactions with budgetary and design variables. To explicitly model these interactions, we grouped the product types into four categories based on the criteria in Rossiter and Percy (1987, pp. 166-174): the type of decision and the type of motivation. Rossiter and Percy (1987) categorize decisions into two broad types: low involvement decisions that have a low level of perceived risk (both economic and psychosocial) for the consumer, and high involvement decisions that have a high level of perceived risk. Similarly, there are two types of motivations for product purchases: informational (which reduce the negative motivation associated with the purchase) and transformational (which increase the positive motivation associated with a purchase). We call the four categories G_1, ..., G_4 and define them as follows: G_1, low involvement and informational; G_2, low involvement and transformational; G_3, high involvement and informational; G_4, high involvement and transformational. Each product type T_i is classified as belonging to one of the four categories, with the classification given in the appendix. For example, T_1 is women's apparel and accessories and is classified as G_4 (high involvement and transformational).

There are 136 linear interactions of G_1, ..., G_4 with the ad attributes X_1, ..., X_31, S, B, and I, which we model as simple multiplicative interactions in equation [4] below. We use the Rossiter and Percy (1987) categories to illustrate the methodology, but there are other ways to classify products, and the results may vary according to the classification.

The variable R for the presence or absence of recipes was treated as a special case, and only its interaction with G_2 (the group that is low involvement and transformational and includes food and drink products) is included. The following terms were included in the full model:

[Eq. (4): the 136 multiplicative interaction terms of G_1, ..., G_4 with X_1, ..., X_31, S, B, and I, together with the recipe interaction R x G_2.]

Variables Describing the Message in the Ad

Twenty-eight ad message binary variables X_i^[j] were created from four original Starch categorical variables for the purposes of inclusion in the model. These variables are described in the appendix and include the predominant feature (i = 1), the headline, and the color (i = 4) of the ad. They enter into the model as main effects, with the following terms included:

[Eq. (5): the main-effect terms of the 28 message variables X_i^[j].]

Issue Number

The data consist of ads published in the 13 consecutive monthly issues from March 1992 to March 1993. The first 12 issues (M = 1, ..., 12) were used as a calibration sample, and the March 1993 issue was used as a hold-out sample for model validation. It seems unreasonable to attempt to estimate any seasonal or trend components because the data used to fit the model span only a single year. Nevertheless, it is important to control for issue specific effects. Consequently, we modeled the issue effect nonparametrically with a dummy variable basis, so that

lambda(M) = sum_{i=2}^{12} r_i I_i(M). [6]

The function lambda measures any significant deviations in readership scores for the April 1992 to February 1993 issues from that of March 1992 and controls for any potential monthly effects on the training sample. Here, the r_i denote regression coefficients and I_i(.) is defined as before.

Transforming the Dependent Variables

There are three measures of ad readership in the Starch data: noted, associated, and read-most scores, each of which is a proportion. Simple histograms and density estimates demonstrate that the raw scores are far from normally distributed and may require transformation to satisfy both the additivity assumptions in a regression model and the assumptions of a Gaussian error distribution. Previous researchers used transformations such as arcsine (Finn, 1988) or logarithmic (Hanssens and Weitz, 1980). These transformations typically are decided in advance and imposed on the data. We considered nine different normalized transformations for the dependent variables of the type

T_l(y) = a_l + b_l t_l(y).

In the above, l = 1, 2, ..., 9 indexes our nine transformations, and t_l(y) is what we call the "base transformation." This can be any monotonically increasing transformation, which we normalize using two constants a_l and b_l to get the actual data transformation T_l(y). These constants can be calculated from the data for each transformation so that the dependent variable has the same median and interquartile range pre- and post-transformation; see Smith and Kohn (1996) for details. In effect, these constants normalize the base transformations considered to make them scale and location invariant, a property that eases the qualitative interpretation of the regression estimates.
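The normalization just described is simple to compute: because T_l is affine in t_l(y) and t_l is monotone increasing, matching the interquartile range forces b_l = IQR(y)/IQR(t_l(y)), and matching the median then gives a_l. A sketch of this calculation, using the probit base transformation t(y) = Phi^-1(y^0.5) (transformation l = 3 in Table 1) and simulated proportions standing in for raw scores:

```python
import numpy as np
from statistics import NormalDist

def normalize_constants(y, t):
    """Constants (a, b) so that T(y) = a + b*t(y) has the same median
    and interquartile range as y (the normalization used in the text)."""
    ty = t(y)
    q1, med, q3 = np.quantile(y, [0.25, 0.5, 0.75])
    tq1, tmed, tq3 = np.quantile(ty, [0.25, 0.5, 0.75])
    b = (q3 - q1) / (tq3 - tq1)   # match the interquartile range
    a = med - b * tmed            # match the median
    return a, b

phi_inv = np.vectorize(NormalDist().inv_cdf)
t3 = lambda y: phi_inv(y ** 0.5)  # base transformation l = 3 in Table 1

# Hypothetical skewed proportions standing in for raw noted scores
rng = np.random.default_rng(1)
y = rng.beta(2.0, 5.0, size=500)
a, b = normalize_constants(y, t3)
Ty = a + b * t3(y)
```

The same recipe applies to each of the nine base transformations; only t changes, and the constants are recomputed per dependent variable, which is why Table 1 reports separate (a_l, b_l) pairs for the noted, associated, and read-most scores.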
Table 1. Transformation Index and Base Transformations

                       Noted Score       Associated Score   Read-Most Score
 l   t_l(y)            a_l      b_l      a_l      b_l       a_l      b_l
 1   Phi^-1(y^0.1)    -0.516    0.678   -0.124    0.459    -0.537    0.692
 2   Phi^-1(y^0.25)   -0.063    0.564    0.165    0.396    -0.072    0.573
 3   Phi^-1(y^0.5)     0.240    0.478    0.368    0.346     0.237    0.483
 4   Phi^-1(y^0.75)    0.398    0.428    0.478    0.316     0.397    0.431
 5   Phi^-1(y)         0.500    0.394    0.552    0.295     0.500    0.395
 6   Phi^-1(y^1.5)     0.631    0.347    0.649    0.266     0.630    0.347
 7   Phi^-1(y^2)       0.713    0.315    0.712    0.246     0.712    0.315
 8   log(1 + y)        0.819    0.625    0.892    0.873     0.798    0.584
 9   sin^-1(y^0.5)    -0.278    0.991   -0.117    0.820    -0.281    0.994

The first column gives the transformation index, whereas the second gives the base transformations. The remaining columns give the constants required to make the normalized transformation scale and location invariant. These constants are provided for each of the three dependent variables considered in our dataset. The last two transformations were proposed by Finn (1988) and Hanssens and Weitz (1980).
The nine transformations include those of Finn (1988) and Hanssens and Weitz (1980). The other seven are of the form Phi^-1(y^Q), which are probit transformations with various values for the skewing parameter Q. It is important to note that the methodology allows a data-driven selection of the most appropriate normalized transformation from those listed in Table 1. A user could well select any finite number of transformations considered likely and use the procedure to select between these. As implemented, our methodology may lead to different transformations for the noted, associated, and read-most scores.

Estimation Methodology

Introduction

All the terms constructed in the previous section, listed at equations [2] to [6], can be collected together to form a large design matrix X with p = 262 terms for each of the three regressions:

T_l(y^(j)) = X b + e for j = 1, 2, 3. [7]

Here, we consider a regression for each of the three transformed readership scores, so that T_l(y^(1)), T_l(y^(2)), and T_l(y^(3)) are n-vectors of the observations of the noted, associated, and read-most scores, respectively, transformed according to the transformation T_l. The vector b is made up of the regression coefficients introduced earlier (the a, b_i's, c_i's, d_i's, r_i's, and various b's), whereas the vector e is made up of errors distributed iid N(0, sigma^2).

Estimating such a regression by using least squares presents a number of problems. First, there is often collinearity between the regressors. Second, even when collinearities are eliminated, variability in the regression coefficient estimates would be largely due to a lack of degrees of freedom. Third, the estimates for the nonparametric function components f, g, h, and lambda would be nonsmooth.

Commonly used approaches for tackling this problem are severely limited. All-subsets regression requires 2^p regressions to be undertaken, something that is computationally infeasible. The stepwise procedure, while feasible, traverses only a small number of subsets, which often leads to the situation that forward and backward stepwise procedures result in substantially different estimates (Kass and Raftery, 1995).

All these problems lead us to use a Bayesian hierarchical model, coupled with a Gibbs sampler, to estimate this large regression model. This approach can handle a large number of variables (p = 262 in this case), even when collinearity exists, without being hindered by the problem that there are only about four times as many observations as coefficients to be estimated. It can do so because it reduces the effective dimension of the problem by explicitly modeling the possibility that many of the terms in the design matrix may be redundant. Using the Gibbs sampler to undertake the computations results in the redundant variables being identified reliably. This is because the search over the distribution of important subsets is stochastic, rather than deterministic as in stepwise regression, and it traverses many more likely combinations of regressors than the stepwise regression approach. Furthermore, all other approaches require prespecification of the data transformation, whereas the Bayesian methodology can simultaneously select the appropriate data transformation and identify significant variables.

The Bayesian Hierarchical Model

This section briefly describes the Bayesian hierarchical model, with a full exposition being found in Smith and Kohn (1996). The key idea behind such a model is that we explicitly model the possibility that terms in the regression at [7] may be superfluous and can be omitted from the regression. To do this, we use the vector gamma = (gamma_1, gamma_2, ..., gamma_p)' of indicator variables, where each element is defined as

gamma_i = 0 if b_i = 0, and gamma_i = 1 otherwise.

That is, gamma_i is a binary variable that indicates whether or not b_i is zero and therefore whether or not the corresponding independent variable enters the regression. Therefore, the vector gamma explicitly parametrizes the subsets of regressors in the linear regression at [7]. The regression can be rewritten, conditional on this "subset" parameter, as

T_l(y^(j)) = X_gamma b_gamma + e for j = 1, 2, 3.

Here, b_gamma contains the regression coefficients that are non-zero, given gamma, and X_gamma is a design matrix made up of the corresponding columns of X.

Because this is a Bayesian methodology, it is necessary to place prior distributions on the unknown parameters gamma, l, b, and sigma^2. Ideally, such priors are "uninformative," unless there is some real information on a parameter or group of parameters (for example, from previous research). The priors we use here are listed below.

1. The prior distribution of b_gamma conditional on gamma, sigma^2, and l is N(0, n sigma^2 (X_gamma' X_gamma)^-1). The variance matrix in this prior is simply that of the least squares estimate of b_gamma blown up by a factor of n and therefore is relatively uninformative. A fully uninformative prior for b_gamma cannot be used as

in the model, the sample enables Monte Carlo estimation of the posterior probability Pr(l = i | data). From this distribution, we can pick the single most likely transformation and label it the "mode" transformation l_hat_M. Following Box and Cox (1982), we estimate the remaining features of the model conditional on this best transformation. The sampling scheme is run again, and another Monte Carlo sample for the subset parameter gamma is obtained. Using this, estimates of the expected value of b given the data (called the posterior mean), smoothing over the distribution of gamma, can be obtained; that is, we estimate E(b | l = l_hat_M, data). Estimates of the probability that each term in the regression is non-zero (and therefore significant), given the data, can also be calculated; that is, we estimate Pr(gamma_i = 1 | l = l_hat_M, data). Our implementation followed that outlined in Smith and Kohn (1996), but where our sampling scheme was run for 10,000 iterations for convergence and 20,000 iterations to obtain the Monte Carlo sample of gamma. This is a conservative sampling length and took approximately four hours on a standard DEC UNIX workstation. Similar results were obtained with sampling runs of one-tenth the length, which took under one hour to run. This produces estimates of the transformation and the regression parameters that are robust to the fact that it is un
known a priori which terms actually are in the regression. it results in what is commonly called Lindley’s paradox
This is in contrast to least squares regression which estimates (Mitchell and Beauchamp, 1988).
the regression coefficients given some particular known sub-2. The prior density fors2 giveng is proportional to 1/
set, or value of,g, as well as a prespecified data transformation.
s2. This is a commonly used prior for s2 and means that logs2has a flat prior or uninformative prior (Box
and Tiao, 1973).
Empirical Estimates and
3. We assume that givenl, thegiare a priori independent
Methodological Comparison
withPr(gi50)5 pi,i51, . . ., p.The probabilitiespiare specified by the user. In this article, we takepi5
Introduction
1/2 for all i, which can be considered uninformative. We applied our estimation methodology to the integrative However, our methodology allows for a user to impose a regression models for the Starch scores noted (y(1)), associated prior assessment of the relative importance of individual (y(2)), and read-most (y(3)). The first 943 observations, consisting terms in the form a nonuniform prior ong. For example, of the March 1992 to February 1993 issues, are used for estima-ifX7(photo used) is considered more likely to have an tion. The March issue, consisting of 62 ads, is used for model impact on ad recognition than other variables, a higher validation. The sections below discuss various aspects of the value ofpicould be imposed on that variable. model estimates. Model Validation below discusses the pre-4. We assume that each of the nine candidate transforma- dictive performance of the model on the hold-out sample com-tions is equally likely, that is,Pr(l 5i)51/9 for i5 pared with the application of standard regression approaches
1,2, . . ., 9. This is an uninformative prior forl; although to such a dataset, whereas Managerial Implications discuses if a researcher wants to encorporate prior beliefs, some some of the managerial implications of the empirical results. transformations can be given a higher prior probability
than others.
Data Transformation Estimates
The most likely transformations, given the data, were (once
Estimation
normalized) sin21(y0.5),F21(y1.5), and sin21(y0.5) for the noted, Smith and Kohn (1996) develop a Gibbs sampling scheme to associated, and read-most scores, respectively. The posterior estimate this Bayesian hierarchical model and show how it probabilities of all nine candidate transformations for each of can be implemented. The sampling scheme produces a Monte the three scores are given in Table 2.Table 2. Estimated Posterior Probabilities for Each of the Nine Candidate Transformations
Table 2. Estimated Posterior Probabilities for Each of the Nine Candidate Transformations

                      Uninformative prior            Informative prior
Transformation     Noted   Assoc.  Read-most    Noted   Assoc.  Read-most
log(y + 1)         0.000   0.000   0.000        0.000   0.000   0.000
sin⁻¹(y^0.5)       0.305   0.001   0.856        0.190   0.000   0.751

The first column gives the respective base transformation (we consider the normalized version of each, as discussed in the text under Transforming the Dependent Variables). The first three probability columns use the default uninformative prior given at item (4) under The Bayesian Hierarchical Model, whereas the last three columns are for the informative prior that places more prior weight on the seven probit transformations. The most likely transformation for each score is the same under both priors.
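The subset-selection machinery behind these posterior probabilities can be sketched compactly. The code below is our own illustrative reconstruction, not the authors' implementation (their Splus package "br" is noted in the abstract): it integrates β and σ² out analytically under the prior N(0, nσ²(Xγ′Xγ)⁻¹) with the flat prior on log σ², then Gibbs-samples each indicator γi and reports Pr(γi = 1 | data) as a sampling frequency. All function names and the toy data are ours, and the transformation index λ is held fixed for simplicity.

```python
import numpy as np

def log_marglik(y, X, gamma, g=None):
    """Log marginal likelihood of y given the subset indicator gamma,
    with beta ~ N(0, g*sigma^2*(X'X)^-1), g = n, and p(sigma^2) ~ 1/sigma^2
    integrated out analytically (a standard g-prior result)."""
    n = len(y)
    if g is None:
        g = n
    Xg = X[:, gamma.astype(bool)]
    q = Xg.shape[1]
    yy = y @ y
    if q > 0:
        coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        fit = (y @ Xg) @ coef            # y' X (X'X)^-1 X' y
        rss = yy - g / (1.0 + g) * fit   # shrunk residual sum of squares
    else:
        rss = yy
    return -0.5 * q * np.log(1.0 + g) - 0.5 * n * np.log(rss)

def gibbs_select(y, X, n_burn=1000, n_iter=2000, seed=0):
    """Sample each gamma_i from its conditional posterior, with the
    uninformative prior Pr(gamma_i = 0) = 1/2, and return the Monte
    Carlo estimate of Pr(gamma_i = 1 | data)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p)
    counts = np.zeros(p)
    for it in range(n_burn + n_iter):
        for i in range(p):
            lp = np.empty(2)
            for v in (0, 1):             # marginal likelihood with term out/in
                gamma[i] = v
                lp[v] = log_marglik(y, X, gamma)
            d = min(lp[0] - lp[1], 700.0)     # guard against overflow
            prob1 = 1.0 / (1.0 + np.exp(d))
            gamma[i] = rng.random() < prob1
        if it >= n_burn:
            counts += gamma
    return counts / n_iter
```

In the article's application there are 260 candidate regressors and the scheme is run for tens of thousands of iterations; a toy run with a handful of regressors already shows the posterior inclusion probabilities concentrating on the true terms.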
It can be seen that the asymmetric arcsine transformation discussed by Finn (1988) is the most likely (in that it has the highest posterior probability) of those considered here for the noted and read-most score regressions. However, in the associated score regression, a skewed probit transformation is the most likely, given the data. The logarithmic transformation proposed by Hanssens and Weitz (1980) is the most unlikely, with a posterior probability equal to zero (to three decimal places) for all three scores in our dataset.

The posterior probabilities of the transformations seem to be fairly insensitive to the choice of prior distribution. For example, we used an informative prior that ascribed double the probability to each of the seven skewed probit transformations that it prescribed for the arcsine and logarithmic transformations; that is, the prior Pr(λ = i) = 1/8 for i = 1, . . ., 7 and Pr(λ = 8) = Pr(λ = 9) = 1/16. The data transformations selected using this informative prior were the same as those selected using the uninformative prior, although the posterior probabilities change slightly (see Table 2). These results are reassuring because they demonstrate that the transformation selection is based on information from the data and is not "prior driven."

Figure 2. (a–c) Normal probability plots based on the residuals obtained from each of the three regressions. If the quantiles of a normal and the observed residuals are linearly related with a slope of 45 degrees, then this indicates that the residuals follow a normal distribution.

Impact of Variables Related to the Position of the Ad

Figures 3 (a.1), (b.1), and (c.1) outline the impact of P, the position of the ad in the issue, on ad recognition scores. It is evident from these figures that the position of the ad is a determinant of noted and associated scores, but is not related to the read-most score.

Figure 3 (a.1) is a plot of the regression spline estimate of the response f for the noted score y(1). This estimate suggests that advertising in the front end of the magazine (the first 15%) is immensely beneficial from the point of view of attracting casual attention to the ad, leading to high noted scores. However, there is a dramatic decrease in exposure as ads are placed further into the magazine, up to approximately 20% in. Earlier linear regression studies (Diamond, 1968; Hanssens and Weitz, 1980) showed a significant negative coefficient for the page number variable. However, the nonparametric regression is able to capture a more sophisticated relationship between noted scores and position than the fixed linear relationships imposed in prior research. After the rapid drop that occurs in the front of the magazine, there is a resurgence in ad exposure for the middle 40% of the magazine. After this, positioning an ad 80–90% into the magazine appears to be the worst choice, whereas the last 5% seems to be an improvement. Notice that the estimate of the effect is slightly different from that uncovered in the univariate nonparametric regression found in Figure 1, because the current regression controls for other effects that are correlated with the position variable, as well as having a transformed dependent variable.

Figure 3 (b.1) provides the estimate of f for the associated scores y(2). The profile of the estimate differs substantially from that of the noted score and indicates that for high associated scores the worst place to advertise is 70–90% into the magazine, whereas the last 5% is advantageous.

The insert dummy variable I was a significant contributor to all three scores, with Table 3 providing the insert main effects and interactions with the product categories that had a greater than 35% chance of being non-zero (i.e., Pr(γi = 1|data, λ = λ̂M) > 0.35). Also included in the table is the estimated intercept of f(P) from equation (2), namely, the coefficient b̂0 of (1 − I). Notice that inserts have a similar effect to advertising on the front cover of the magazine, although for the associated and read-most scores they achieve slightly higher exposure.

Impact of the Major Features of the Ad

The size of the ad has a strong effect on the noted scores, but the marginal gain decreases after a full-page ad (S = 1). This is illustrated in Figure 3 (a.2), which plots the estimate of the main effect, g, of ad size as the solid curve. Interestingly, for "high involvement and transformational" products (for example, fashion apparel and accessories, household furnishings, and jewelry), the size of the ad has a much more pronounced effect. When combined with the main effect, it produces the relationship represented by the long dashed line. This is caused by a value of 0.045 for the slope of the interaction G4S, which is identified as having a posterior probability of being non-zero of 0.946.

For the associated scores, ad size has a marginal main effect, although some effect may occur for products that are high involvement and transformational, because there is a posterior probability of 0.411 that SG4 is non-zero and positive, something that is reflected in the interaction plots in Figure 3.

For the read-most scores, there is a very flat ad size main effect. However, in a similar manner to the associated scores, the figure indicates that transformational products react positively to increases in ad size. This is caused by positive slopes for SG2 and SG4, which are non-zero with a probability of 0.649 and 0.765, respectively.

The relationships between the three scores and brand prominence are shown in Figure 3 (a.3, b.3, and c.3). For both the noted and associated regressions, there are strong nonlinear effects, whereas there are no significant interaction effects with any of the four product categories. In particular, for associated scores it seems that brand names have little effect unless they are highly prominent. For read-most scores, brand prominence appears to have a negative effect for informational products, with the slope of BG3 being −0.039, with a posterior probability of being non-zero of 0.863.

Figure 3. Estimates of the functions f, g, h, and l for each of the three regressions. The rows correspond to the three regressions arising from the dependent variables y(1), y(2), and y(3), respectively, whereas the columns correspond to the four functions. The figures in each row are plotted with the same range on the vertical axis, so that an idea of the comparative strength of the effects within each regression can be obtained. This range is set equal to the distance between the 85th percentile and 15th percentile of the transformed dependent variable of each regression (that is, T9(y(1)), T6(y(2)), and T9(y(3))). The main effects are in bold, whereas interactions are also plotted for the effects of B and S.

Table 4 reports all other terms in the three regressions (excluding those involving P, I, B, S, and M) that have a posterior probability greater than 0.35 of being non-zero.
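The function estimates in Figure 3 are regression splines, with each basis coefficient subject to the same γ-indicator selection as any other term. As a hedged sketch of the idea (our own construction; the paper does not specify its knot placement here), a truncated-power cubic basis for a regressor on [0, 1] can be built and fit by least squares:

```python
import numpy as np

def spline_basis(p, knots):
    """Truncated-power cubic spline basis on [0, 1]: columns p, p^2, p^3
    and (p - k)_+^3 for each interior knot k. In the Bayesian fit, each
    column would receive its own gamma_i indicator, so the data decide
    how wiggly the estimated function is."""
    cols = [p, p**2, p**3]
    cols += [np.clip(p - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)

# Toy illustration: recover a hypothetical nonlinear "position" effect.
rng = np.random.default_rng(0)
pos = rng.random(400)                       # ad position on [0, 1]
f_true = np.cos(2 * np.pi * pos)            # made-up nonlinear effect
y = f_true + 0.3 * rng.standard_normal(400)
X = spline_basis(pos, knots=np.linspace(0.1, 0.9, 9))
X1 = np.column_stack([np.ones(len(pos)), X])   # add an intercept
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
fhat = X1 @ beta                            # fitted curve at the data points
```

Under variable selection, superfluous knot columns are simply switched off rather than shrunk by a smoothing penalty, which is what yields the smooth but flexible estimates of f, g, h, and l.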
Some variables appear to be related to readership scores throughout the magazine, irrespective of product type. For example, variables such as right-hand position of the ad and headline color are likely to be related to noted scores for all products.

Table 3. Estimated Insert Effects

                 Term      Probability   Coefficient
For noted
                 I         0.992         0.402
                 (1 − I)   0.992         0.442
For associated
                 I         1.000         0.743
                 (1 − I)   1.000         0.739
For read-most
                 I         1.000         0.555
                 IG1       0.522         0.094
                 (1 − I)   1.000         0.548

The posterior mean estimates of the regression coefficients, E(βi|λ = λ̂M, data), are in the "Coefficient" column. The "Probability" column contains the posterior probabilities of these coefficients being non-zero, Pr(γi = 1|λ = λ̂M, data). The results are for the terms involving the insert dummy variable, including the noninsert intercept (1 − I). Only those coefficients that have a posterior probability greater than 0.35 of being non-zero are reported.

Product-Specific Effects

It is apparent from Table 4 that product-specific effects are very important for the three models. First, there are product types that are significant main effects. For example, T12 (motor vehicles) has a negative coefficient in all three regressions and a posterior probability (to three decimal places) of being non-zero of 1.000, 0.530, and 1.000 for the noted, associated, and read-most scores, respectively. These, and the other product main effects found in Table 4, appear consistent with the positioning of the magazine as primarily for women with a focus on domestic and entertainment issues. Second, the product-specific effects manifest themselves as interactions between ad characteristics and the product categories G1, G2, G3, and G4.
Table 4. Coefficient Estimates and Estimated Posterior Probabilities of Being Non-zero

Description               Term     Prob.    Coeft.
For noted score regression
RH position               X3       0.777     0.033
Cntnt is fin              X10      0.724     0.020
Hdln color                X20      0.838    −0.015
Copy area                 X23      0.428    −0.011
Furnishings               T6       0.431    −0.021
Building                  T8       0.998    −0.211
Travel                    T9       0.909    −0.096
Motor vehicles            T12      1.000    −0.157
Cosmetics                 T15      0.813     0.045
Jewelry                   T17      0.656     0.071
For associated score regression
Hdln color                X1       0.992    −0.018
Advertorial               X30      0.472    −0.023
Coupon                    X31      0.506    −0.012
(Cntnt is fin)G1          X10G1    0.584     0.021
Pred apl pleasure         X[3]2    0.522    −0.010
Pred col yel/org/brwn     X[4]6    0.582     0.009
Pred apl income           X[3]6    0.570     0.014
Building                  T8       0.459    −0.041
Motor vehicles            T12      0.530    −0.026
Cosmetics                 T15      0.565     0.019
Pets                      T22      0.777     0.047
For read-most score regression
Hdln color                X20      0.650     0.011
Hdln type                 X21      0.467    −0.006
(Num words)G1             X24G1    0.972    −0.072
(Square ill)G2            X5G2     0.650     0.028
Motor vehicles            T12      1.000    −0.198
Jewelry                   T17      0.633     0.071
Government                T19      0.923     0.184

This table provides the posterior means for the slope coefficients, E(βi|λ = λ̂M, data), in the "Coeft." column. Their respective posterior probabilities of being non-zero, Pr(γi = 1|λ = λ̂M, data), are provided in the "Prob." column. These are provided for all three regressions, but only for coefficients that had a posterior probability greater than 0.35 of being non-zero. The Description column provides a short description of each term; for a full description, see the Appendix. Terms that involve the variables P, I, S, B, and M are not included.
Table 5. Out-of-Sample Performance

Regression     Bayesian   OLS     Stepwise/AIC   30-Factor   50-Factor   100-Factor
Correlation between predicted and actual values
Noted 0.764 0.706 0.743 0.727 0.695 0.620
Associated 0.511 0.507 0.488 0.464 0.331 0.347
Read-most 0.504 0.368 0.357 0.273 0.220 0.245
Mean absolute error of predicted values
Noted 0.075 0.086 0.092 0.109 0.102 0.100
Associated 0.070 0.074 0.151 0.111 0.114 0.086
Read-most 0.095 0.115 0.120 0.146 0.145 0.121
Mean squared error of predicted values
Noted 0.009 0.012 0.011 0.019 0.017 0.015
Associated 0.008 0.009 0.043 0.027 0.023 0.011
Read-most 0.015 0.021 0.023 0.036 0.032 0.022
This table contains diagnostics for the predicted readership scores in the hold-out sample (March 1993 issue) by using a variety of approaches. The six methods used are arrayed across the top of the table and are described in the text, while the results for all three readership scores are presented. The diagnostics include (1) the sample correlations between the actual readership scores and predicted scores, (2) mean absolute prediction error, and (3) mean squared prediction error. Here, OLS stands for ordinary least squares and AIC for Akaike’s information criterion.
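The diagnostics reported in Table 5 are simple to reproduce for any predictor. A minimal helper (our own naming; the scores below are made-up stand-ins for the transformed hold-out readership scores):

```python
import numpy as np

def validation_metrics(actual, predicted):
    """Hold-out diagnostics of the kind reported in Table 5: the sample
    correlation between actual and predicted (transformed) scores, the
    mean absolute error, and the mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "correlation": float(np.corrcoef(actual, predicted)[0, 1]),
        "mae": float(np.mean(np.abs(err))),
        "mse": float(np.mean(err**2)),
    }

# Hypothetical transformed scores for a few hold-out ads.
scores_actual = [0.42, 0.31, 0.58, 0.25, 0.49]
scores_pred = [0.40, 0.35, 0.55, 0.28, 0.44]
m = validation_metrics(scores_actual, scores_pred)
```

Each competing procedure's forecasts would be passed through the same helper, so the comparison across columns of Table 5 is on identical footing.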
This has already been seen, to some extent, with their interaction effects with B (brand prominence) and S (size of ad). However, there are many additional interaction terms found in Table 4. For example, for G1 (low involvement and informational) products, having more than 50 words in the copy (X24) appears to produce a decline in read-most scores, with a posterior probability of 0.972 of having a non-zero slope coefficient. Table 4 provides a large number of other such interaction effects between product categories and both ad attribute and ad message variables.

Model Validation

To demonstrate the effectiveness of our estimation methodology, we compare its predictive performance with some alternative estimation procedures. The prediction is for the 62 ads in the hold-out sample of the subsequent March 1993 issue of the magazine and is undertaken for all three readership scores. Throughout, we use the transformed dependent variables (as selected by our procedure) to enable a fair comparison of the approaches.

The first alternative is an ordinary least squares regression based on the 260 linearly independent regressors in our sample, which is included simply as a benchmark. The second is a forward selection stepwise regression procedure coupled with Akaike's information criterion, or AIC (Akaike, 1978). Here, the model uncovered during the forward selection procedure with the maximum AIC value was selected as our regression model, and least squares was used to estimate the regression coefficients. The last alternative uses factor analysis on the 260 linearly independent regressors in our sample to reduce the dimensionality of the problem and then uses least squares to estimate the coefficients for these factors. We used factor analysis (Mardia, Kent, and Bibby, 1979) with 30, 50, and 100 factors, which explained 54.62, 66.71, and 81.75%, respectively, of the variation in the design.

Using these procedures, we forecast the (transformed) readership scores and calculate (1) the sample correlation between these and the (transformed) actual readership scores, (2) the mean absolute error, and (3) the mean squared error. These are found in Table 5 and, by all measures, the Bayesian approach results in the most accurate forecasts. While these results demonstrate the advantage of using the Bayesian approach, this is, of course, specific to this single dataset. For comprehensive simulations on the reliability of such a semiparametric regression approach, see Smith and Kohn (1996, 1997).

Managerial Implications

The methodology provides a way in which many of the design and budgetary issues facing prospective advertisers in this Australian women's magazine can be addressed. For example, the estimated effect of the position of the ad in an issue (P) on its noted score, found in Figure 3 (a.1), suggests that an advertiser who is interested in attaining high levels of casual exposure may be prepared to pay a premium to place the ad in the front 10% of the magazine. However, the nonlinearity of the relationship also indicates that locating the ad 40% into the magazine is an advantageous position. If more in-depth exposure is required, the estimated effect of the position variable on read-most scores, found in Figure 3 (c.1), indicates that it is not worth any extra cost to position the ad at any particular location in the magazine.

Identifying such nonlinearities is important in identifying changing marginal relationships between ad characteristics and readership scores. For example, consider an advertiser who is faced with a decision on how prominent to make the brand name.
On average, making the brand "impossible to miss" (B = 4) results in higher noted and associated scores than "not present" (B = 0). However, the nonlinear relationship suggests that placing the brand somewhat in the middle of these extremes, as "easy to miss" (B = 2), is no different from omitting it completely. It does not result in half the level of noted and associated exposure, as would be suggested by a model that imposed a linear relationship.

The ability of the methodology to incorporate a large number of product type interaction variables into a regression framework with a relatively low sample size also proves important. It enables a more detailed modeling of the complex interaction effects and provides a measurement of the relationships between product type, ad characteristics, and exposure. For example, whereas Figure 3 (a.2) confirms that larger ads obtain higher casual attention, it is particularly important for products that are categorized as high involvement and transformational. Moreover, Figure 3 (c.2) identifies that higher in-depth exposure (as measured by the read-most score) can be obtained by larger ads when the product is transformational, but not when it is informational.

Advertising managers also can obtain predictive exposure ratings for proposed ads, given decisions on design and message factors. This is viable because the methodology allows a reliable estimation of the collective impact of ad characteristics. Such reliability is reflected in the relative improvement in predictive ability found for the ads in the March 1993 issue. Ad design then can be tailored to maximize this predicted exposure, given product type and budgetary constraints.

Summary and Conclusion

This article provides a new modeling approach to regression that allows the data to determine the functional form of the independent variables and to identify the significant independent variables, as well as an appropriate transformation of the dependent variable. It can handle regressors that lead to a regression matrix that is collinear (such as in the Starch print advertising dataset examined here) because it explicitly accounts for the possibility that many of the regressors are redundant. It reduces the subjective requirements of traditional techniques in prespecifying functional forms and transformations of the dependent variable. Whereas an uninformative prior is used in this analysis, if a researcher has strong prior information on which regressors are important, or on which data transformation is most likely, then it can be easily incorporated into the analysis.

These features of the methodology prove useful in our analysis of a dataset from an Australian monthly magazine for women. They enable the development of an integrative model that can capture the complex interrelations between ad characteristics and exposure measures that exist in such data. All features of the model are estimated simultaneously, the reliability of which is reflected in its improved predictive performance. The estimates provide insight into the form of the relationship between variables such as position of the ad in an issue, size of the ad, and brand prominence, and the readership scores. They also identify product-specific effects through the use of interaction terms based on a categorization suggested by Rossiter and Percy (1987). As such, they have implications for the design of ads by potential advertisers in this magazine. More generally, with the growth of complex databases in marketing, we feel that our technique provides the researcher with a powerful modeling tool that is practical, easy to implement, and can be applied to complex datasets containing a large number of variables, such as Starch print advertising data.

The authors would like to thank John Rossiter for helping them with the classification in the section titled "Variables Describing the Product Category" and for permission to use the Starch data. Michael Smith was supported by an Australian Postgraduate Research Award and a Monash Faculty grant. The work of Sharat Mathur and Robert Kohn was partially supported by an Australian Research Council Grant. We are grateful to Professors Gary Lilien, David Midgley, John Roberts, and John Rossiter for comments on a previous version of the article, as well as three anonymous referees whose comments helped improve the article.

References

Akaike, H.: A Bayesian Analysis of the Minimum AIC Procedure. Annals of the Institute of Statistical Mathematics 30 (1978): 9–14.

Box, George, and Cox, David: An Analysis of Transformations Revisited, Rebutted. Journal of the American Statistical Association 77 (1982): 209–252.

Box, George, and Tiao, George: Bayesian Inference in Statistical Analysis, John Wiley and Sons, New York. 1973.

Casella, George, and George, Edward: Explaining the Gibbs Sampler. The American Statistician 46 (1992): 167–174.

Curry, David J.: The New Marketing Research Systems: How to Use Strategic Database Information for Making Better Marketing Decisions, John Wiley and Sons, New York. 1993.

Diamond, Daniel S.: A Quantitative Approach to Magazine Advertisement Format Selection. Journal of Marketing Research 5 (1968): 376–386.

Edell, Julia A., and Staelin, Richard: The Information Processing of Pictures in Print Advertising. Journal of Consumer Research 10 (June 1983): 45–61.

Eubank, Randolf L.: Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York. 1988.

Finn, Adam: Print Ad Recognition Readership Scores: An Information Processing Perspective. Journal of Marketing Research XXV (May 1988): 168–177.

Frankel, Lester R., and Solov, Bernard M.: Does Recall of an Advertisement Depend upon Its Position in the Magazine? Journal of Advertising Research 2 (December 1963): 28–32.

Gardner, Burleigh B., and Cohen, Yehudi A.: ROP Color and Its Effect on Newspaper Advertising. Journal of Marketing Research 1 (May 1964): 68–70.

Gelfand, Alan E., and Smith, Adrian F. M.: Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association 85 (1990): 398–409.

George, Edward I., and McCulloch, Robert E.: Variable Selection via Gibbs Sampling. Journal of the American Statistical Association 88 (1993): 881–889.

Hanssens, Dominique M., and Weitz, Berton A.: The Effectiveness of Industrial Print Advertisements across Product Categories. Journal of Marketing Research 17 (1980): 294–306.

Hardle, W.: Applied Nonparametric Regression, Econometric Society Monographs, Cambridge University Press, Cambridge, UK. 1990.

Kass, Robert E., and Raftery, Adrian E.: Bayes Factors. Journal of the American Statistical Association 90 (1995): 773–795.

Mardia, Kantilal, Kent, J., and Bibby, John: Multivariate Analysis, Academic Press, London. 1979.

Mitchell, T. J., and Beauchamp, J. J.: Bayesian Variable Selection in Linear Regression. Journal of the American Statistical Association 83 (1988): 1023–1036.

Rossiter, John R., and Percy, Larry: Advertising and Promotion Management, McGraw-Hill, New York. 1987.

Smith, Michael, and Kohn, Robert: Nonparametric Regression Using Bayesian Variable Selection. Journal of Econometrics 75 (1996): 317–344.

Smith, Michael, and Kohn, Robert: A Bayesian Approach to Nonparametric Bivariate Regression. Journal of the American Statistical Association 92 (1997): 1522–1535.

Smith, Michael, Sheather, Simon, and Kohn, Robert: Finite Sample Performance of Robust Bayesian Regression. Computational Statistics 11 (1996): 317–343.

Soley, Lawrence C., and Reid, Leonard N.: Industrial Ad Readership as a Function of Headline Type. Journal of Advertising 12 (1983): 34–38.

Starch, Daniel: Measuring Advertising Readership and Results, McGraw-Hill, New York. 1966.

Trodahl, Verling C., and Jones, Robert L.: Predictors of Newspaper Advertising Readership. Journal of Advertising Research 5 (1965): 23–27.

Twedt, Dik W.: A Multiple Factor Analysis of Advertising Readership. Journal of Applied Psychology 36 (1952): 207–215.

Appendix: Description of the Data

There were 1,005 observations. The first 943, from the March 1992 to February 1993 issues, formed the calibration sample. The remaining advertisements, numbers 944–1,005 from the March 1993 issue, were used for model validation. The following is a list of the independent variables.

Advertisement Attribute Variables.

X1 color (0–2), 0 = monotone, 1 = 2-color, 2 = 4-color
X2 left-hand position (0,1)
X3 right-hand position (0,1)
X4 size of issue (0–2), 0 = 296 pages or less, 1 = 300 to 320 pages, 2 = more than 320 pages
X5 square finish in main illustration (0,1)
X6 silhouette or other shape in main illustration (0,1)
X7 photo used (0,1)
X8 size of photo (0,1), 0 = less than 1/2 of space
X9 ad content relates to "end result of using or eating the product" (0,1)
X10 ad content relates to "actual product or package" (0,1)
X11 ad content relates to "actual product or package" (0,1)
X12 number of illustrations (0,1), 0 = multiple illustrations, 1 = single main illustration
X13 bleed (unbordered) (0,1)
X14 headline length (0,1), 0 = less than 8 words
X15 large headline (0,1), 1 = more than 1.3 cm high
X16 small headline (0,1), 1 = less than 1.3 cm high
X17 headline above main illustration (0,1)
X18 headline beside main illustration (0,1)
X19 headline below main illustration (0,1)
X20 headline color (0–2), 0 = no color, 1 = partial color, 2 = all color
X21 headline type (0–2), 0 = all reverse, 1 = partial reverse, 2 = all straight
X22 headline case (0–2), 0 = all lower, 1 = partial upper, 2 = all upper
X23 copy area (0,1), 1 = more than 1/3 of area
X24 number of words in copy (0,1), 0 = less than 50 words
X25 opposite page is mainly other advert (0,1)
X26 opposite page is mainly related educational (0,1)
X27 opposite page is mainly related editorial (0,1)
X28 opposite page is mainly unrelated editorial (0,1)
X29 opposite page is mainly part of the same multipage ad (0,1)
X30 advertorial (0,1)
X31 coupon (0,1)
P position in issue, continuous on [0,1], 0 = front cover, 1 = back cover
I insert (0,1)
S size of ad (0–3), 0 = less than full page, 1 = full page, 2 = double page, 3 = multiple pages
B brand prominence (0–4), 0 = not present, 4 = impossible to miss
R recipe (0,1)
M issue number (1–13), 1 = March 1992, 13 = March 1993

Product-Type Independent Variables.

All the variables take the values 0 and 1 only, where 1 = true and 0 = false. The category to which each product belongs is provided in brackets. The product categories are G1 (low involvement and informational), G2 (low involvement and transformational), G3 (high involvement and informational), and G4 (high involvement and transformational).

T1 (G4) women's apparel/accessories
T2 (G2) food
T3 (G3) pharmaceutical
T4 (G1) toiletries and health
T5 (G3) household appliances
T6 (G4) household furnishings
T7 (G1) household goods
T8 (G4) building and decorating
T9 (G4) travel and holidays
T10 (G4) men's apparel/accessories
T11 (G4) children's wear
T12 (G4) motor vehicles
T13 (G2) books and magazines
T14 (G2) drink
T15 (G2) cosmetics/beauty
T16 (G1) hair
T17 (G4) jewelry
T18 (G2) records
T19 (G3) government
T20 (G4) craft
T21 (G1) baby
T23 (G3) finance

Ad Message Dummy Variables

1. PREDOMINANT FEATURE. Each ad can have only one predominant feature but may have any of the following characteristics: X[1]1 person(s) 18+; X[1]2 baby(ies); X[1]3 child(ren); X[1]4 animal(s); X[1]5 women's fashions; X[1]6 furnishing(s); X[1]7 a premium offer; X[1]8 the product(s).

2. HEADLINE APPEAL. Again, each ad can have only one headline but may have any of the following appeals: X[2]1 a promise; X[2]2 new feature of the product; X[2]3 a question; X[2]4 an exclamation; X[2]5 news item; X[2]6 prize(s) in a contest; X[2]7 price; X[2]8 new product.

3. PREDOMINANT APPEAL. This can have any of the following characteristics: X[3]1 social status; X[3]2 increased pleasure; X[3]3 personal security; X[3]4 health or hygiene; X[3]5 knowledge or quality; X[3]6 increased income/capital gain/savings.

4. PREDOMINANT COLOR OF ADVERTISEMENT. The registered colors are: X[4]1 red; X[4]2 blue; X[4]3 green; X[4]4 grey, black, or monotone; X[4]5 pastel shades, no predominant color; X[4]6 yellow, orange, or brown.