and Application to Print Advertising Data
Michael Smith
Australian Graduate School of Management, University of New South Wales
Robert Kohn
Australian Graduate School of Management, University of New South Wales
Sharat K. Mathur
Booz, Allen and Hamilton, Chicago, IL, USA
A new regression-based approach is proposed for modeling marketing databases. The approach is Bayesian and provides a number of significant improvements over current methods. Independent variables can enter into the model in either a parametric or nonparametric manner, significant variables can be identified from a large number of potential regressors, and an appropriate transformation of the dependent variable can be automatically selected from a discrete set of prespecified candidate transformations. All these features are estimated simultaneously and automatically using a Bayesian hierarchical model coupled with a Gibbs sampling scheme. Being Bayesian, it is straightforward to introduce subjective information about the relative importance of each variable, or with regard to a suitable data transformation. The methodology is applied to print advertising Starch data collected from 13 issues of an Australian monthly magazine for women. The empirical results highlight the complex and detailed relationships that can be uncovered using the methodology. An Splus compatible package called "br" that implements the semiparametric approach in this article is currently available at the Statlib repository on the world wide web at http://www.stat.cmu.edu/S/. J BUSN RES 2000. 49.229-244. © 2000 Elsevier Science Inc. All rights reserved.

A large proportion of research in marketing attempts to understand the factors affecting consumer behavior. The number of factors is frequently large and often complex and interrelated. With the growing sophistication of computing tools, there is a shift from separated to integrative databases (Curry, 1993) that provide the capability of modeling these complex interrelationships. Researchers modeling such data as a regression are faced with decisions on key issues, such as which variables and interactions to include in the analysis, the functional form of the effect of each independent variable, and the appropriate transformation for the dependent variable.

This article introduces a highly flexible Bayesian regression approach to the marketing literature that addresses such issues and has the following advantages over existing approaches to regression.

1. Independent variables can enter the model either parametrically or nonparametrically. By parametric, we mean that the functional form of the independent variable in the regression is known (for example, an independent variable can enter linearly or as a quadratic), whereas by nonparametric we mean that the functional form is not prescribed, but is estimated from the data. This is particularly important in modeling marketing data because the form of the relationship between the dependent and any particular independent variable is often unknown and is usually chosen subjectively.

2. The procedure identifies regressors that have a high probability of being significant determinants of the dependent variable. It does so by estimating the posterior probability that each of the regression coefficients is non-zero, conditioned only on the information in the dataset. Traditional approaches to identifying significant regressors include all-subsets regression and stepwise regression. However, all-subsets regression is computationally impractical in datasets with a large number of variables. Stepwise regression is also often unsatisfactory because at each stage only the probability that each regressor is non-zero, conditional on full knowledge of the significance (or otherwise) of the other regressors, can be identified. This results in a procedure that considers only a restricted number of subsets and often identifies completely incorrect subsets (Kass and Raftery, 1995).

Address correspondence to Dr. Michael Smith, Econometrics, University of Sydney, New South Wales 2006, Australia.
Journal of Business Research 49, 229–244 (2000)
2000 Elsevier Science Inc. All rights reserved. ISSN 0148-2963/00/$–see front matter
Our approach uses a Bayesian hierarchical regression model that is estimated using a Gibbs sampling scheme, an estimation methodology that involves a stochastic search that traverses most of the likely subsets of variables. The result is a variable selection procedure that does not suffer from the unreliability of stepwise regression, nor the computational problems of all-subsets regression. This is particularly useful with the large and complex datasets that often arise in marketing.

3. The Bayesian hierarchical model we use enables consideration of datasets where there are large numbers of independent variables, including cases where these are collinear. This can even include cases where there are more regressors than observations, but many of which are redundant.

4. An appropriate transformation of the dependent variable is selected from a discrete number of candidate transformations. Traditional regression approaches usually assume that an appropriate transformation of the dependent variable is known before model estimation takes place.

5. The functional forms of those variables that are modeled nonparametrically, the effect of the other independent variables, their probability of significance, and the choice of the transformation of the dependent variable are estimated simultaneously. It is crucial that these aspects of the regression model are not estimated conditionally upon each other. For example, if an inappropriate transformation is chosen, then the wrong variables may be identified as significant and their effects incorrectly estimated. Conversely, if the wrong variables are identified, then there will be so much variability in the regression estimates that it may be difficult to choose the appropriate data transformation. Current approaches to regression either select the variables, given the data transformation, or choose a data transformation from a family of transformations but do not select variables at the same time. To the best of our knowledge, there are currently no other approaches that simultaneously identify significant variables, fit any required regressors nonparametrically, and select a suitable transformation for the data.

6. Because this approach is Bayesian, it allows the researcher (and the manager) to introduce into the analysis prior subjective information on the relative importance of each independent variable. This is in addition to the information in the data and enables the modeler to incorporate "soft" data. It also permits the user to determine the sensitivity of the estimates to various if-then scenarios.

The Bayesian analysis is conducted by using a Bayesian hierarchical regression model that is estimated using a Gibbs sampling scheme. The Gibbs sampler is an estimation methodology with introductions given in Gelfand and Smith (1990) and Casella and George (1992). The hierarchical model in this article was proposed in Smith and Kohn (1996) and improves and extends the work on such models by Mitchell and Beauchamp (1988) and George and McCulloch (1993). It makes it feasible to identify significant regressors from a large number of such variables, whereas this was not practical with previous approaches. Furthermore, previous approaches did not consider data transformations.

The features outlined above make our methodology particularly useful in exploratory analysis of large marketing datasets. We illustrate this by applying it to Starch print advertising data from an Australian monthly magazine for women. Such data contain measurements of advertisement (ad) readership, along with a large number of content and product category variables. One feature of these data is that there is likely to be substantial interaction between these two groups of variables, though exactly which variables is unknown. This gives rise to a dataset with a large number of main effects and interaction terms (in our case over 200) from which those of key importance require identification. In addition to such interactions, there are regressors (such as position of the ad in the magazine issue) that are unlikely to be related to the readership scores in a linear manner, but where the true functional form is unknown a priori. Finally, the readership scores themselves are far from normally distributed, being bounded between zero and one and also skewed, and require an appropriate data transformation. To estimate such a regression model by using any existing alternative methodology would require the researcher to subjectively make the decisions about data transformations and functional form before conducting variable selection by using techniques such as forward or backward selection.

The article is organized as follows. The second section describes the data and the proposed model and relates them to the current literature on print advertising. The third section describes the Bayesian approach to semiparametric regression. It sets out the Bayesian hierarchical model, discusses the prior assumptions, and briefly discusses estimation by the Gibbs sampler. However, for details on the implementation and further discussion of the sampling scheme, see Smith and Kohn (1996, 1997). The fourth section reports the results of the application to the Australian print advertising data, whereas the fifth section summarizes the implications of the methodology for marketing researchers. The appendix lists the variables used in the study.

Modeling the Starch Print Advertising Data

Introduction

Creating ads that attract consumer attention to their products is one of the major tasks facing companies. It is particularly important to understand the factors
affecting readership of print ads (the largest category of ads) because the advertiser has much less control over the consumer's response than in a highly emotive medium like television (Rossiter and Percy, 1987). This knowledge will help advertisers in their creative and media strategies.

Researchers have tried to determine what aspects of print advertising affect readership for over three decades. Previous research investigating print advertising can be classified into two groups. The first investigated the influence of specific aspects of print ads, such as the presence or absence of pictures (Edell and Staelin, 1983), position in the magazine (Frankel and Solov, 1963), type of headline (Soley and Reid, 1983), the presence and use of color (Gardner and Cohen, 1964), and size (Starch, 1966; Trodahl and Jones, 1965). Most of this research used controlled experiments, with a few variables investigated at any one time.

The second group of research is more integrative and examines the collective impact of a number of ad characteristics on measures of ad effectiveness (Twedt, 1952; Diamond, 1968; Hanssens and Weitz, 1980). These studies typically use some measure of recall or recognition as the dependent variable, with a number of ad characteristics (both mechanical and content related) as the independent variables.

Whereas previous research added to our understanding of the key drivers of readership, it had two important methodological limitations. First, all the models in the literature assumed that the functional relationship relating the dependent variable and the independent variables was known (except for a few parameters that were estimated from the data). Most of the research used linear or log-linear regression models (Twedt, 1952; Diamond, 1968) or a fixed nonlinear relationship (Hanssens and Weitz, 1980). In practice, the form of the relationship for some of the regressors (such as position of the advertisement in an issue) is likely to be nonlinear but difficult to predetermine and needs to be estimated from the data. Second, few of the articles explicitly modeled the interaction between the ad content and product category variables, which would result in a large number of interaction terms, but used a multiplicative model to implicitly capture these interactions (Hanssens and Weitz, 1980).

With the use of Bayesian semiparametric regression, such limitations no longer apply, with unknown functional relationships being modeled nonparametrically and the large number of interaction and main effects (even if collinear) entered into the regression and the key determinants identified. As such, the Starch data form an interesting demonstration of the Bayesian methodology. To give our results face validity, we compare the procedure's predictive ability with some common alternatives.

The data investigated in this article are collected for all advertisements in each issue of a leading Australian monthly magazine for women. They consist of three attention-relevant scores. The three attention-relevant scores are "noted" scores (indicating the proportion of respondents who claim to recognize the ad as having been seen by them in that issue), "associated" scores (indicating the proportion of respondents who claim to have noticed the advertiser's brand or company name or logo), and "read-most" scores (indicating the proportion of respondents who claim to have read half or more of the copy). These have the structure where a noted score is at least as large as the corresponding associated score for an ad, which, in turn, is at least as large as the read-most score. Therefore, we normalized these so that y^(1) = (raw noted), y^(2) = (raw associated)/(raw noted), and y^(3) = (raw read-most)/(raw associated), and we use these measures as our dependent variables.

The data used in the study are for the 13 months between March 1992 and March 1993. During this period, 1,030 ads appeared in the magazine, but 25 were dropped because of missing observations on some of the independent variables. The data from the first 12 issues are used to fit the model (n = 943 observations), whereas the data from the thirteenth issue (62 observations) are used as a hold-out sample. The independent variables in the data describe the following four aspects of the ad: (1) the position of the ad in the magazine, including inserts, (2) the major features of the ad, (3) the product category, and (4) the message in the ad. How each of these was incorporated into the model is described below.

Variables Relating to the Position of the Ad

Previous research has found the page number of the ad in an issue to be an important factor influencing readership (Diamond, 1968). The Starch data include the page number of the ad and the total number of pages in all our issues. This was used to construct the continuous variable

P = (page number) / (number of pages in issue),

which takes values between 0 and 1 and is defined as the relative position of the ad in an issue. The front and inside-front cover are coded as the first two pages, whereas the inside-back and the back cover are the last two pages. Figure 1 plots the raw noted score against the position variable, indicating a distinctly nonlinear relationship.

The usual approach to modeling such a nonlinear response is to assume a specific functional form, for example, a linear or a quadratic function. Such an approach is called parametric, as the functional form is specified a priori. This article takes a nonparametric approach instead, in which the shape of a response function f is estimated from the data. We do so by using a cubic regression spline, which is a piecewise cubic polynomial between m so-called "knots" that partition the domain of the independent variable (in this case P in [0, 1]) into m + 1 subintervals. If the location of the m knots is given by p_1 < p_2 < ... < p_m, then the spline can be written as
Figure 1. The scatter plot is of the noted scores y^(1) against position in issue P. The bold line is the Bayesian regression fit to the data, the dotted line is from a local regression with a low smoothing parameter value, and the dashed line is from a local regression with a higher value for the smoothing parameter. No smoothing parameter value would enable a local regression to simultaneously capture the curvature of the relationship, while remaining smooth.
f(P) = b_0 + b_1 P + b_2 P^2 + b_3 P^3 + sum_{i=1}^{m} b_{3+i} (P - p_i)^3_+ , where (x)^3_+ = max(0, x)^3. [1]

It can be readily shown that such an approximation for the unknown response function f is not only continuous, but has continuous first and second derivatives. Such regression splines are used frequently in practice because they form a linear model, where the b_i's are regression parameters and the terms {P, P^2, P^3, (P - p_1)^3_+, ..., (P - p_m)^3_+} are terms in a regression.

An important issue with regression splines concerns m, the number of knots, and their locations. If too few knots are used, the estimate can exhibit "local bias" (that is, where important features of f are missed) because knots will not be located in all the appropriate positions to capture variations in the nonlinear relationship. However, if too many knots are used and the parameters are estimated using least squares, the estimate of f will have high "local variance" (that is, the estimate will not be smooth). Our solution is to introduce a lot of knots (m = 20), but instead of ordinary least squares we use a hierarchical Bayesian model (as discussed in Estimation Methodology below) that explicitly accounts for the possibility that many of them are redundant. This results in an estimation procedure that provides smooth, yet locally adaptive, estimates.
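As a concrete illustration, the truncated power basis of equation [1] is straightforward to construct. The sketch below (our own illustration, not code from the article's Splus package "br") builds the spline design matrix for the position variable P with m = 20 knots; equal knot spacing is an assumption here, as the article does not state the knot locations.

```python
import numpy as np

def spline_basis(P, m=20):
    """Design matrix for the cubic regression spline of equation [1]:
    columns P, P^2, P^3 and the truncated cubics (P - p_i)^3_+."""
    P = np.asarray(P, dtype=float)
    knots = np.linspace(0.0, 1.0, m + 2)[1:-1]   # m interior knots in (0, 1)
    poly = np.column_stack([P, P**2, P**3])      # global cubic part
    trunc = np.maximum(P[:, None] - knots[None, :], 0.0) ** 3  # (P - p_i)^3_+
    return np.column_stack([poly, trunc])        # n x (3 + m); intercept kept separate

# Example: relative positions of five ads in an issue
X = spline_basis(np.array([0.05, 0.25, 0.5, 0.75, 0.95]))
print(X.shape)  # (5, 23)
```

Each truncated cubic column switches on only to the right of its knot, which is what gives the fitted function its local flexibility.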
Alternative nonparametric regression methodologies exist, including the popular local regression and smoothing splines, for which general references are Hardle (1990) and Eubank (1988), respectively. However, the methodology presented here has several distinct advantages that lead us to prefer it in the empirical analysis of our Australian Starch data. First, it is locally adaptive in the sense that it allows the curvature of the regression function to change across its domain. Both local regression and spline smoothing work best when the curvature of the regression function does not vary greatly across the whole domain.

This property is important in identifying the effect of the position variable on readership scores. To demonstrate this, we estimated the univariate nonparametric regression model

y^(1) = f_uni(P) + e, where e ~ iid N(0, sigma^2).

Figure 1 plots the results of the estimation, with the bold line being the estimate f_hat_uni from the Bayesian approach discussed above.(1) This captures the profile of the effect of the position variable on noted scores, while remaining smooth. The dotted line denotes the estimate from a local regression with a smoothing parameter set so that the peak in the noted scores for the last tenth of the issue could be caught. However, an unavoidable by-product of this is that the rest of the curve is nonsmooth. The dashed line corresponds to a local regression where the smoothing parameter is set larger so that the estimate is smooth, but key features of the curve (such as the exposure of the middle fifth and last tenth of the magazine) are bleached out.(2) A further discussion of the comparison of local regression, Bayesian regression, and other approaches can be found in Smith and Kohn (1996).

The second advantage of our approach is that there are no alternative nonparametric regression approaches that can extend to the large integrative regression models of the type developed here.

In addition to ads within the magazine, each issue had a few ads that were inserts and thus had no page number. To deal with these, we created a dummy variable I for the presence (I = 1) or absence (I = 0) of inserts. In our integrative regression model, the combined effect of position in the magazine and inserts is

a I + (1 - I) f(P), [2]

which is obtained by substituting in a regression spline of the type outlined at equation [1] with m = 20 knots. When an ad is an insert, equation [2] is equal to an intercept a; when an ad is within a magazine, it is equal to f(P).

Variables Describing the Major Features of the Ad

Previous research investigating the impact of major features of the ad on magazine readership found significant effects for size of the ad (Trodahl and Jones, 1965; Diamond, 1968), presence of pictures (Edell and Staelin, 1983), and type of headline (Soley and Reid, 1983). There are 33 independent variables in the data that describe the major features of the ad, such as color, size of ad, presence or absence of bleed, headline type, etc. The appendix lists these, including variables X_1, ..., X_31 that are mostly binary and therefore enter the regression linearly as sum_{i=1}^{31} b_i X_i. The other two variables are the size of the ad (S) and its brand prominence (B), which we model nonparametrically. Each assumes a discrete, but small, number of levels (0-3 and 0-4, respectively), and it is inappropriate for such variables to be modeled using a cubic regression spline (Smith, Sheather and Kohn, 1996). Instead, we use a dummy variable basis for each, so that

g(S) = sum_{i=1}^{3} c_i I_i(S) and h(B) = sum_{i=1}^{4} d_i I_i(B), where I_i(x) = 1 if x = i and 0 otherwise.

Here, g and h measure deviation from the base exposure attributable to an ad with S = 0 (size less than a full page) and B = 0 (brand not present). The terms c_i and d_i are regression coefficients that we estimate using the Bayesian hierarchical model. This ensures g and h measure significant deviations from the base case of S = 0 and B = 0, as opposed to those that would be obtained using least squares. Therefore, the model for the main effects of the major features of the ad is

sum_{i=1}^{31} b_i X_i + g(S) + h(B). [3]

Variables Describing the Product Category

It is likely that consumers have different ad readership scores for different product categories. Hanssens and Weitz (1980) addressed this issue by classifying products into three groups (routine, unique, and important products) and modeled each group separately. In contrast, our model allows the simultaneous estimation of product effects by introducing specific interactions between the product type variables and ad content variables. This allows the investigation of product-specific effects.

(1) This was produced using the Splus compatible package "br" available from the Statlib repository.
(2) The local regression was undertaken using the default local regression estimator "loess" in Splus with the smoothing parameter set to 0.18 for the first fit and 0.75 for the second. No single smoothing parameter for such an estimator would enable an estimate that is both smooth and captures the curvature of the relationship.
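The dummy-variable bases for S and B and the combined insert/position term of equation [2] described above can be assembled as follows; this is our own sketch with hypothetical data, not code from the "br" package, and the spline part of f is truncated to its polynomial columns for brevity.

```python
import numpy as np

def dummy_basis(x, levels):
    """Columns I_i(x), i = 1..levels; level 0 is the omitted base case."""
    x = np.asarray(x)
    return np.column_stack([(x == i).astype(float) for i in range(1, levels + 1)])

# Hypothetical data for four ads: size S in 0-3, brand prominence B in 0-4,
# insert indicator I, and relative position P (undefined for inserts, set to 0).
S = np.array([0, 2, 3, 1])
B = np.array([1, 0, 4, 2])
I = np.array([0, 0, 1, 0])
P = np.array([0.10, 0.55, 0.0, 0.90])

GS = dummy_basis(S, 3)   # basis for g(S): n x 3
HB = dummy_basis(B, 4)   # basis for h(B): n x 4

# Combined insert/position effect of equation [2]: a*I + (1 - I)*f(P).
# The intercept column for 'a' is multiplied by I, the spline columns for f
# by (1 - I), before entering the design matrix.
f_cols = np.column_stack([P, P**2, P**3])
position_block = np.column_stack([I.astype(float), (1 - I)[:, None] * f_cols])
print(GS.shape, HB.shape, position_block.shape)  # (4, 3) (4, 4) (4, 4)
```

Because level 0 is omitted from each basis, the coefficients c_i and d_i are interpretable directly as deviations from the base case S = 0 and B = 0.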
There are 23 product types T_1, ..., T_23 recorded in our Starch data, which are described in the appendix. These variables are binary and enter the model as main effects but are thought to be mainly of significance as linear interactions with budgetary and design variables. To explicitly model these interactions, we grouped the product types into four categories based on the criteria in Rossiter and Percy (1987, pp. 166-174): the type of decision and the type of motivation. Rossiter and Percy (1987) categorize decisions into two broad types: low involvement decisions that have a low level of perceived risk (both economic and psychosocial) for the consumer, and high involvement decisions that have a high level of perceived risk. Similarly, there are two types of motivations for product purchases: informational (which reduce the negative motivation associated with the purchase) and transformational (which increase the positive motivation associated with a purchase). We call the four categories G_1, ..., G_4 and define them as follows: G_1, low involvement and informational; G_2, low involvement and transformational; G_3, high involvement and informational; G_4, high involvement and transformational. Each product type T_i is classified as belonging to one of the four categories, with the classification given in the appendix. For example, T_1 is women's apparel and accessories and is classified as G_4 (high involvement and transformational).

There are 136 linear interactions of G_1, ..., G_4 with the ad attributes X_1, ..., X_31, S, B, and I, which we model as simple multiplicative interactions in equation [4] below. We use the Rossiter and Percy (1987) categories to illustrate the methodology, but there are other ways to classify products, and the results may vary according to the classification.

The variable R for the presence or absence of recipes was treated as a special case, and only its interaction with G_2 (the group that is low involvement and transformational and includes food and drink products) is included. The following terms were included in the full model:

[Eq. (4): the 136 multiplicative interaction terms of G_1, ..., G_4 with X_1, ..., X_31, S, B, and I, together with the recipe interaction R x G_2.]

Variables Describing the Message in the Ad

Twenty-eight ad message binary variables X_i^[j] were created from four original Starch categorical variables for the purposes of inclusion in the model. These variables are described in the appendix and include the predominant feature (i = 1), the headline, and the color (i = 4) of the ad. They enter into the model as main effects, with the following terms included:

[Eq. (5): the main-effect terms of the 28 message variables X_i^[j].]

Issue Number

The data consist of ads published in the 13 consecutive monthly issues from March 1992 to March 1993. The first 12 issues (M = 1, ..., 12) were used as a calibration sample, and the March 1993 issue was used as a hold-out sample for model validation. It seems unreasonable to attempt to estimate any seasonal or trend components because the data used to fit the model span only a single year. Nevertheless, it is important to control for issue specific effects. Consequently, we modeled the issue effect nonparametrically with a dummy variable basis, so that

lambda(M) = sum_{i=2}^{12} r_i I_i(M). [6]

The function lambda measures any significant deviations in readership scores for the April 1992 to February 1993 issues from that of March 1992 and controls for any potential monthly effects on the training sample. Here, the r_i denote regression coefficients and I_i(.) is defined as before.

Transforming the Dependent Variables

There are three measures of ad readership in the Starch data: noted, associated, and read-most scores, each of which is a proportion. Simple histograms and density estimates demonstrate that the raw scores are far from normally distributed and may require transformation to satisfy both the additivity assumptions in a regression model and the assumptions of a Gaussian error distribution. Previous researchers used transformations such as arcsine (Finn, 1988) or logarithmic (Hanssens and Weitz, 1980). These transformations typically are decided in advance and imposed on the data. We considered nine different normalized transformations for the dependent variables of the type

T_l(y) = a_l + b_l t_l(y).

In the above, l = 1, 2, ..., 9 indexes our nine transformations, and t_l(y) is what we call the "base transformation." This can be any monotonically increasing transformation, which we normalize using two constants a_l and b_l to get the actual data transformation T_l(y). These constants can be calculated from the data for each transformation so that the dependent variable has the same median and interquartile range pre- and post-transformation; see Smith and Kohn (1996) for details. In effect, these constants normalize the base transformations considered to make them scale and location invariant, a property that eases the qualitative interpretation of the regression estimates.
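The normalization just described is simple to compute: because T_l is affine in t_l(y) and t_l is monotone increasing, matching the interquartile range forces b_l = IQR(y)/IQR(t_l(y)), and matching the median then gives a_l. A sketch of this calculation, using the probit base transformation t(y) = Phi^-1(y^0.5) (transformation l = 3 in Table 1) and simulated proportions standing in for raw scores:

```python
import numpy as np
from statistics import NormalDist

def normalize_constants(y, t):
    """Constants (a, b) so that T(y) = a + b*t(y) has the same median
    and interquartile range as y (the normalization used in the text)."""
    ty = t(y)
    q1, med, q3 = np.quantile(y, [0.25, 0.5, 0.75])
    tq1, tmed, tq3 = np.quantile(ty, [0.25, 0.5, 0.75])
    b = (q3 - q1) / (tq3 - tq1)   # match the interquartile range
    a = med - b * tmed            # match the median
    return a, b

phi_inv = np.vectorize(NormalDist().inv_cdf)
t3 = lambda y: phi_inv(y ** 0.5)  # base transformation l = 3 in Table 1

# Hypothetical skewed proportions standing in for raw noted scores
rng = np.random.default_rng(1)
y = rng.beta(2.0, 5.0, size=500)
a, b = normalize_constants(y, t3)
Ty = a + b * t3(y)
```

The same recipe applies to each of the nine base transformations; only t changes, and the constants are recomputed per dependent variable, which is why Table 1 reports separate (a_l, b_l) pairs for the noted, associated, and read-most scores.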
Table 1. Transformation Index and Base Transformations

                       Noted Score       Associated Score   Read-Most Score
 l   t_l(y)            a_l      b_l      a_l      b_l       a_l      b_l
 1   Phi^-1(y^0.1)    -0.516    0.678   -0.124    0.459    -0.537    0.692
 2   Phi^-1(y^0.25)   -0.063    0.564    0.165    0.396    -0.072    0.573
 3   Phi^-1(y^0.5)     0.240    0.478    0.368    0.346     0.237    0.483
 4   Phi^-1(y^0.75)    0.398    0.428    0.478    0.316     0.397    0.431
 5   Phi^-1(y)         0.500    0.394    0.552    0.295     0.500    0.395
 6   Phi^-1(y^1.5)     0.631    0.347    0.649    0.266     0.630    0.347
 7   Phi^-1(y^2)       0.713    0.315    0.712    0.246     0.712    0.315
 8   log(1 + y)        0.819    0.625    0.892    0.873     0.798    0.584
 9   sin^-1(y^0.5)    -0.278    0.991   -0.117    0.820    -0.281    0.994

The first column gives the transformation index, whereas the second gives the base transformations. The remaining columns give the constants required to make the normalized transformation scale and location invariant. These constants are provided for each of the three dependent variables considered in our dataset. The last two transformations were proposed by Finn (1988) and Hanssens and Weitz (1980).
The nine transformations include those of Finn (1988) and Hanssens and Weitz (1980). The other seven are of the form Phi^-1(y^Q), which are probit transformations with various values for the skewing parameter Q. It is important to note that the methodology allows a data-driven selection of the most appropriate normalized transformation from those listed in Table 1. A user could well select any finite number of transformations considered likely and use the procedure to select between these. As implemented, our methodology may lead to different transformations for the noted, associated, and read-most scores.

Estimation Methodology

Introduction

All the terms constructed in the previous section, listed at equations [2] to [6], can be collected together to form a large design matrix X with p = 262 terms for each of the three regressions:

T_l(y^(j)) = X b + e for j = 1, 2, 3. [7]

Here, we consider a regression for each of the three transformed readership scores, so that T_l(y^(1)), T_l(y^(2)), and T_l(y^(3)) are n-vectors of the observations of the noted, associated, and read-most scores, respectively, transformed according to the transformation T_l. The vector b is made up of the regression coefficients introduced earlier (the a, b_i's, c_i's, d_i's, r_i's, and various b's), whereas the vector e is made up of errors distributed iid N(0, sigma^2).

Estimating such a regression by using least squares presents a number of problems. First, there is often collinearity between the regressors. Second, even when collinearities are eliminated, variability in the regression coefficient estimates would be largely due to a lack of degrees of freedom. Third, the estimates for the nonparametric function components f, g, h, and lambda would be nonsmooth.

Commonly used approaches for tackling this problem are severely limited. All-subsets regression requires 2^p regressions to be undertaken, something that is computationally infeasible. The stepwise procedure, while feasible, traverses only a small number of subsets, which often leads to the situation that forward and backward stepwise procedures result in substantially different estimates (Kass and Raftery, 1995).

All these problems lead us to use a Bayesian hierarchical model, coupled with a Gibbs sampler, to estimate this large regression model. This approach can handle a large number of variables (p = 262 in this case), even when collinearity exists, without being hindered by the problem that there are only about four times as many observations as coefficients to be estimated. It can do so because it reduces the effective dimension of the problem by explicitly modeling the possibility that many of the terms in the design matrix may be redundant. Using the Gibbs sampler to undertake the computations results in the redundant variables being identified reliably. This is because the search over the distribution of important subsets is stochastic, rather than deterministic as in stepwise regression, and it traverses many more likely combinations of regressors than the stepwise regression approach. Furthermore, all other approaches require prespecification of the data transformation, whereas the Bayesian methodology can simultaneously select the appropriate data transformation and identify significant variables.

The Bayesian Hierarchical Model

This section briefly describes the Bayesian hierarchical model, with a full exposition being found in Smith and Kohn (1996). The key idea behind such a model is that we explicitly model the possibility that terms in the regression at [7] may be superfluous and can be omitted from the regression. To do this, we use the vector gamma = (gamma_1, gamma_2, ..., gamma_p)' of indicator variables, where each element is defined as

gamma_i = 0 if b_i = 0, and gamma_i = 1 otherwise.

That is, gamma_i is a binary variable that indicates whether or not b_i is zero and therefore whether or not the corresponding independent variable enters the regression. Therefore, the vector gamma explicitly parametrizes the subsets of regressors in the linear regression at [7]. The regression can be rewritten, conditional on this "subset" parameter, as

T_l(y^(j)) = X_gamma b_gamma + e for j = 1, 2, 3.

Here, b_gamma contains the regression coefficients that are non-zero, given gamma, and X_gamma is a design matrix made up of the corresponding columns of X.

Because this is a Bayesian methodology, it is necessary to place prior distributions on the unknown parameters gamma, l, b, and sigma^2. Ideally, such priors are "uninformative," unless there is some real information on a parameter or group of parameters (for example, from previous research). The priors we use here are listed below.

1. The prior distribution of b_gamma conditional on gamma, sigma^2, and l is N(0, n sigma^2 (X_gamma' X_gamma)^-1). The variance matrix in this prior is simply that of the least squares estimate of b_gamma blown up by a factor of n and therefore is relatively uninformative. A fully uninformative prior for b_gamma cannot be used as

in the model, the sample enables Monte Carlo estimation of the posterior probability Pr(l = i | data). From this distribution, we can pick the single most likely transformation and label it the "mode" transformation l_hat_M. Following Box and Cox (1982), we estimate the remaining features of the model conditional on this best transformation. The sampling scheme is run again, and another Monte Carlo sample for the subset parameter gamma is obtained. Using this, estimates of the expected value of b given the data (called the posterior mean), smoothing over the distribution of gamma, can be obtained; that is, we estimate E(b | l = l_hat_M, data). Estimates of the probability that each term in the regression is non-zero (and therefore significant), given the data, can also be calculated; that is, we estimate Pr(gamma_i = 1 | l = l_hat_M, data). Our implementation followed that outlined in Smith and Kohn (1996), but where our sampling scheme was run for 10,000 iterations for convergence and 20,000 iterations to obtain the Monte Carlo sample of gamma. This is a conservative sampling length and took approximately four hours on a standard DEC UNIX workstation. Similar results were obtained with sampling runs of one-tenth the length, which took under one hour to run. This produces estimates of the transformation and the regression parameters that are robust to the fact that it is un
known a priori which terms actually are in the regression. it results in what is commonly called Lindley’s paradox
This is in contrast to least squares regression which estimates (Mitchell and Beauchamp, 1988).
the regression coefficients given some particular known sub-2. The prior density fors2 giveng is proportional to 1/
set, or value of,g, as well as a prespecified data transformation.
s2. This is a commonly used prior for s2 and means that logs2has a flat prior or uninformative prior (Box
and Tiao, 1973).
Empirical Estimates and
3. We assume that givenl, thegiare a priori independent
Methodological Comparison
withPr(gi50)5 pi,i51, . . ., p.The probabilitiespiare specified by the user. In this article, we takepi5
Introduction
1/2 for all i, which can be considered uninformative. We applied our estimation methodology to the integrative However, our methodology allows for a user to impose a regression models for the Starch scores noted (y(1)), associated prior assessment of the relative importance of individual (y(2)), and read-most (y(3)). The first 943 observations, consisting terms in the form a nonuniform prior ong. For example, of the March 1992 to February 1993 issues, are used for estima-ifX7(photo used) is considered more likely to have an tion. The March issue, consisting of 62 ads, is used for model impact on ad recognition than other variables, a higher validation. The sections below discuss various aspects of the value ofpicould be imposed on that variable. model estimates. Model Validation below discusses the pre-4. We assume that each of the nine candidate transforma- dictive performance of the model on the hold-out sample com-tions is equally likely, that is,Pr(l 5i)51/9 for i5 pared with the application of standard regression approaches
1,2, . . ., 9. This is an uninformative prior forl; although to such a dataset, whereas Managerial Implications discuses if a researcher wants to encorporate prior beliefs, some some of the managerial implications of the empirical results. transformations can be given a higher prior probability
than others.
Data Transformation Estimates
The most likely transformations, given the data, were (once
Estimation
normalized) sin21(y0.5),F21(y1.5), and sin21(y0.5) for the noted, Smith and Kohn (1996) develop a Gibbs sampling scheme to associated, and read-most scores, respectively. The posterior estimate this Bayesian hierarchical model and show how it probabilities of all nine candidate transformations for each of can be implemented. The sampling scheme produces a Monte the three scores are given in Table 2.Table 2. Estimated Posterior Probabilities for Each of the Nine Candidate Transformations
Table 2. Estimated Posterior Probabilities for Each of the Nine Candidate Transformations

                      Uninformative prior            Informative prior
Transformation     Noted   Assoc.  Read-most    Noted   Assoc.  Read-most
log(y + 1)         0.000   0.000   0.000        0.000   0.000   0.000
sin⁻¹(y^0.5)       0.305   0.001   0.856        0.190   0.000   0.751

The first column gives the respective base transformation (we consider the normalized version of each, as discussed in the text under Transforming the Dependent Variables). The first three probability columns use the default uninformative prior given at item (4) under The Bayesian Hierarchical Model, whereas the last three columns are for the informative prior that places more prior weight on the seven probit transformations. The most likely transformation for each score is the same under both priors.
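The subset-selection machinery behind these posterior probabilities can be sketched compactly. The code below is our own illustrative reconstruction, not the authors' implementation (their Splus package "br" is noted in the abstract): it integrates β and σ² out analytically under the prior N(0, nσ²(Xγ′Xγ)⁻¹) with the flat prior on log σ², then Gibbs-samples each indicator γi and reports Pr(γi = 1 | data) as a sampling frequency. All function names and the toy data are ours, and the transformation index λ is held fixed for simplicity.

```python
import numpy as np

def log_marglik(y, X, gamma, g=None):
    """Log marginal likelihood of y given the subset indicator gamma,
    with beta ~ N(0, g*sigma^2*(X'X)^-1), g = n, and p(sigma^2) ~ 1/sigma^2
    integrated out analytically (a standard g-prior result)."""
    n = len(y)
    if g is None:
        g = n
    Xg = X[:, gamma.astype(bool)]
    q = Xg.shape[1]
    yy = y @ y
    if q > 0:
        coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        fit = (y @ Xg) @ coef            # y' X (X'X)^-1 X' y
        rss = yy - g / (1.0 + g) * fit   # shrunk residual sum of squares
    else:
        rss = yy
    return -0.5 * q * np.log(1.0 + g) - 0.5 * n * np.log(rss)

def gibbs_select(y, X, n_burn=1000, n_iter=2000, seed=0):
    """Sample each gamma_i from its conditional posterior, with the
    uninformative prior Pr(gamma_i = 0) = 1/2, and return the Monte
    Carlo estimate of Pr(gamma_i = 1 | data)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p)
    counts = np.zeros(p)
    for it in range(n_burn + n_iter):
        for i in range(p):
            lp = np.empty(2)
            for v in (0, 1):             # marginal likelihood with term out/in
                gamma[i] = v
                lp[v] = log_marglik(y, X, gamma)
            d = min(lp[0] - lp[1], 700.0)     # guard against overflow
            prob1 = 1.0 / (1.0 + np.exp(d))
            gamma[i] = rng.random() < prob1
        if it >= n_burn:
            counts += gamma
    return counts / n_iter
```

In the article's application there are 260 candidate regressors and the scheme is run for tens of thousands of iterations; a toy run with a handful of regressors already shows the posterior inclusion probabilities concentrating on the true terms.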
It can be seen that the asymmetric arcsine transformation discussed by Finn (1988) is the most likely (in that it has the highest posterior probability) of those considered here for the noted and read-most score regressions. However, in the associated score regression, a skewed probit transformation is the most likely, given the data. The logarithmic transformation proposed by Hanssens and Weitz (1980) is the most unlikely, with a posterior probability equal to zero (to three decimal places) for all three scores in our dataset.

The posterior probabilities of the transformations seem to be fairly insensitive to the choice of prior distribution. For example, we used an informative prior that ascribed double the probability to each of the seven skewed probit transformations that it prescribed for the arcsine and logarithmic transformations; that is, the prior Pr(λ = i) = 1/8 for i = 1, . . ., 7 and Pr(λ = 8) = Pr(λ = 9) = 1/16. The data transformations selected using this informative prior were the same as those selected using the uninformative prior, although the posterior probabilities change slightly (see Table 2). These results are reassuring because they demonstrate that the transformation selection is based on information from the data and is not "prior driven."

Figure 2. (a–c) Normal probability plots based on the residuals obtained from each of the three regressions. If the quantiles of a normal and the observed residuals are linearly related with a slope of 45 degrees, then this indicates that the residuals follow a normal distribution.

Impact of Variables Related to the Position of the Ad

Figures 3 (a.1), (b.1), and (c.1) outline the impact of P, the position of the ad in the issue, on ad recognition scores. It is evident from these figures that the position of the ad is a determinant of noted and associated scores, but is not related to the read-most score.

Figure 3 (a.1) is a plot of the regression spline estimate of the response f for the noted score y(1). This estimate suggests that advertising in the front end of the magazine (the first 15%) is immensely beneficial from the point of view of attracting casual attention to the ad, leading to high noted scores. However, there is a dramatic decrease in exposure as ads are placed further into the magazine, up to approximately 20% in. Earlier linear regression studies (Diamond, 1968; Hanssens and Weitz, 1980) showed a significant negative coefficient for the page number variable. However, the nonparametric regression is able to capture a more sophisticated relationship between noted scores and position than the fixed linear relationships imposed in prior research. After the rapid drop that occurs in the front of the magazine, there is a resurgence in ad exposure for the middle 40% of the magazine. After this, positioning an ad 80–90% into the magazine appears to be the worst choice, whereas the last 5% seems to be an improvement. Notice that the estimate of the effect is slightly different from that uncovered in the univariate nonparametric regression found in Figure 1, because the current regression controls for other effects that are correlated with the position variable, as well as having a transformed dependent variable.

Figure 3 (b.1) provides the estimate of f for the associated scores y(2). The profile of the estimate differs substantially from that of the noted score and indicates that for high associated scores the worst place to advertise is 70–90% into the magazine, whereas the last 5% is advantageous.

The insert dummy variable I was a significant contributor to all three scores, with Table 3 providing the insert main effects and interactions with the product categories that had a greater than 35% chance of being non-zero (i.e., Pr(γi = 1|data, λ = λ̂M) > 0.35). Also included in the table is the estimated intercept of f(P) from equation (2), namely, the coefficient b̂0 of (1 − I). Notice that inserts have a similar effect to advertising on the front cover of the magazine, although for the associated and read-most scores they achieve slightly higher exposure.

Impact of the Major Features of the Ad

The size of the ad has a strong effect on the noted scores, but the marginal gain decreases after a full-page ad (S = 1). This is illustrated in Figure 3 (a.2), which plots the estimate of the main effect, g, of ad size as the solid curve. Interestingly, for "high involvement and transformational" products (for example, fashion apparel and accessories, household furnishings, and jewelry), the size of the ad has a much more pronounced effect. When combined with the main effect, it produces the relationship represented by the long dashed line. This is caused by a value of 0.045 for the slope of the interaction G4S, which is identified as having a posterior probability of being non-zero of 0.946.

For the associated scores, ad size has a marginal main effect, although some effect may occur for products that are high involvement and transformational, because there is a posterior probability of 0.411 that SG4 is non-zero and positive, something that is reflected in the interaction plots in Figure 3.

For the read-most scores, there is a very flat ad size main effect. However, in a similar manner to the associated scores, the figure indicates that transformational products react positively to increases in ad size. This is caused by positive slopes for SG2 and SG4, which are non-zero with a probability of 0.649 and 0.765, respectively.

The relationships between the three scores and brand prominence are shown in Figure 3 (a.3, b.3, and c.3). For both the noted and associated regressions, there are strong nonlinear effects, whereas there are no significant interaction effects with any of the four product categories. In particular, for associated scores it seems that brand names have little effect unless they are highly prominent. For read-most scores, brand prominence appears to have a negative effect for informational products, with the slope of BG3 being −0.039, with a posterior probability of being non-zero of 0.863.

Figure 3. Estimates of the functions f, g, h, and l for each of the three regressions. The rows correspond to the three regressions arising from the dependent variables y(1), y(2), and y(3), respectively, whereas the columns correspond to the four functions. The figures in each row are plotted with the same range on the vertical axis, so that an idea of the comparative strength of the effects within each regression can be obtained. This range is set equal to the distance between the 85th percentile and 15th percentile of the transformed dependent variable of each regression (that is, T9(y(1)), T6(y(2)), and T9(y(3))). The main effects are in bold, whereas interactions are also plotted for the effects of B and S.

Table 4 reports all other terms in the three regressions (excluding those involving P, I, B, S, and M) that have a posterior probability greater than 0.35 of being non-zero.
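The function estimates in Figure 3 are regression splines, with each basis coefficient subject to the same γ-indicator selection as any other term. As a hedged sketch of the idea (our own construction; the paper does not specify its knot placement here), a truncated-power cubic basis for a regressor on [0, 1] can be built and fit by least squares:

```python
import numpy as np

def spline_basis(p, knots):
    """Truncated-power cubic spline basis on [0, 1]: columns p, p^2, p^3
    and (p - k)_+^3 for each interior knot k. In the Bayesian fit, each
    column would receive its own gamma_i indicator, so the data decide
    how wiggly the estimated function is."""
    cols = [p, p**2, p**3]
    cols += [np.clip(p - k, 0.0, None)**3 for k in knots]
    return np.column_stack(cols)

# Toy illustration: recover a hypothetical nonlinear "position" effect.
rng = np.random.default_rng(0)
pos = rng.random(400)                       # ad position on [0, 1]
f_true = np.cos(2 * np.pi * pos)            # made-up nonlinear effect
y = f_true + 0.3 * rng.standard_normal(400)
X = spline_basis(pos, knots=np.linspace(0.1, 0.9, 9))
X1 = np.column_stack([np.ones(len(pos)), X])   # add an intercept
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
fhat = X1 @ beta                            # fitted curve at the data points
```

Under variable selection, superfluous knot columns are simply switched off rather than shrunk by a smoothing penalty, which is what yields the smooth but flexible estimates of f, g, h, and l.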
Some variables appear to be related to readership scores throughout the magazine, irrespective of product type. For example, variables such as right-hand position of the ad and headline color are likely to be related to noted scores for all products.

Table 3. Estimated Insert Effects

                 Term      Probability   Coefficient
For noted
                 I         0.992         0.402
                 (1 − I)   0.992         0.442
For associated
                 I         1.000         0.743
                 (1 − I)   1.000         0.739
For read-most
                 I         1.000         0.555
                 IG1       0.522         0.094
                 (1 − I)   1.000         0.548

The posterior mean estimates of the regression coefficients, E(βi|λ = λ̂M, data), are in the "Coefficient" column. The "Probability" column contains the posterior probabilities of these coefficients being non-zero, Pr(γi = 1|λ = λ̂M, data). The results are for the terms involving the insert dummy variable, including the noninsert intercept (1 − I). Only those coefficients that have a posterior probability greater than 0.35 of being non-zero are reported.

Product-Specific Effects

It is apparent from Table 4 that product-specific effects are very important for the three models. First, there are product types that are significant main effects. For example, T12 (motor vehicles) has a negative coefficient in all three regressions and a posterior probability (to three decimal places) of being non-zero of 1.000, 0.530, and 1.000 for the noted, associated, and read-most scores, respectively. These, and the other product main effects found in Table 4, appear consistent with the positioning of the magazine as primarily for women with a focus on domestic and entertainment issues. Second, the product-specific effects manifest themselves as interactions between ad characteristics and the product categories G1, G2, G3, and G4.
Table 4. Coefficient Estimates and Estimated Posterior Probabilities of Being Non-zero

Description               Term     Prob.    Coeft.
For noted score regression
RH position               X3       0.777     0.033
Cntnt is fin              X10      0.724     0.020
Hdln color                X20      0.838    −0.015
Copy area                 X23      0.428    −0.011
Furnishings               T6       0.431    −0.021
Building                  T8       0.998    −0.211
Travel                    T9       0.909    −0.096
Motor vehicles            T12      1.000    −0.157
Cosmetics                 T15      0.813     0.045
Jewelry                   T17      0.656     0.071
For associated score regression
Hdln color                X1       0.992    −0.018
Advertorial               X30      0.472    −0.023
Coupon                    X31      0.506    −0.012
(Cntnt is fin)G1          X10G1    0.584     0.021
Pred apl pleasure         X[3]2    0.522    −0.010
Pred col yel/org/brwn     X[4]6    0.582     0.009
Pred apl income           X[3]6    0.570     0.014
Building                  T8       0.459    −0.041
Motor vehicles            T12      0.530    −0.026
Cosmetics                 T15      0.565     0.019
Pets                      T22      0.777     0.047
For read-most score regression
Hdln color                X20      0.650     0.011
Hdln type                 X21      0.467    −0.006
(Num words)G1             X24G1    0.972    −0.072
(Square ill)G2            X5G2     0.650     0.028
Motor vehicles            T12      1.000    −0.198
Jewelry                   T17      0.633     0.071
Government                T19      0.923     0.184

This table provides the posterior means for the slope coefficients, E(βi|λ = λ̂M, data), in the "Coeft." column. Their respective posterior probabilities of being non-zero, Pr(γi = 1|λ = λ̂M, data), are provided in the "Prob." column. These are provided for all three regressions, but only for coefficients that had a posterior probability greater than 0.35 of being non-zero. The Description column provides a short description of each term; for a full description, see the Appendix. Terms that involve the variables P, I, S, B, and M are not included.
Table 5. Out-of-Sample Performance

Regression     Bayesian   OLS     Stepwise/AIC   30-Factor   50-Factor   100-Factor
Correlation between predicted and actual values
Noted 0.764 0.706 0.743 0.727 0.695 0.620
Associated 0.511 0.507 0.488 0.464 0.331 0.347
Read-most 0.504 0.368 0.357 0.273 0.220 0.245
Mean absolute error of predicted values
Noted 0.075 0.086 0.092 0.109 0.102 0.100
Associated 0.070 0.074 0.151 0.111 0.114 0.086
Read-most 0.095 0.115 0.120 0.146 0.145 0.121
Mean squared error of predicted values
Noted 0.009 0.012 0.011 0.019 0.017 0.015
Associated 0.008 0.009 0.043 0.027 0.023 0.011
Read-most 0.015 0.021 0.023 0.036 0.032 0.022
This table contains diagnostics for the predicted readership scores in the hold-out sample (March 1993 issue) by using a variety of approaches. The six methods used are arrayed across the top of the table and are described in the text, while the results for all three readership scores are presented. The diagnostics include (1) the sample correlations between the actual readership scores and predicted scores, (2) mean absolute prediction error, and (3) mean squared prediction error. Here, OLS stands for ordinary least squares and AIC for Akaike’s information criterion.
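The diagnostics reported in Table 5 are simple to reproduce for any predictor. A minimal helper (our own naming; the scores below are made-up stand-ins for the transformed hold-out readership scores):

```python
import numpy as np

def validation_metrics(actual, predicted):
    """Hold-out diagnostics of the kind reported in Table 5: the sample
    correlation between actual and predicted (transformed) scores, the
    mean absolute error, and the mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    return {
        "correlation": float(np.corrcoef(actual, predicted)[0, 1]),
        "mae": float(np.mean(np.abs(err))),
        "mse": float(np.mean(err**2)),
    }

# Hypothetical transformed scores for a few hold-out ads.
scores_actual = [0.42, 0.31, 0.58, 0.25, 0.49]
scores_pred = [0.40, 0.35, 0.55, 0.28, 0.44]
m = validation_metrics(scores_actual, scores_pred)
```

Each competing procedure's forecasts would be passed through the same helper, so the comparison across columns of Table 5 is on identical footing.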
This has already been seen, to some extent, with their interaction effects with B (brand prominence) and S (size of ad). However, there are many additional interaction terms found in Table 4. For example, for G1 (low involvement and informational) products, having more than 50 words in the copy (X24) appears to produce a decline in read-most scores, with a posterior probability of 0.972 of having a non-zero slope coefficient. Table 4 provides a large number of other such interaction effects between product categories and both ad attribute and ad message variables.

Model Validation

To demonstrate the effectiveness of our estimation methodology, we compare its predictive performance with some alternative estimation procedures. The prediction is for the 62 ads in the hold-out sample of the subsequent March 1993 issue of the magazine and is undertaken for all three readership scores. Throughout, we use the transformed dependent variables (as selected by our procedure) to enable a fair comparison of the approaches.

The first alternative is an ordinary least squares regression based on the 260 linearly independent regressors in our sample, which is included simply as a benchmark. The second is a forward selection stepwise regression procedure coupled with Akaike's information criterion, or AIC (Akaike, 1978). Here, the model uncovered during the forward selection procedure with the maximum AIC value was selected as our regression model, and least squares was used to estimate the regression coefficients. The last alternative uses factor analysis on the 260 linearly independent regressors in our sample to reduce the dimensionality of the problem and then uses least squares to estimate the coefficients for these factors. We used factor analysis (Mardia, Kent, and Bibby, 1979) with 30, 50, and 100 factors, which explained 54.62, 66.71, and 81.75%, respectively, of the variation in the design.

Using these procedures, we forecast the (transformed) readership scores and calculate (1) the sample correlation between these and the (transformed) actual readership scores, (2) the mean absolute error, and (3) the mean squared error. These are found in Table 5 and, by all measures, the Bayesian approach results in the most accurate forecasts. While these results demonstrate the advantage of using the Bayesian approach, this is, of course, specific to this single dataset. For comprehensive simulations on the reliability of such a semiparametric regression approach, see Smith and Kohn (1996, 1997).

Managerial Implications

The methodology provides a way in which many of the design and budgetary issues facing prospective advertisers in this Australian women's magazine can be addressed. For example, the estimated effect of the position of the ad in an issue (P) on its noted score, found in Figure 3 (a.1), suggests that an advertiser who is interested in attaining high levels of casual exposure may be prepared to pay a premium to place the ad in the front 10% of the magazine. However, the nonlinearity of the relationship also indicates that locating the ad 40% into the magazine is an advantageous position. If more in-depth exposure is required, the estimated effect of the position variable on read-most scores, found in Figure 3 (c.1), indicates that it is not worth any extra cost to position the ad at any particular location in the magazine.

Identifying such nonlinearities is important in identifying changing marginal relationships between ad characteristics and readership scores. For example, consider an advertiser who is faced with a decision on how prominent to make the brand name.
On average, making the brand "impossible to miss" (B = 4) results in higher noted and associated scores than "not present" (B = 0). However, the nonlinear relationship suggests that placing the brand somewhat in the middle of these extremes, as "easy to miss" (B = 2), is no different from omitting it completely. It does not result in half the level of noted and associated exposure, as would be suggested by a model that imposed a linear relationship.

The ability of the methodology to incorporate a large number of product type interaction variables into a regression framework with a relatively low sample size also proves important. It enables a more detailed modeling of the complex interaction effects and provides a measurement of the relationships between product type, ad characteristics, and exposure. For example, whereas Figure 3 (a.2) confirms that larger ads obtain higher casual attention, it is particularly important for products that are categorized as high involvement and transformational. Moreover, Figure 3 (c.2) identifies that higher in-depth exposure (as measured by the read-most score) can be obtained by larger ads when the product is transformational, but not when it is informational.

Advertising managers also can obtain predictive exposure ratings for proposed ads, given decisions on design and message factors. This is viable because the methodology allows a reliable estimation of the collective impact of ad characteristics. Such reliability is reflected in the relative improvement in predictive ability found for the ads in the March 1993 issue. Ad design then can be tailored to maximize this predicted exposure, given product type and budgetary constraints.

Summary and Conclusion

This article provides a new modeling approach to regression that allows the data to determine the functional form of the independent variables and to identify the significant independent variables, as well as an appropriate transformation of the dependent variable. It can handle regressors that lead to a regression matrix that is collinear (such as in the Starch print advertising dataset examined here) because it explicitly accounts for the possibility that many of the regressors are redundant. It reduces the subjective requirements of traditional techniques in prespecifying functional forms and transformations of the dependent variable. Whereas an uninformative prior is used in this analysis, if a researcher has strong prior information on which regressors are important, or on which data transformation is most likely, then it can be easily incorporated into the analysis.

These features of the methodology prove useful in our analysis of a dataset from an Australian monthly magazine for women. They enable the development of an integrative model that can capture the complex interrelations between ad characteristics and exposure measures that exist in such data. All features of the model are estimated simultaneously, the reliability of which is reflected in its improved predictive performance. The estimates provide insight into the form of the relationship between variables such as position of the ad in an issue, size of the ad, and brand prominence, and the readership scores. They also identify product-specific effects through the use of interaction terms based on a categorization suggested by Rossiter and Percy (1987). As such, they have implications for the design of ads by potential advertisers in this magazine. More generally, with the growth of complex databases in marketing, we feel that our technique provides the researcher with a powerful modeling tool that is practical, easy to implement, and can be applied to complex datasets containing a large number of variables, such as Starch print advertising data.

The authors would like to thank John Rossiter for helping them with the classification in the section titled "Variables Describing the Product Category" and for permission to use the Starch data. Michael Smith was supported by an Australian Postgraduate Research Award and a Monash Faculty grant. The work of Sharat Mathur and Robert Kohn was partially supported by an Australian Research Council Grant. We are grateful to Professors Gary Lilien, David Midgley, John Roberts, and John Rossiter for comments on a previous version of the article, as well as three anonymous referees whose comments helped improve the article.

References

Akaike, H.: A Bayesian Analysis of the Minimum AIC Procedure. Annals of the Institute of Statistical Mathematics 30 (1978): 9–14.

Box, George, and Cox, David: An Analysis of Transformations Revisited, Rebutted. Journal of the American Statistical Association 77 (1982): 209–252.

Box, George, and Tiao, George: Bayesian Inference in Statistical Analysis, John Wiley and Sons, New York. 1973.

Casella, George, and George, Edward: Explaining the Gibbs Sampler. The American Statistician 46 (1992): 167–174.

Curry, David J.: The New Marketing Research Systems: How to Use Strategic Database Information for Making Better Marketing Decisions, John Wiley and Sons, New York. 1993.

Diamond, Daniel S.: A Quantitative Approach to Magazine Advertisement Format Selection. Journal of Marketing Research 5 (1968): 376–386.

Edell, Julia A., and Staelin, Richard: The Information Processing of Pictures in Print Advertising. Journal of Consumer Research 10 (June 1983): 45–61.

Eubank, Randolf L.: Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York. 1988.

Finn, Adam: Print Ad Recognition Readership Scores: An Information Processing Perspective. Journal of Marketing Research XXV (May 1988): 168–177.

Frankel, Lester R., and Solov, Bernard M.: Does Recall of an Advertisement Depend upon Its Position in the Magazine? Journal of Advertising Research 2 (December 1963): 28–32.

Gardner, Burleigh B., and Cohen, Yehudi A.: ROP Color and Its Effect on Newspaper Advertising. Journal of Marketing Research 1 (May 1964): 68–70.

Gelfand, Alan E., and Smith, Adrian F. M.: Sampling-Based Approaches to Calculating Marginal Densities. Journal of the American Statistical Association 85 (1990): 398–409.

George, Edward I., and McCulloch, Robert E.: Variable Selection via Gibbs Sampling. Journal of the American Statistical Association 88 (1993): 881–889.

Hanssens, Dominique M., and Weitz, Berton A.: The Effectiveness of Industrial Print Advertisements across Product Categories. Journal of Marketing Research 17 (1980): 294–306.

Hardle, W.: Applied Nonparametric Regression, Econometric Society Monographs, Cambridge University Press, Cambridge, UK. 1990.

Kass, Robert E., and Raftery, Adrian E.: Bayes Factors. Journal of the American Statistical Association 90 (1995): 773–795.

Mardia, Kantilal, Kent, J., and Bibby, John: Multivariate Analysis, Academic Press, London. 1979.

Mitchell, T. J., and Beauchamp, J. J.: Bayesian Variable Selection in Linear Regression. Journal of the American Statistical Association 83 (1988): 1023–1036.

Rossiter, John R., and Percy, Larry: Advertising and Promotion Management, McGraw-Hill, New York. 1987.

Smith, Michael, and Kohn, Robert: Nonparametric Regression Using Bayesian Variable Selection. Journal of Econometrics 75 (1996): 317–344.

Smith, Michael, and Kohn, Robert: A Bayesian Approach to Nonparametric Bivariate Regression. Journal of the American Statistical Association 92 (1997): 1522–1535.

Smith, Michael, Sheather, Simon, and Kohn, Robert: Finite Sample Performance of Robust Bayesian Regression. Computational Statistics 11 (1996): 317–343.

Soley, Lawrence C., and Reid, Leonard N.: Industrial Ad Readership as a Function of Headline Type. Journal of Advertising 12 (1983): 34–38.

Starch, Daniel: Measuring Advertising Readership and Results, McGraw-Hill, New York. 1966.

Trodahl, Verling C., and Jones, Robert L.: Predictors of Newspaper Advertising Readership. Journal of Advertising Research 5 (1965): 23–27.

Twedt, Dik W.: A Multiple Factor Analysis of Advertising Readership. Journal of Applied Psychology 36 (1952): 207–215.

Appendix: Description of the Data

There were 1,005 observations. The first 943, from the March 1992 to February 1993 issues, formed the calibration sample. The remaining advertisements, numbers 944–1,005 from the March 1993 issue, were used for model validation. The following is a list of the independent variables.

Advertisement Attribute Variables.

X1 color (0–2), 0 = monotone, 1 = 2-color, 2 = 4-color
X2 left-hand position (0,1)
X3 right-hand position (0,1)
X4 size of issue (0–2), 0 = 296 pages or less, 1 = 300 to 320 pages, 2 = more than 320 pages
X5 square finish in main illustration (0,1)
X6 silhouette or other shape in main illustration (0,1)
X7 photo used (0,1)
X8 size of photo (0,1), 0 = less than 1/2 of space
X9 ad content relates to "end result of using or eating the product" (0,1)
X10 ad content relates to "actual product or package" (0,1)
X11 ad content relates to "actual product or package" (0,1)
X12 number of illustrations (0,1), 0 = multiple illustrations, 1 = single main illustration
X13 bleed (unbordered) (0,1)
X14 headline length (0,1), 0 = less than 8 words
X15 large headline (0,1), 1 = more than 1.3 cm high
X16 small headline (0,1), 1 = less than 1.3 cm high
X17 headline above main illustration (0,1)
X18 headline beside main illustration (0,1)
X19 headline below main illustration (0,1)
X20 headline color (0–2), 0 = no color, 1 = partial color, 2 = all color
X21 headline type (0–2), 0 = all reverse, 1 = partial reverse, 2 = all straight
X22 headline case (0–2), 0 = all lower, 1 = partial upper, 2 = all upper
X23 copy area (0,1), 1 = more than 1/3 of area
X24 number of words in copy (0,1), 0 = less than 50 words
X25 opposite page is mainly other advert (0,1)
X26 opposite page is mainly related educational (0,1)
X27 opposite page is mainly related editorial (0,1)
X28 opposite page is mainly unrelated editorial (0,1)
X29 opposite page is mainly part of the same multipage ad (0,1)
X30 advertorial (0,1)
X31 coupon (0,1)
P position in issue, continuous on [0,1], 0 = front cover, 1 = back cover
I insert (0,1)
S size of ad (0–3), 0 = less than full page, 1 = full page, 2 = double page, 3 = multiple pages
B brand prominence (0–4), 0 = not present, 4 = impossible to miss
R recipe (0,1)
M issue number (1–13), 1 = March 1992, 13 = March 1993

Product-Type Independent Variables.

All the variables take the values 0 and 1 only, where 1 = true and 0 = false. The category to which each product belongs is provided in brackets. The product categories are G1 (low involvement and informational), G2 (low involvement and transformational), G3 (high involvement and informational), and G4 (high involvement and transformational).

T1 (G4) women's apparel/accessories
T2 (G2) food
T3 (G3) pharmaceutical
T4 (G1) toiletries and health
T5 (G3) household appliances
T6 (G4) household furnishings
T7 (G1) household goods
T8 (G4) building and decorating
T9 (G4) travel and holidays
T10 (G4) men's apparel/accessories
T11 (G4) children's wear
T12 (G4) motor vehicles
T13 (G2) books and magazines
T14 (G2) drink
T15 (G2) cosmetics/beauty
T16 (G1) hair
T17 (G4) jewelry
T18 (G2) records
T19 (G3) government
T20 (G4) craft
T21 (G1) baby
T23 (G3) finance

Ad Message Dummy Variables

1. PREDOMINANT FEATURE. Each ad can have only one predominant feature but may have any of the following characteristics: X[1]1 person(s) 18+; X[1]2 baby(ies); X[1]3 child(ren); X[1]4 animal(s); X[1]5 women's fashions; X[1]6 furnishing(s); X[1]7 a premium offer; X[1]8 the product(s).

2. HEADLINE APPEAL. Again, each ad can have only one headline but may have any of the following appeals: X[2]1 a promise; X[2]2 new feature of the product; X[2]3 a question; X[2]4 an exclamation; X[2]5 news item; X[2]6 prize(s) in a contest; X[2]7 price; X[2]8 new product.

3. PREDOMINANT APPEAL. This can have any of the following characteristics: X[3]1 social status; X[3]2 increased pleasure; X[3]3 personal security; X[3]4 health or hygiene; X[3]5 knowledge or quality; X[3]6 increased income/capital gain/savings.

4. PREDOMINANT COLOR OF ADVERTISEMENT. The registered colors are: X[4]1 red; X[4]2 blue; X[4]3 green; X[4]4 grey, black, or monotone; X[4]5 pastel shades, no predominant color; X[4]6 yellow, orange, or brown.