
4. INTERNET PRICE COMPETITION METHODOLOGY

4.3. Data Gathering

the spectrum, a graduate student may choose to search the entire marketspace for the lowest price because the value of their time is lower.

This methodology is tested with an exploratory data set in the next chapter, Chapter Five. By testing the methodology with data collected for three markets selling homogeneous goods (books, compact discs, and software), the limitations of this methodology are explored along with the development of new hypotheses. In particular, Chapter Five will use the collected data to look for support for the four hypotheses developed in this chapter.

Assumption 6 (A6): The prices physical retailers post on the Web are the same prices they have in their physical stores.

Finally, the granularity of temporal data collection is chosen. Deciding how often to examine price changes requires balancing collection and processing time against the measurement of signaling and response in the marketplace. While the choice of granularity mostly affects Hypothesis 3, the time periods should not be so long that many price changes occur between observations and intermediate price points go uncollected. If the time periods are too short, the overhead necessary to collect and analyze large amounts of data may be prohibitive. Chapter Five will use two different granularities: weekly and monthly changes.

Once this information is determined, there is a set of titles, retailers, and times that uniquely describes the dependent variable of price. The title/retailer/time combination forms the independent variables for the analysis, and the dependent variable of price is collected. While data may be analyzed across titles, retailers, or time, the fundamental element at the lowest level of granularity is the price for a unique title/retailer/time combination.
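As a purely illustrative sketch (not part of the original methodology), the unit of observation can be represented as a mapping from a title/retailer/time triple to a posted price; every name and figure below is a hypothetical placeholder.

    # A minimal sketch of the unit of observation: the price P_irt for a
    # unique title (i) / retailer (r) / time period (t) combination.
    # All titles, retailers, and prices are hypothetical placeholders.
    from typing import Dict, Tuple

    # (title, retailer, period) -> posted price in dollars
    Observations = Dict[Tuple[str, str, int], float]

    observations: Observations = {
        ("Example Book Title", "internet-retailer.example", 0): 17.95,
        ("Example Book Title", "physical-retailer.example", 0): 19.95,
        ("Example Book Title", "internet-retailer.example", 1): 16.95,
    }

    # The data can then be sliced across titles, retailers, or time, e.g.,
    # all prices posted by one retailer in period 0:
    period_zero = {k: v for k, v in observations.items()
                   if k[1] == "internet-retailer.example" and k[2] == 0}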

How does one find the dependent variable of price? One of the benefits of the Internet marketplace is that Internet tools can be used to collect the price data. A mechanism to ensure easier data collection is to select retailers in both categories that post prices on the Internet.

Most physical retailer price data in the next chapter was collected through the Internet in the same way it was collected for Internet retailers. This methodology allowed lower-cost data collection compared to the many hours necessary to visit physical locations and track prices. The result is that a greater amount of data was collected.

However, this method of data collection means that the results must be carefully interpreted. Even if the physical retailers posted the same prices as they had in their stores, they may differ systematically from stores that avoid posting prices on the Internet. Furthermore, in the data from Chapter Five, posted prices were not verified against the in-store prices for these retailers. Accordingly, any price difference found should be interpreted as a difference in posted prices and not necessarily a difference in actual in-store prices.43 The vast majority of sales for the “physical retailers” were made via their physical retail locations, not Internet sales, during the sample period, as indicated by the retailer information in Appendix C. For each pairing, a URL (Uniform Resource Locator), the unique Web identifier, is used which lists the price for this title/retailer pairing.

While using a URL provides easier data collection, there are limitations to its use. Because many retailers with Internet Web sites aggregate a large number of products, they do not create individual Web pages, each with its own assigned URL.

Rather, these retailers keep their products and product information in databases, and the HTML document is generated on demand. As long as the URL used for the data collection requests the same information to be processed, this is not a problem. However, if the syntax for generating a Web page changes, or if the URL includes temporal information that expires, revisiting the same URL over time may return false information. While the data collection in Chapter Five did not find these problems to be ubiquitous, it may be in the best interest of Internet retailers to adopt such practices as sophisticated search intermediaries and increased price competition erode profit margins. In effect, a retailer could increase consumer search costs by using URLs that expire.

42 Chapter Five explores the book, compact disc, and software markets.

43 An extension of this work and an opportunity for future research could explore the robustness of the results to a finer partitioning of retailing types, including a check of prices charged in different channels by the same retailer. This exploratory analysis focuses on only one type of distinction: prices posted by retailers with a physical presence vs. those that are pure Internet retailers.

The resulting data collection methodology for analyzing prices set by Internet and physical retailers is shown in Figure 4.1. The main idea is to collect prices for different products sold by different retailers over time. The data can then be used to describe what a particular user would expect to pay given particular environment variables, such as their experience with Internet shopping and the alternatives they have in the physical market. Figure 4.1 identifies the elements of this data collection: at t = 0, the Web page is downloaded, the page is parsed for price information, and the data point is extracted. This process is repeated in subsequent time periods.

Figure 4.1: Data Gathering Flow Chart. [Flow chart: titles and retailers determine the URLs for Internet retailers; at each period (t = 0, then t = t + 1), the Web page is downloaded and parsed for price, yielding a data point P_irt.]

Automation of the process of downloading the Web page is possible, but automating the parsing of that Web page is more difficult. Given a set of URLs, a script can fetch the data (the HTML document) and save it locally for future analysis, or the script can immediately hand this data to the parser. This process eases the labor burden of collecting a large number of data points. The more difficult task is to parse through that information to find the price. Because there is no standard way to report price, sometimes the price is ambiguous or not available in the source code. For example, a retailer can put the price in an image format, so the source code would look similar to “<b>price:</b> <img src="price.jpg">” and automated analysis would not be possible. The ability of new protocols such as XML to standardize Internet commerce fields may make this parsing process easier, but as Zettelmeyer (1995) discusses, making the search process easier for consumers may not be in the best interest of Internet retailers. Putting price information into a picture format is just one way they can increase consumer search costs.
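As a minimal sketch of such a script, written in Python and assuming a page whose source code exposes the price as text, the fetch-and-parse step might look like the following. The URL and the regular expression are hypothetical; a real retailer's markup would need its own tailored pattern, or the human filter discussed below.

    import re
    import urllib.request
    from typing import Optional

    def fetch_page(url: str) -> str:
        """Download the HTML document at the given URL."""
        with urllib.request.urlopen(url) as response:
            return response.read().decode("utf-8", errors="replace")

    def parse_price(html: str) -> Optional[float]:
        """Search the source code for a dollar amount near a 'price' label.

        Returns None when the price is ambiguous or absent, e.g., when the
        retailer renders it as an image such as <img src="price.jpg">.
        """
        match = re.search(r"[Pp]rice:?\s*(?:</b>)?\s*\$(\d+(?:\.\d{2})?)", html)
        return float(match.group(1)) if match else None

    # Hypothetical usage; the URL follows the title/retailer pattern above.
    html = fetch_page("http://www.retailer.com/title?12345")
    price = parse_price(html)  # None -> route the page to a human reader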

The final element of the methodology shown in Figure 4.1 involves the temporal process of collecting data from Internet retailers over time. Finding the URL of a Web page that describes a unique retailer selling a unique title can greatly aid in data collection. Because the URL may not change even though the price does, linking to a URL at different times gives access to different data points for a common retailer/title pairing. Downloading the source code from the URL is not difficult, but parsing the information within the HTML coding may be. Therefore, while it is relatively easy to automate the Web page download, a human filter may be necessary to parse the Web page and extract the data point.
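Continuing the sketch above, and reusing its hypothetical fetch_page and parse_price helpers, the temporal process amounts to revisiting each stored URL in every period and routing unparseable pages to a human filter:

    import time

    # Hypothetical title/retailer pairings and their tracking URLs.
    urls = {
        ("Example Book Title", "internet-retailer.example"):
            "http://www.retailer.com/title?12345",
    }

    observations = {}        # (title, retailer, period) -> price
    needs_human_filter = []  # pages the automated parser could not handle

    for period in range(8):  # e.g., eight weekly observations (t = 0, 1, ...)
        for (title, retailer), url in urls.items():
            html = fetch_page(url)       # from the sketch above
            price = parse_price(html)    # from the sketch above
            if price is None:
                needs_human_filter.append((title, retailer, period, url))
            else:
                observations[(title, retailer, period)] = price
        # In practice a weekly scheduled job; a sleep is shown for simplicity.
        time.sleep(7 * 24 * 60 * 60)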

When gathering the price information, there is an open question of whether to include tax and shipping costs. This is a difficult issue because tax rates vary from state to state and shipping options often vary. For example, purchasing a book in a local Barnes & Noble in the Boston area would incur a six percent sales tax, while buying the same book via the Internet from Amazon.com would avoid this sales tax. However, Amazon.com would charge for shipping, and the Barnes & Noble “shipping costs” would materialize in the form of personal transportation costs.

This methodology eliminates this confusion by examining only the price of the title, separating out the sales tax and shipping cost. This is reasonable because consumers often do not factor in the sales tax and/or shipping costs that accompany such an order. It is also consistent with the theory developed in Chapter Three on the price-setting and aggregation role of the intermediary separate from the transport of the good to the consumer. In essence, the transaction agreement is separated from the transaction fulfillment.
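A hypothetical worked example (every dollar figure below is invented for illustration) shows how the recorded price abstracts from these channel-specific costs:

    # Hypothetical illustration of recording posted price separately from
    # tax and shipping; all dollar figures are invented.
    item_price = 20.00                      # posted price at both retailers

    # Physical store in the Boston area: price plus six percent sales tax.
    in_store_total = item_price * 1.06      # 21.20

    # Internet retailer: the sales tax is avoided, but shipping is charged.
    shipping = 3.95                         # hypothetical flat shipping fee
    internet_total = item_price + shipping  # 23.95

    # The methodology records only item_price (20.00) for both channels;
    # tax, shipping, and personal transportation costs fall under
    # transaction fulfillment, not the transaction agreement.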

This chapter uses the four hypotheses and methodology developed in Chapter Four to analyze an exploratory data set. The analysis indicates why there is still friction in Internet commerce markets and describes possible reasons why no evidence was found to support three of the four hypotheses from Chapter Four.

Finally, this chapter concludes with a discussion of the strengths and weaknesses of Chapter Four’s methodology, and the need for more data collection to determine whether or not the exploratory data is representative of the future of Internet commerce.

This chapter introduces both a quantitative and qualitative discussion of early pricing strategies for Internet commerce by examining prices for several homogeneous products to test the four hypotheses. Two types of retailers are chosen: those that rely exclusively on the Internet for sales and a matched set of firms selling identical products that primarily use conventional channels. In total, 30,254 observations (23,789 from February/March 1997 and 6,465 from May 1997 through January 1998) from 52 different retailers for 337 distinct titles of books, music compact discs, and software were collected and analyzed.44 Because of a major new entrant in the Internet commerce book market, Barnes & Noble, the analysis for the book market was extended past March 1997 with monthly observations from May 1997 through January 1998.

While exploratory, the data presents some striking results. Specifically, there is no support for three of the four hypotheses from Chapter Four, indicating that the Internet market may not be as frictionless as the theory suggests.

Each hypothesis is explored in greater detail in Section 5.1. In this sample of products and firms, the data does not show that retailers who rely exclusively on the Internet for sales have lower prices than retailers who primarily use conventional channels (Hypothesis 1). Furthermore, in two of the three markets studied, there was more price dispersion among Internet retailers, contradicting Hypothesis 2. However, there is data to support Hypothesis 3 because the Internet retailers in the sample changed prices more frequently than their counterparts with physical stores. Finally, there did seem to be differences in the results across the three markets, which indicates that product and market characteristics do matter, contradicting Hypothesis 4.

44 This data collection follows an even more exploratory analysis conducted in June-August 1996 with the top ten titles in the same three markets: books, compact discs, and software. The findings indicated that prices were not lower on the Internet and led to the development of the February/March 1997 data collection and analysis. Bailey, Brynjolfsson, and Smith (1998) detail the early findings and compare them to the February/March 1997 findings.

These conclusions should not be taken out of the context of the exploratory nature of the data. By applying this methodology to a relatively small and static data set, no evidence was found to show that a Bertrand price competition model holds for Internet commerce. While this analysis is still quite preliminary and subject to change as the Internet evolves, it does suggest that some of the existing assumptions and theories about the effect of the Internet on pricing need to be extended or revised to account for actual practice.

Other data sets and analyses in other markets support this chapter's conclusions. The OECD (1998) report used analysis from Goldman Sachs showing that a market basket of thirty-one items purchased on the Internet cost $457.89 while the same market basket cost $453.07 at Wal-Mart. Lee (1998) examines prices for used automobiles sold at auction and finds that the price for automobiles sold via the Internet was higher than the price of a similar automobile sold in a physical market.45 However, the immaturity of Internet commerce makes all such analyses exploratory.

Nakagawa (1997) examines pricing strategies on the Internet for music CDs, computer-related goods, books and wines. While his work does not explore price competition among retailers as this thesis does, he does develop a framework for pricing strategies by Internet retailers from a marketing perspective. Nakagawa finds that retailers targeting Internet consumers who are looking for the best product features and not the lowest price (such as consumers in the wine market) should not use price promotions (coupons, for example) to attract consumers.

Conversely, Internet retailers in the software market should offer price promotions to attract the price-conscious Internet consumers.

Even though the results are preliminary, these findings are not consistent with much of the conventional wisdom regarding Internet commerce or with the simplest interpretations of existing theory regarding the effects of lower search costs on price competition. Indeed, the systematic difference in prices for apparently identical products is in itself evidence that Internet markets are not “friction-free”. Accordingly, this chapter presents several alternative explanations for the results.

The empirical analysis suggests different reasons why the Internet market may not reduce friction. Section 5.3 explores five possible explanations for why the data contradicts three of the four hypotheses. Several reasons are possible for these observations, including: 1) high search costs: the Internet may not reduce the friction in the market sufficiently for consumers to search product features and prices at a low cost; 2) lack of trust: consumers do not trust all retailers equally, so more trustworthy retailers can maintain higher prices; 3) market immaturity: Internet retailers are still experimenting with strategies for setting prices and developing consumer relationships; and 4) price discrimination: retailers are able to separate groups of consumers and charge them higher prices based upon consumer demographics.

45 Because the automobiles in Lee’s study were only similar, not identical, it is questionable whether his results are very significant. Greater participation by consumers in an auction market may lead to higher prices regardless of the medium used to support the transaction. Furthermore, a third party certified the automobiles sold on the Internet, while the automobiles sold in a physical market auction were not.

5.1. Description of the Exploratory Data Set

Selecting the parameters for data collection is an important part of the design of the experiment. During February and March 1997, the analysis examined different markets, retailers, and titles at weekly time intervals.46 Each of these decisions affects the results, so they will be examined in greater detail before a presentation of the results. For specifics on this data, Appendix A of this thesis lists the titles and some price data, and Appendix C lists descriptions of the retailers. When the retailers and titles are selected for analysis, careful attention is paid to the characteristics of each to ensure proper sampling. Ensuring adequate depth in the number of retailers and titles, reflecting the complex marketplace at the design stage of the research, gives the results a greater chance of being statistically significant. By exploring only a subset of the wide-ranging titles sold and the retailers that sell over the Internet, the analysis may miss a larger strategy developed by a retailer. For example, selecting only popular titles may capture only prices set for loss-leader products. While more data is always preferred to less, more data from the same class of retailers and titles may be redundant and could give unequal weight to a category of titles, which would affect the results.

The three markets selected for this analysis (books, compact discs, and software) all involve homogeneous goods, which is necessary to support an assumption of the Bertrand competition model. Products were selected for their fairly high level of competition among retailers, the aggregation and pricing intermediaries, so that price would be an important element of competition. Finally, markets were selected where suppliers (such as the publishers of titles in these three markets) did not try to impose a disintermediated market structure by selling directly to consumers.

Direct selling may skew price competition among the aggregation and pricing intermediaries because suppliers would be influencing consumer prices. While there are publishers in each of these three markets that do sell directly, they usually sell at the list price of the title while the intermediaries or retailers sell at a discounted price (thereby trying not to undercut the retailers).

A total of 52 retailers were chosen for the study. The essential requirement for Internet retailers is that they have Web sites where they post their prices. Most of the physical retailers were also found to have Web sites. Because an exhaustive list of retailers in these three markets would be much larger than 52, a subset was selected to reduce the overhead of data collection. Retailers that allowed titles to be tracked by a unique URL (i.e., www.retailer.com/title?12345) were given preference.47 This choice does not introduce any significant selection bias because most Internet retailers use a unique URL for tracking, and those that did not use a unique URL did not appear to have prices significantly different from the retailers selected. Within the three markets, there were eight book, nine CD, and 35 software retailers.48

46 Chapter Two described why reduced friction might lead to reduced transaction and menu costs. The preliminary exploratory analysis, conducted in June-August 1996, had two observations per week that rarely indicated price changes during the week. Therefore, weekly observations were appropriate.

A total of 337 titles were examined: 125 in the book market, 108 in the CD market, and 104 in the software market. Approximately half of these titles came from the “most popular” lists in the different industries.49 To ensure greater depth, the remaining half are more “obscure” or “niche market” titles, which came from specialized lists, recommendations from friends, or identification by niche retailers as a featured selection. While the gauge of popularity was thought to have some effect on the findings below, the results did not seem very different for the 50th title on the Billboard list and a more obscure title. This mix of bestsellers and less-popular titles was chosen to reflect the mix of titles purchased by consumers in the market. If only bestsellers were chosen, the prices observed might only reflect loss-leader products. If a random sample of all titles sold were selected, it might not give enough weight to the sales volume of popular titles.

The time period selected for the analysis in February and March 1997 was weekly. Humans made weekly observations by accessing the unique URL and recording the prices. While there are a total of 44,896 possible observations (8 weeks [observations] * [125*8 + 108*9 + 104*35]), not all permutations resulted in an observation. Some retailers were out of stock of a title in a particular week, and other retailers never carried some of the titles being tracked. In total, 23,789 observations were logged, which is approximately 53% of the total possible. The extension of the analysis for the book market changed the time granularity to monthly observations, which is discussed in Section 5.2. Additional details on the data gathering methodology can be found in Chapter Four.
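For reference, the arithmetic behind these counts is:

    8 × (125 × 8 + 108 × 9 + 104 × 35) = 8 × 5,612 = 44,896 possible observations,
    and 23,789 / 44,896 ≈ 53%.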

With the markets, retailers, titles, and time period selected, all that remained was the collection of the dependent variable: price. While this may seem trivial, which price point to select is an important part of the data collection because there may be more than one price. For example, should the price data include shipping cost? Should the price point come from the general price or the membership price? For this analysis, member prices were not used except when there was no membership fee (disclosing an email address to an Internet retailer may create a free membership, for example).

47 This URL represents a dynamically rendered HTML document based on a query from the user for title “12345.” Many of the retailers in the study had dynamically rendered documents generated from a database as opposed to static HTML documents.

48 The reason for such a large number of software retailers is that the method for data collection in the software market was made easier by the UVision search intermediary. Instead of a unique title/retailer URL, there was one URL which could get the sales price for many (often 20 or more) retailers.

49 Books used the New York Times Bestseller list. CDs were taken from the Billboard top 50. Software titles were found in PC Week’s column on popular CD-ROM titles.