INSIDER’S VIEWPOINT FROM YOUR WEB LIBRARY PROFESSIONAL: BILL DIMM, MAGPORTAL

Bill Dimm, MagPortal’s CEO, holds a doctorate in theoretical elementary particle physics from Cornell University. Luckily for librarians and Web surfers, he has also been involved in mathematical modeling and computer programming. His company, Hot Neuron, has been facilitating access to free content since 1999.

Figure 1.2 MagPortal’s retrieval for a keyword search on “health

NT: For the benefit of readers who may not know some of the publishing jargon: I’ve read in several journal articles that MagPortal is an “aggregator.” Can you help define aggregator in a way that will help them understand the site’s role as it relates to the individual reader?

BD: An aggregator collects together content from disparate sources in some organized way. For example, if you wanted to find the most recent articles on search engines, you could visit the Web sites of a half dozen publications on that topic, or you could just go to the appropriate category on MagPortal.com (http://magportal.com/c/edu/research) and find links to all recent articles in a single location. Not only would you find articles on search engines from publications dedicated to that topic through MagPortal.com, but you would also find articles on search engines from publications (like PC World) that only write about search engines occasionally.

NT: Bill, I have found many Web sites that list links to magazines and journals. What’s different about MagPortal?

BD: Unlike Web sites that provide a directory of links to magazine Web sites, we provide links directly to the individual articles. When you are trying to find an article on something specific, we take you straight to it instead of leaving you to hunt through the publications individually.

NT: Using Google, I see that at least 2,500 Web pages are linked to MagPortal. Many of them are public libraries, but many are pages on freelance writing (“Beginner’s Guide to Freelancing” at http://www.poewar.com/articles/beginner.htm), and others include helpful pages about how to build one’s own Web site (for example, “Library Support Staff” at http://www.librarysupportstaff.com/webpubhelp.html). I have also read articles about MagPortal that state it is helpful to put a link on a business site to MagPortal to facilitate access to current information. It’s obvious how the individual gains from these links to you, but how does MagPortal benefit?

BD: Links to MagPortal.com bring more users to our site, which means more ad impressions and the opportunity to make more people aware of our premium feed offering.

NT: MagPortal isn’t just a set of links to online magazines and journals. Isn’t it an engine to help users find exactly what they want from publications with free content on the Web?

BD: MagPortal.com provides several ways to find articles. First, we have over 200 categories that our human editors populate with articles. New articles are normally added within one business day of the publisher putting them on the Web, so you can easily browse the most recent articles on a topic if we’ve created a category for it. Second, we provide a full-text search engine. Our search engine is more current than the generic search engines, and it has some features that generic search engines often lack. For example, in addition to sorting results by quality-of-match, you can also sort by date, and you can restrict the search to a particular category or publication. Finally, we provide “similar articles” links next to the articles (represented on the screen by wavy orange equal signs) that use our Hot Neuron Similarity software to provide listings of articles that our proprietary algorithm determines to be similar to that article. Once you find an article you like, you can click the yellow highlighter pen next to it to mark the article so that you can easily find it later. Unlike a bookmark, this feature allows you to add your own annotation, and, whenever possible, we will take care of fixing the URL if the publisher moves the article.

NT: At the MagPortal site there is information about AmSouth Bank using MagPortal for fresh content. I believe these are called “feeds.” What is the site’s feed component all about?

BD: The premium feeds allow other companies to embed a mini MagPortal.com in their Web site that uses a subset of our data. They get the article listings, search engine, and similar articles, restricted to the particular topics that they license. We provide them with a small piece of software to install on their Web server that automatically pulls any necessary data from MagPortal.com and displays it directly on their site to match the site’s look and feel. The article links take the user to the publisher’s site, as on MagPortal.com. We customize the datasets for the premium feeds to exactly match the client’s area of interest. Web sites using the service range from narrowly focused ones (nursing, marketing, law, etc.) to more broadly focused ones like AmSouth Bank’s site, which displays all of our business and Internet topics in its small business section.

NT: From a practical point of view, can you explain the process of indexing the free online content so that it is searchable for the general reader? For example, I enter “bin laden” and retrieve 608 hits—that’s a lot better than browsing to find articles about him. I’m hypothesizing that you have to take the raw data and feed it through an engine. How is this achieved?

BD: Like any search engine, we have a piece of software, called a “spider,” that hunts through pages on the Web. Our spider has been tailored for each of the publications that we cover, so it heads immediately to the table of contents page for the current issue of the publication to see if there is anything new. It also knows how to follow the “next page” links within an article and recognize that all of these pages are part of the same article (generic search engines index pages rather than articles, so they don’t have to collect things together this way). We then run the HTML for the pages through our parser, which attempts to extract the title, author, date, and body of the article. A human reviews the result for accuracy, writes or clips a very brief summary, and categorizes the article.
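As a rough illustration of the page-collecting step described above, the following Python sketch follows “next page” links and joins the pages into one article. It is not MagPortal’s spider: the in-memory `PAGES` table, the `fetch` stand-in, and the `collect_article` function are all hypothetical names invented for this example, and a real spider would download and parse HTML instead.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory "site": URL -> (page text, URL of the "next page" link or None).
PAGES: Dict[str, Tuple[str, Optional[str]]] = {
    "/review/p1": ("Part one. ", "/review/p2"),
    "/review/p2": ("Part two. ", "/review/p3"),
    "/review/p3": ("Part three.", None),
}


def fetch(url: str) -> Tuple[str, Optional[str]]:
    """Stand-in for downloading and parsing a page (an assumption for this sketch)."""
    return PAGES[url]


def collect_article(
    start_url: str,
    fetch_page: Callable[[str], Tuple[str, Optional[str]]] = fetch,
) -> dict:
    """Follow "next page" links and treat all visited pages as one article."""
    visited: List[str] = []
    parts: List[str] = []
    url: Optional[str] = start_url
    while url is not None and url not in visited:  # stop on link loops
        visited.append(url)
        text, url = fetch_page(url)  # url now points at the next page, if any
        parts.append(text)
    return {"pages": visited, "body": "".join(parts)}


article = collect_article("/review/p1")
print(article["body"])  # Part one. Part two. Part three.
```

The `visited` check is a design choice for the sketch: it keeps the loop from running forever if a publication’s “next page” link ever points back to an earlier page.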

To put this into the search engine, we cut each article down into words and discard any unimportant ones like “the” (called “stop words”). We count the number of times each word appears in the article, and we normalize by the length of the article. This information goes into a database. Much as the index at the back of a book helps you to easily find the pages a word occurs on without searching each page one by one, this database allows our search engine to quickly find out which articles contain a particular word. It also tells us how important that word is in each of the articles (i.e., how many times the word appears in the article relative to the total number of words in the article), which allows us to sort the search results by quality of match.
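The indexing steps just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MagPortal’s actual software: the stop-word list, the sample articles, and the function names `words`, `build_index`, and `search` are all made up for the example, and it assumes that an article’s length is counted before stop words are removed.

```python
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (an assumption; a real list would be longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}


def words(text):
    """Cut text down into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each word to {article_id: occurrences / total words in that article}."""
    index = defaultdict(dict)
    for article_id, text in articles.items():
        all_words = words(text)
        if not all_words:
            continue  # nothing to index for an empty article
        counts = Counter(w for w in all_words if w not in STOP_WORDS)
        for word, count in counts.items():
            index[word][article_id] = count / len(all_words)
    return index


def search(index, word):
    """Return the articles containing the word, best match first."""
    postings = index.get(word.lower(), {})
    return sorted(postings, key=postings.get, reverse=True)


# Hypothetical sample data.
ARTICLES = {
    "a1": "Search engines index the web",
    "a2": "Search tips: search smarter and search faster",
    "a3": "Gardening in the spring",
}
index = build_index(ARTICLES)
print(search(index, "search"))  # ['a2', 'a1']
print(search(index, "the"))     # []
```

Here “search” makes up 3 of the 7 words in `a2` but only 1 of the 5 words in `a1`, so `a2` ranks first. A query for “the” returns nothing because stop words never enter the index.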

NT: What topics seem to interest site visitors the most?

BD: We have a very broad user base ranging from students doing schoolwork to professional searchers to people who are just browsing for articles on their favorite pastime. We do notice surges in traffic from people hunting for reviews of hot new electronics products or car models. We usually have such things in our index long before the generic search engines do.

Other Aggregators and Portals for Online
