
Our agent-based data integration system consists of software agents that provide combinations of data processing services in a heterogeneous, distributed environment, an architecture well suited to the problem of weather data provisioning (Figure 4). In Figure 4, we depict agents as special three-part images, which are explained in our previous paper (Kalra & Steiner, 2005). The Weather Tool’s agent system is implemented and deployed using our in-house agent development and deployment framework, the multi-agent development environment (MADE). MADE enhances JADE (Java Agent DEvelopment Framework) (Bellifemine, Poggi, & Rimassa, 1999) with simplified interfaces for agent development and with additional capabilities such as subscription services. JADE is a Java-based middle-layer software framework for agent development and deployment that supports the FIPA standards (FIPA). JADE provides the Agent Communication Language (ACL) message performatives and protocols for agent communication. In the absence of Semantic Web technology, we currently use predetermined metadata to describe the services. Agents register and advertise their services, namely the “retrieval,” “repository,” and “translation” services (described below), using the JADE directory facilitator (DF).

Agents publish their services using service descriptions, which describe the type and content of the data that a source provides. Other agents in the system can then discover these service descriptions. The system has four types of agents: retrieval agents, repository agents, translation agents, and application agents. Once registered, the retrieval agents either start retrieving data from their sources or wait for a request for data. The repository agent, after registering its repository service with the DF, looks up the retrieval services and subscribes to them.

The repository agent periodically checks for new retrieval agents that join the system and subscribes to them. The application agents look up the weather repository service and subscribe to it for their data requirements. The repository agent looks up and invokes translation services when data is requested in formats other than its standard data storage format.
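The register, look-up, and subscribe sequence described above can be illustrated with a toy, in-memory stand-in for the directory facilitator. This is a minimal Python sketch under stated assumptions, not the JADE DF API (JADE exposes its DF through its own `DFService` class); the class names, service-type strings, and metadata keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Hypothetical DF entry: which agent offers which service type, plus metadata."""
    agent: str
    service_type: str  # e.g. "retrieval", "repository", "translation"
    metadata: dict = field(default_factory=dict)


class DirectoryFacilitator:
    """Toy yellow-pages registry (illustrative only, not JADE's DF)."""

    def __init__(self):
        self._entries = []

    def register(self, desc):
        self._entries.append(desc)

    def search(self, service_type):
        # Return every registered description of the requested service type.
        return [d for d in self._entries if d.service_type == service_type]


class RepositoryAgent:
    def __init__(self, name, df):
        self.name = name
        self.df = df
        self.subscriptions = set()
        # The repository agent first registers its own repository service.
        df.register(ServiceDescription(name, "repository", {"format": "QLI"}))

    def subscribe_to_retrieval_agents(self):
        # Look up retrieval services and subscribe to each provider.
        for desc in self.df.search("retrieval"):
            self.subscriptions.add(desc.agent)
        return sorted(self.subscriptions)


df = DirectoryFacilitator()
df.register(ServiceDescription("noaa-forecast", "retrieval", {"content": "forecast"}))
df.register(ServiceDescription("map-translation", "translation", {"out": "GIS"}))
repo = RepositoryAgent("weather-repository", df)
print(repo.subscribe_to_retrieval_agents())  # ['noaa-forecast']
```

Only retrieval providers are subscribed to; the translation service stays in the registry until a format-specific request needs it.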

• Retrieval agents: The Weather Tool retrieval agents provide the data extraction, parsing, and standardizing services. These lightweight, wrapper-style agents extract data as it exists at the data sources, parse it into a custom weather data format (the “QLI Format”), and pass the weather data to their subscribers. Each retrieval agent is implemented specifically for the data source it retrieves data from and is hence lightweight in terms of process logic, concentrating only on the process of retrieval. These agents do not maintain any repositories of the data they collect.

Figure 4. System architecture for the weather tool (components: Web page, email, structured text, Web service, and database sources, each with a retrieval agent; standardized data formats; the repository agent with its data repository and domain-specific consolidation and validation rules; a translation agent; application-specific data formats; and the dispersion modeling, GIS awareness, and traffic monitoring application agents)

• Repository agents: The repository agents provide the data completion, consolidation, storage, and updating services. Repository agents either subscribe to or send individual requests to the available retrieval agents, depending on user input. In our current implementation, the repository agent maintains a repository of weather data in QLI format using the MySQL database. The repository agent also uses service discovery and integration to decompose the original data request into individual, service-specific requests.

• Translation agents: The translation agents provide format conversion services between the QLI Format and other, application-specific formats. They apply information-specific conversion rules and are highly specialized in particular format conversions. The translation agents publish the formats of their input and output data. For example, GIS data converters translate data from the weather repository’s “QLI format” to a GIS application-specific format. Likewise, agents that wrap GIS applications extract a subset of weather data attributes from the repository storage format.

• Application agents: Application agents are wrapper-style representatives of the various applications in the Weather Tool and act as information consumers. They look up repository agents that provide services for the type of data they require and either send requests to or subscribe to them.
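The repository agent’s decomposition of a data request into individual, service-specific requests can be sketched as follows, under the illustrative assumption that each retrieval service advertises the set of weather attributes it provides. The greedy, set-cover-style assignment and all agent and attribute names are hypothetical; this is not the Weather Tool’s actual logic.

```python
def decompose(requested, services):
    """Split a request into per-service sub-requests.

    requested: set of attribute names wanted by the application.
    services:  dict mapping agent name -> set of attributes it advertises.
    Returns (plan, unmet): plan maps agent -> attributes requested from it;
    unmet holds attributes that no registered service advertises.
    """
    plan = {}
    remaining = set(requested)
    while remaining and services:
        # Greedy step: choose the service covering the most remaining attributes.
        best = max(services, key=lambda agent: len(services[agent] & remaining))
        covered = services[best] & remaining
        if not covered:
            break  # nothing left that any service can supply
        plan[best] = covered
        remaining -= covered
    return plan, remaining


services = {
    "surface-retrieval": {"temperature", "wind_speed", "pressure"},
    "radiosonde-retrieval": {"upper_air_temperature", "upper_air_wind"},
}
plan, unmet = decompose({"temperature", "wind_speed", "upper_air_wind", "ozone"}, services)
print(plan)   # surface-retrieval gets 2 attributes, radiosonde-retrieval gets 1
print(unmet)  # {'ozone'}
```

Returning the unmet attributes, rather than failing, leaves room for a later discovery step to find a provider for them.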

The Weather Tool Test Case

Beyond the need to inform human users directly, weather data is an input to many software applications, including emergency response tools that support scenario planning and help emergency planners build foresight. These applications require different components of weather data as input; for example, viewing daily summarized weather data on a map of a geographical area provides a situational view of the world. In our test case, a map (GIS) application is one of the weather data consumers.

The other application in the test case is a traffic monitoring and forecasting service.

Beyond analyzing, predicting, and planning for current traffic based purely on the time of day and road conditions, a traffic monitoring system should take into account current and forecast weather conditions. Warnings of snow, ice, and high winds can help traffic experts anticipate traffic problems, slowdowns, and accidents due to poor road conditions. For such applications, the weather requirements can range from historical to forecast weather data.

Weather data consists of readings on weather elements such as temperature, pressure, wind speed, wind direction, dew point, precipitation, and humidity.

Weather experts collect data using sensors and other advanced equipment. For example, the surface weather data (weather at sea level) for a given time and location is measured and can be accessed at the National Weather Service (NWS) Web site. Validated weather data becomes available within two to three days and can be accessed at the National Climatic Data Center (NCDC) Web site (NCDC), but only for a few predetermined weather stations. The National Oceanic and Atmospheric Administration (NOAA) provides non-validated, current forecast data. The Forecast Systems Laboratory’s (FSL) Web site allows requests for weather data at several altitudes in the atmosphere, the radiosonde data. (A radiosonde is an instrument that carries sensors up through the layers of the atmosphere to measure weather data.) In the test case, there is one retrieval agent each for the NCDC and NOAA Web sites. The NOAA agent retrieves data every hour because the NOAA Web site holds transient data. The NCDC and FSL agents retrieve data on demand, as the corresponding sites maintain their own repositories and support URL-based queries. In the test case, we have the following agents:

• The retrieval agents:

NOAA forecast agent: Retrieves forecast weather data from the NOAA Web site

NOAA historical agent: Extracts validated historical weather data from the NCDC Web site

• The application agents:

Map agent: Represents a map component that displays the map of Delaware with various layers of geographical data, including the locations of schools, fire departments, police stations, and hospitals

Traffic agent: Represents a traffic monitoring and analysis application

• The repository agents:

Weather repository agent: Stores data and provides complete weather data for a given date and location to application agents

• The translation agents:

Map translation agent: Provides QLI Format to GIS layer format conversion

Traffic translation agent: Provides QLI Format to traffic-specific format conversions
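The two retrieval modes used in the test case (hourly polling of the transient NOAA data versus on-demand retrieval from sites that keep their own archives) could be expressed as a small scheduling policy. The sketch below is hypothetical: the policy class, the agent names, and the use of plain seconds for time are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RetrievalPolicy:
    agent: str
    interval_s: Optional[int]         # None means "retrieve on demand only"
    last_run_s: Optional[int] = None  # time of the last poll, if any


def due_for_polling(policies: List[RetrievalPolicy], now_s: int) -> List[str]:
    """Return the periodic agents whose polling interval has elapsed."""
    due = []
    for p in policies:
        if p.interval_s is None:
            continue  # on-demand sources are only queried when a request arrives
        if p.last_run_s is None or now_s - p.last_run_s >= p.interval_s:
            due.append(p.agent)
    return due


policies = [
    RetrievalPolicy("noaa-forecast", interval_s=3600, last_run_s=0),  # hourly
    RetrievalPolicy("noaa-historical", interval_s=None),              # NCDC, on demand
]
print(due_for_polling(policies, now_s=1800))  # []
print(due_for_polling(policies, now_s=3600))  # ['noaa-forecast']
```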

The NOAA forecast agent retrieves and parses data from the NOAA forecast service Web site into standard Java objects. The NOAA Web site is shown in Figure 2. Detailed forecast data is available for the next 60 hours, and summarized data exists for the next ten days. The NOAA historical agent retrieves and parses data from the NCDC Web site on demand.

Historical data is available for every hour in the format shown in Figure 1. Historical data for a day only becomes available after two to three days.

Historical data is validated by the experts and is more detailed than the forecast data in terms of available weather attributes.

The map agent represents a GIS application that allows users to view data from a geographical perspective, based on data such as latitude and longitude. Weather data for such applications is typically measured for an area or a point. GIS applications can provide the geographical perspective for weather data by showing its coverage area on the map of a region. The traffic agent represents a traffic monitoring and analysis application that uses the wind speed and the daily warning fields of the weather data, such as heavy rain, snow, sleet, and icy road conditions. Because the repository stores weather data by region and time, the traffic agent requests and subscribes to current weather data for the geographical regions to which the various roadways belong.

The map translation agent extracts certain features from the standard weather data, reducing the set of attributes to those required by the map agent. The traffic translation agent extracts the wind speed and other warnings from the standard-format weather data. The weather repository agent stores forecast and historical data in one weather repository as it receives updates and new data from the scrapers (the retrieval agents).
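Both translation agents essentially project the standard record onto the attributes a consumer needs. A minimal sketch, assuming a QLI record can be represented as a Python dictionary; the field names, the map attribute list, and the set of road-relevant warnings are hypothetical.

```python
# Hypothetical field lists; the real QLI Format and GIS layer format are not shown here.
MAP_FIELDS = ("latitude", "longitude", "temperature", "precipitation")
ROAD_WARNINGS = {"heavy rain", "snow", "sleet", "icy roads"}


def to_map_format(record):
    """Keep only the attributes the map agent displays (missing ones are skipped)."""
    return {name: record[name] for name in MAP_FIELDS if name in record}


def to_traffic_format(record):
    """Keep wind speed and the warnings that affect road conditions."""
    warnings = [w for w in record.get("warnings", []) if w in ROAD_WARNINGS]
    return {"wind_speed": record.get("wind_speed"), "warnings": sorted(warnings)}


qli_record = {
    "latitude": 39.74, "longitude": -75.55, "temperature": -2.0,
    "dew_point": -5.0, "wind_speed": 40.0, "warnings": ["snow", "fog"],
}
print(to_map_format(qli_record))
# {'latitude': 39.74, 'longitude': -75.55, 'temperature': -2.0}
print(to_traffic_format(qli_record))
# {'wind_speed': 40.0, 'warnings': ['snow']}
```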

The weather repository agent also performs validation, completion, and consolidation of data. It updates previously stored forecast data when the corresponding validated historical data becomes available.
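The replacement of forecast values by validated historical values can be sketched as a keyed upsert with a source priority. This minimal sketch assumes records are keyed by a hypothetical (location, hour) pair and that validated historical data always outranks forecast data. For the completion step it makes one further illustrative assumption: attributes that a newer record lacks are carried over from the older one.

```python
PRIORITY = {"forecast": 0, "historical": 1}  # validated data outranks forecasts


class WeatherRepository:
    def __init__(self):
        self._records = {}  # (location, hour) -> {"kind": ..., "values": {...}}

    def upsert(self, location, hour, kind, values):
        """Store a record unless it would replace higher-priority data."""
        key = (location, hour)
        current = self._records.get(key)
        if current is not None and PRIORITY[current["kind"]] > PRIORITY[kind]:
            return False  # never overwrite validated data with a forecast
        # Completion (assumed rule): keep attributes the new record does not supply.
        merged = dict(current["values"]) if current else {}
        merged.update(values)
        self._records[key] = {"kind": kind, "values": merged}
        return True

    def get(self, location, hour):
        return self._records.get((location, hour))


repo = WeatherRepository()
repo.upsert("Wilmington, DE", "2005-01-10T12", "forecast", {"temperature": 1.0, "wind_speed": 20.0})
repo.upsert("Wilmington, DE", "2005-01-10T12", "historical", {"temperature": 0.5, "dew_point": -3.0})
print(repo.get("Wilmington, DE", "2005-01-10T12"))
# {'kind': 'historical', 'values': {'temperature': 0.5, 'wind_speed': 20.0, 'dew_point': -3.0}}
```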

The NOAA forecast agent and the NOAA historical agent register with the JADE directory facilitator service as “weather retrieval” service providers. These agents retrieve data and send update messages with the current results either to their subscribers or to the requesting agent. The weather repository agent registers its “weather repository” service and performs a DF look-up of weather retrieval service providers. The weather repository agent subscribes to all the resulting agents using subscribe messages. The map agent and the traffic agent, according to their requirements, look up weather repository services and either subscribe to or send queries to the resulting agent(s) using ACL subscribe or query messages.

The map translation agent and the traffic translation agent register their “map conversion” and “traffic conversion” services, respectively, with the DF. The weather repository agent checks its repository for the requested weather data and accordingly sends requests to the appropriate retrieval agents. When a request specifies a particular format, the weather data is channeled to the appropriate translation agent. The results received from the translation agents are sent to the requesting application agents in appropriate ACL inform or query reply messages.
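The request flow above (check the repository, fall back to a retrieval agent, route format-specific requests through a translator, and reply with an inform message) can be sketched without JADE by modelling messages as plain objects. The performative strings follow FIPA ACL names (subscribe, query-ref, inform, agree, not-understood), but the classes, content keys, and callables are hypothetical illustrations, not the MADE or JADE API.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    performative: str  # FIPA ACL performative name, e.g. "query-ref" or "inform"
    sender: str
    receiver: str
    content: dict = field(default_factory=dict)


class RepositoryAgentSketch:
    def __init__(self, fetch, translators):
        self.store = {}                 # (location, date) -> record in QLI format
        self.fetch = fetch              # stands in for a request to a retrieval agent
        self.translators = translators  # format name -> conversion function
        self.subscribers = []           # (agent, location, format)

    def _lookup(self, location, date):
        key = (location, date)
        if key not in self.store:       # not in the repository yet: ask a retriever
            self.store[key] = self.fetch(location, date)
        return self.store[key]

    def _format(self, record, fmt):
        # Records in the standard format need no translation agent.
        return record if fmt == "QLI" else self.translators[fmt](record)

    def handle(self, msg):
        c = msg.content
        if msg.performative == "subscribe":
            self.subscribers.append((msg.sender, c["location"], c.get("format", "QLI")))
            return Message("agree", "repository", msg.sender)
        if msg.performative == "query-ref":
            record = self._lookup(c["location"], c["date"])
            reply = self._format(record, c.get("format", "QLI"))
            return Message("inform", "repository", msg.sender, reply)
        return Message("not-understood", "repository", msg.sender)

    def on_new_data(self, location, date, record):
        """Store an update and build inform messages for matching subscribers."""
        self.store[(location, date)] = record
        return [Message("inform", "repository", agent, self._format(record, fmt))
                for agent, loc, fmt in self.subscribers if loc == location]


calls = []


def fetch(location, date):
    calls.append((location, date))
    return {"temperature": 3.0, "wind_speed": 12.0}


repo = RepositoryAgentSketch(fetch, {"traffic": lambda r: {"wind_speed": r["wind_speed"]}})
reply = repo.handle(Message("query-ref", "traffic-agent", "repository",
                            {"location": "DE", "date": "2005-01-10", "format": "traffic"}))
print(reply.performative, reply.content)  # inform {'wind_speed': 12.0}
```

A second query for the same location and date is answered from the store, so the retrieval agent is contacted only once.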

Example Enhancement

The Weather Tool test case worked well, with agents registering, finding, and invoking the various data services. At this point, we identified a need to introduce a new application to the Weather Tool: an air quality modeling system that models the transport of pollutants in the air following an accidental or intentional release of a biological or chemical substance. The specific application is CALPUFF (CALPUFF), the default choice of the U.S. Environmental Protection Agency (EPA) for modeling the long-range transport of pollutants. CALPUFF computes an approximate plume of the pollutant over a period of time based on the weather data for that period, so the assessment of the pollutants depends on the weather data in a geographical area. The details of CALPUFF and the weather data it uses can be found in our previous paper (Kalra & Steiner, 2005).

To integrate the air quality modeling system, we needed a translation agent for converting standard data to the formats that CALPUFF requires, the “CALPUFF translation” agent. We also built the logic for merging the radiosonde observations with the existing surface weather data into a service-providing agent, the “radiosonde consolidation” agent. As radiosonde data for a given time and location becomes available, it is added to the existing surface data for the same time and location. We also implemented a new retrieval agent, the “radiosonde agent,” that periodically retrieves and parses data from the FSL Web site and passes the results to its subscribers. The weather repository agent employed the radiosonde consolidation agent to combine the radiosonde data for a particular time and region with the existing weather record for that time and region.
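The consolidation step (attaching radiosonde observations to the surface record for the same time and location) might look like the following minimal sketch. It assumes both kinds of data are keyed by a hypothetical (location, time) pair and that a radiosonde profile is a list of per-level readings; reporting unmatched profiles rather than dropping them is an illustrative choice.

```python
def consolidate(surface_records, radiosonde_obs):
    """Attach radiosonde profiles to surface records with the same key.

    surface_records: dict (location, time) -> surface weather attributes
    radiosonde_obs:  dict (location, time) -> list of per-level readings
    Returns (merged, unmatched); the input dictionaries are not modified.
    """
    merged = {key: dict(record) for key, record in surface_records.items()}
    unmatched = []
    for key, profile in radiosonde_obs.items():
        if key in merged:
            merged[key]["upper_air"] = profile
        else:
            unmatched.append(key)  # no surface record yet for this time/location
    return merged, unmatched


surface = {("Wilmington, DE", "2005-01-10T12Z"): {"temperature": 2.0, "wind_speed": 15.0}}
soundings = {
    ("Wilmington, DE", "2005-01-10T12Z"): [{"pressure_hpa": 850, "temperature": -4.0}],
    ("Sterling, VA", "2005-01-10T12Z"): [{"pressure_hpa": 850, "temperature": -1.0}],
}
merged, unmatched = consolidate(surface, soundings)
print(unmatched)  # [('Sterling, VA', '2005-01-10T12Z')]
```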

Once the required agents were implemented, there was no need to bring down the system to incorporate the new weather data source. Instead, upon launch, the new retrieval, translation, and consolidation agents dynamically joined the system by registering their respective retrieval, translation, and consolidation services with the DF. During its periodic check for new weather retrieval service providers, the weather repository agent subscribed to the new retrieval agent and started receiving the new weather data. Finally, to deliver the newly available data from the weather repository agent to CALPUFF programmatically, we wrapped the CALPUFF modeling system with an application agent, the plume agent.
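The no-restart behaviour rests on the repository agent’s periodic look-up: each check compares the providers currently registered with those it already subscribes to and subscribes only to the newcomers. A minimal, hypothetical sketch follows; the registry class and agent names are illustrative, and a real deployment would use the DF and ACL subscribe messages instead.

```python
class Registry:
    """Toy stand-in for the DF: agent name -> advertised service type."""

    def __init__(self):
        self.providers = {}

    def register(self, agent, service_type):
        self.providers[agent] = service_type

    def search(self, service_type):
        return {a for a, t in self.providers.items() if t == service_type}


class WeatherRepositoryAgent:
    def __init__(self, registry):
        self.registry = registry
        self.subscriptions = set()

    def periodic_check(self):
        """Subscribe to retrieval providers that joined since the last check."""
        new = self.registry.search("weather retrieval") - self.subscriptions
        self.subscriptions |= new  # a real agent would send ACL subscribe messages here
        return sorted(new)


df = Registry()
df.register("noaa-forecast", "weather retrieval")
df.register("noaa-historical", "weather retrieval")
repo = WeatherRepositoryAgent(df)
print(repo.periodic_check())  # ['noaa-forecast', 'noaa-historical']
df.register("radiosonde", "weather retrieval")  # new agent joins at runtime
df.register("calpuff-translation", "translation")
print(repo.periodic_check())  # ['radiosonde']
```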

Conclusion

“The Goods.” The example enhancement in the previous section demonstrates how flexible agent architectures can be used to solve a variety of problems associated with collecting data from a number of disparate, independently managed information sources, where the quality, format, and sources of the data may change

sources and data formats without hampering the operational system. It also enables the sharing of data in different formats and different subscription models over a wide range of applications.

The Weather Tool’s agent-based approach has many advantages over current approaches. The Weather Tool seamlessly handles new data sources, unlike statically configured data warehouses. The Weather Tool architecture has middle-layer repository agents acting as data provisioning, data consolidation, and validation service providers. Other approaches mostly handle validated data, because adding a validation logic layer to the system components can slow down the information flow considerably.

“The Bads.” Our current architecture strictly associates one data source with each retrieval agent. This could be extended so that one retrieval agent scrapes data from many sources, allowing the agent to continue scraping its other sources when it fails to scrape data from a particular source, for example because of a format change. The data retrieval agents can be enhanced with a thin verification and consolidation layer to support this functionality. Another important direction for improvement is discovery-style retrieval agents that go out and search for relevant sources given concise information descriptions. This would also handle the source failure case, which the current system does not address.
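The proposed extension (one retrieval agent scraping several sources, with a thin verification layer so that a failed or changed source does not stop retrieval from the others) could be sketched as follows. Everything here is hypothetical: the exception type, the source functions, and the required-field check that stands in for verification.

```python
class SourceError(Exception):
    """Raised by a source adapter when scraping fails (e.g. after a format change)."""


class MultiSourceRetrievalAgent:
    def __init__(self, sources, required_fields):
        self.sources = sources              # list of (source name, fetch function)
        self.required = set(required_fields)
        self.failures = {}                  # source name -> reason for the last failure

    def retrieve(self, query):
        results = {}
        for name, fetch in self.sources:
            try:
                record = fetch(query)
            except SourceError as exc:
                self.failures[name] = str(exc)
                continue  # keep going with the remaining sources
            missing = self.required.difference(record)
            if missing:  # thin verification layer: reject incomplete parses
                self.failures[name] = "missing " + ", ".join(sorted(missing))
                continue
            results[name] = record
        return results


def good_source(query):
    return {"temperature": 1.5, "wind_speed": 10.0}


def changed_source(query):
    raise SourceError("page layout changed")


def partial_source(query):
    return {"temperature": 2.0}


agent = MultiSourceRetrievalAgent(
    [("good", good_source), ("changed", changed_source), ("partial", partial_source)],
    required_fields=("temperature", "wind_speed"),
)
print(agent.retrieve("Wilmington, DE"))  # {'good': {'temperature': 1.5, 'wind_speed': 10.0}}
print(agent.failures)  # {'changed': 'page layout changed', 'partial': 'missing wind_speed'}
```

Recording the failure reason per source is what a discovery-style agent would need in order to decide which source to replace.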

The Weather Tool stores all its standardized, integrated, and validated data in a single central repository, thus giving rise to a single point of failure. The best solution for this would be to implement a federated network of replicated and/or distributed repositories as an extension to this architecture.

The Weather Tool in its current form assumes that the data sources (new and existing) and the external applications understand and share common metadata for representing the services and service descriptions. This restriction makes the Weather Tool rely on certain conventions that are by no means industry standards. The obvious solution to this problem is to use Semantic Web technology.

Another problem arises when a required service is absent but a similar service exists in the system and remains invisible to it. Extensions of Semantic Web technology, specifically metadata/schema/ontology mapping, are key to solving this problem. Ontology mapping tools derive mappings between two data schemas using techniques such as lexical (word-level) processing and information flow. In the next section, we describe the next generation of the Weather Tool, which takes advantage of Semantic Web technology for enhanced performance and flexibility.
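As a very small illustration of the lexical side of schema mapping, the sketch below tokenizes attribute names, expands abbreviations, and pairs each source attribute with the most similar target attribute by Jaccard similarity of the token sets. This is a deliberately simple stand-in, not how any particular ontology mapping tool works; the abbreviation table, attribute names, and 0.5 threshold are hypothetical.

```python
import re

# Hypothetical abbreviation table; real mappers draw on thesauri or ontologies.
ABBREVIATIONS = {"temp": "temperature", "spd": "speed", "dir": "direction", "pt": "point"}


def tokens(name):
    """Split camelCase / snake_case names into lowercase words, expanding abbreviations."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {ABBREVIATIONS.get(w.lower(), w.lower()) for w in words}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_schema(source, target, threshold=0.5):
    """Map each source attribute to its most similar target attribute, if any."""
    mapping = {}
    for s in source:
        s_tokens = tokens(s)
        best = max(target, key=lambda t: jaccard(s_tokens, tokens(t)))
        if jaccard(s_tokens, tokens(best)) >= threshold:
            mapping[s] = best
    return mapping


source = ["Temp", "WindSpd", "WindDir", "DewPt", "Visibility"]
target = ["temperature", "wind_speed", "wind_direction", "dew_point", "pressure"]
print(map_schema(source, target))
# {'Temp': 'temperature', 'WindSpd': 'wind_speed', 'WindDir': 'wind_direction', 'DewPt': 'dew_point'}
```

Attributes with no lexical counterpart (here `Visibility`) stay unmapped; closing such gaps is where ontology-based techniques go beyond string matching.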

From the overall architecture point of view, the vision is a completely autonomous, adaptive network that adapts to the data traffic and determines when and where data should be cached or stored while optimizing the request-response time. This vision is also planned for the next generation of the Weather Tool.

Intelligent Data Management: The Second Generation

The Weather Tool has performed very well within the context of weather-related applications. Its agent-based, service-oriented architecture has brought known and unknown data sources into one data environment, where their data is accessible in a central repository. The idea of intelligent data management (IDM) has evolved to overcome the limitations of the universal schema and the central repository. The IDM framework enables connectivity to a set of disparate data sources (including applications). QLI’s IDM framework provides dynamic integration and (re)configuration of several diverse, geographically distributed data sources, thereby offering a virtual data environment. No data is collected centrally; instead, the data sources are polled for data on demand. IDM draws on service-oriented architecture (SOA), Semantic Web (SW), and multi-agent system (MAS) technologies for representing, deploying, combining, and invoking various data-processing services (such as data gathering, validation, completion, translation, and formatting). These services are provided by agents supported by MAS mechanisms for agent discovery, communication, cooperation, and easy system configuration for service deployment and usage. The agents perform semantic similarity matching between the data requests and the available data provisioning services and invoke those services to fulfill the requests. The IDM architecture also provides a simple, machine-processable, highly structured, query-based data access interface to its users. The query interface supports the formulation, decomposition, and solving of queries based on representative keywords. The keyword queries range from keyword-based search over unstructured plain text data to highly structured schema templates across a large number of heterogeneous data sources. Scalability in terms of functionality and types of queries, as well as the number of data sources, has been one of the guiding factors in the design of the query interface.
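The IDM agents match data requests to provisioning services by semantic similarity. As a much simpler stand-in that captures only the keyword side, the sketch below ranks hypothetical service descriptions by cosine similarity of their keyword counts against a keyword query; the service names, descriptions, stop-word list, and threshold are illustrative assumptions, not IDM’s matching method.

```python
import math
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "of", "for", "data"}  # hypothetical list


def keywords(text):
    """Lowercase bag of words with commas and stop words removed."""
    return Counter(w for w in text.lower().replace(",", " ").split() if w not in STOP_WORDS)


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_services(query, services, threshold=0.2):
    """Return service names whose descriptions match the query, best first."""
    q = keywords(query)
    scored = sorted(((cosine(q, keywords(desc)), name) for name, desc in services.items()),
                    reverse=True)
    return [name for score, name in scored if score >= threshold]


services = {
    "noaa-forecast": "hourly forecast temperature wind precipitation",
    "ncdc-historical": "validated historical hourly temperature wind",
    "fsl-radiosonde": "upper air radiosonde temperature profile",
}
print(rank_services("forecast wind temperature", services))
# ['noaa-forecast', 'ncdc-historical', 'fsl-radiosonde']
```

Raising the threshold trades recall for precision; a semantic matcher would additionally relate synonyms that share no keywords.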

Conclusion

Effective operation within any enterprise relies heavily upon persistent, ad hoc, and intelligent collection of data from geographically-distributed heterogeneous sources.

Information derived from efficient and effective data retrieval is crucial to effective enterprise management. As the number of relevant data sources increases, the complexity of integrating them into the existing system renders current information technology systems incapable of providing information to the analysts. The demand for innovative technology supporting dynamic data integration in a highly distributed environment is unprecedented, both in the industry and