ADEPT

For the Alexandria Digital Earth Prototype (ADEPT) project, it has been proposed that UGA play the lead role in Iscape (information landscape) construction, involving the following activities:

  • Propose a tool or methodology for ontology construction for use by domain experts.
  • Prerequisites:
    1. Other team members have decided on the courses and the relevant available assets (documents/data, sources/repositories).
    2. Example high-level information needs are available, from which we can propose specific metadata and ontologies to be used or developed.
  • Define the metadata schema and database.
  • Design and implement automatic and manual extraction as needed, with the ability for team members to add run-time geospatial processing capability. (UGA will NOT develop new algorithms, but can help adapt preexisting or new processing capabilities for use in Iscape processing.)
  • Propose a first version of a detailed Iscape specification (on top of Web-centric XML- and RDF-based infrastructures).
  • Use all of the above to define a few Iscapes for consideration by team members and refine them based on input.
  • Develop an unoptimized demonstration of Iscape processing for use in the initial year 1 trial.

To achieve these objectives, we have identified certain focus areas that we need to concentrate on:

  1. Design and implementation of a metabase for the Geographical Information Systems (GIS) domain.
  2. Implementation of a diverse range of extractors for different Web sites that provide information pertaining to the GIS domain. The extractors retrieve the relevant metadata used to populate the metabase.
  3. Design and implementation of ontologies for the GIS domain-specific vocabulary of terms, using an ontology designer suited to the purpose.
  4. Design and creation of a few representative Iscapes for performing queries on the metabase.
  5. Use of the agent architecture to process an Iscape request.
  6. Possible extension of the architecture to implement extractors as distributed, stand-alone agents that could run on the server side or on the client side and update the metabase.

Metabase for GIS

Metadata consist of information that characterizes data. Metadata are used to provide documentation for data products. In essence, metadata answer who, what, when, where, why, and how about every facet of the data that are being documented.

In the context of geospatial digital data, metadata are the information that describes the content, quality, condition, and other relevant characteristics of the data.

The FGDC standard is designed to describe all possible geospatial data.

DataSets

DataSets are the basic entities that the system deals with. Each dataset is described by a set of metadata and is assigned a unique dataset-id when it is entered in the metabase.

No choice of supported metadata can satisfy the needs of all conceivable GIS applications (though the FGDC metadata specification is quite comprehensive). In our system we choose a number of attributes that we think describe a particular dataset well enough to support the most important queries and for which many sources provide values. If a particular extractor does not support a certain attribute, a default value is assumed for it.
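The dataset model described above can be sketched as follows. This is a minimal illustration, not the project's schema: the attribute names, the default values, and the in-memory id counter are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional

# Hypothetical supported attributes and their defaults (not the real metabase schema).
DEFAULTS: Dict[str, Optional[str]] = {
    "title": "unknown",
    "theme": "unknown",
    "format": "unknown",
    "source_url": None,
}

_next_id = count(1)  # in-memory stand-in for the metabase's id generator


@dataclass
class DataSet:
    metadata: Dict[str, Optional[str]]
    # A unique dataset-id is assigned when the record is created.
    dataset_id: int = field(default_factory=lambda: next(_next_id))


def make_dataset(extracted: Dict[str, str]) -> DataSet:
    """Keep only supported attributes; fall back to defaults for missing ones."""
    metadata = {name: extracted.get(name, default)
                for name, default in DEFAULTS.items()}
    return DataSet(metadata=metadata)


# One extractor supplies "format" but not "theme"; another supplies only a title.
a = make_dataset({"title": "Georgia land cover", "format": "GeoTIFF", "author": "n/a"})
b = make_dataset({"title": "Athens street map"})
```

Attributes that an extractor reports but the metabase does not support (here, "author") are simply dropped, while unsupported-by-extractor attributes receive the default.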

Implementation of Extractors

One problem with extracting metadata from Web pages is that the Web sites providing metadata differ widely in their structure and in the metadata that they provide. This is in a way desirable, as it provides a way of differentiating between the sites and of retrieving information from a diverse range of information sources. However, since a standard way of describing metadata has not yet been developed, specialized extractors are needed to retrieve the metadata used for attribute-based queries.

Even though extractors are designed to be as generic as possible, they might have to be changed if the Web site changes its structure. The change required can be minimal or can take considerable effort depending on the extent to which the Web site’s structure changes.

The extractors visit different Web sites, analyze the contents of the HTML pages, process the contents, and return the relevant metadata in the form of XML. In the future, advances in XML may make specialized extractors unnecessary. It is also possible that extractors will be implemented as stand-alone agents running on the client side or on the server side.
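A minimal sketch of such an extractor is shown below, using only the Python standard library. The choice of fields (the page title and `<meta name=... content=...>` tags) and the output element names `dataset` and `attribute` are assumptions made for illustration; a real extractor would be tailored to the structure of one particular site.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class MetaTagExtractor(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs of a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.fields[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.fields["title"] = data.strip()


def extract_to_xml(html: str) -> str:
    """Parse one HTML page and return its metadata as an XML string."""
    parser = MetaTagExtractor()
    parser.feed(html)
    root = ET.Element("dataset")  # hypothetical output format
    for name, value in sorted(parser.fields.items()):
        ET.SubElement(root, "attribute", name=name).text = value
    return ET.tostring(root, encoding="unicode")


page = """<html><head><title>Georgia land cover</title>
<meta name="format" content="GeoTIFF">
<meta name="theme" content="land use"></head><body></body></html>"""
xml_record = extract_to_xml(page)
```

Because the parsing logic is tied to where a site places its metadata, a change in the site's HTML structure would require changing `MetaTagExtractor`, which is the maintenance cost noted above.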

ADEPT investigates information requests built upon the concept of an ISCAPE (Information Landscape).

Here is our ISCAPE working definition:

An ISCAPE (Information Landscape) is a collection of semantically related information assets, along with the ways to analyze and visualize them, that facilitates learning about the Digital Earth. These information assets:

  • May be heterogeneous in syntax/format, structure, and media.
  • May represent an integration or synthesis of multiple assets.
  • May be obtained from different locations (Web sites, repositories, databases) using a variety of query languages, information retrieval techniques, and access methods.

Iscapes are designed to correlate information across the Internet. Iscapes are dynamic in that the information they lead to is not a single, hard-coded Web page but rather a collection of related datasets that is generated at runtime. An Iscape can therefore be considered an information request that is expressed as a combination of keyword, attribute, and content-based search.

We will be using RDF (Resource Description Framework) as the framework for constructing Iscapes. RDF uses XML as its underlying syntactic model.

The attribute search part of an Iscape is constructed by describing involved entities taken from different domain ontologies.
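To illustrate how an Iscape request might combine keyword, attribute, and content-based parts in RDF/XML, here is a small sketch. The `isc` vocabulary, its namespace URI, and the property names (`keyword`, `contentQuery`, and the attribute names) are hypothetical; only the `rdf` namespace is the standard one. It is not the project's Iscape specification.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
ISC_NS = "http://example.org/iscape#"                    # hypothetical vocabulary

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("isc", ISC_NS)


def build_iscape(iscape_id, keywords, attributes, content_hint=None):
    """Serialize one Iscape request as an RDF/XML description."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": f"{ISC_NS}{iscape_id}"},
    )
    # Keyword part of the request.
    for word in keywords:
        ET.SubElement(desc, f"{{{ISC_NS}}}keyword").text = word
    # Attribute part: terms that would come from domain ontologies.
    for name, value in attributes.items():
        ET.SubElement(desc, f"{{{ISC_NS}}}{name}").text = value
    # Optional content-based part.
    if content_hint is not None:
        ET.SubElement(desc, f"{{{ISC_NS}}}contentQuery").text = content_hint
    return ET.tostring(root, encoding="unicode")


doc = build_iscape(
    "volcano-demo",
    keywords=["volcano"],
    attributes={"region": "Pacific Northwest"},
    content_hint="lava flow",
)
```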

Design and implementation of ontologies

In Iscapes, the terms that describe entities can be ambiguous: for example, "cricket" means something different to a fan of the game of cricket than it does to a biology professor. To eliminate this ambiguity, information beyond the names of the correlated terms is required. Ontologies provide such additional information.

Ontologies are used to describe not only the datasets we are interested in, but also the retrieved datasets, which are heterogeneous in nature. A single query result may contain maps, videos, and WordPerfect documents. The display agent must be able to distinguish between the different dataset types in order to display them appropriately and to act properly when the user retrieves them. A classification of all possible datasets supports this behavior and has additional advantages: advanced attribute queries such as "retrieve only images" can be made, and the results can be displayed in groups of related dataset types. Multiple topic-specific ontologies will be created to describe the whole model.
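The role of a dataset-type classification can be sketched with a small subclass hierarchy. The type names and the parent/child relationships below are assumptions chosen for the example, not the project's ontology.

```python
# Hypothetical dataset-type ontology, stored as child -> parent.
SUBCLASS_OF = {
    "Image": "DataSet",
    "Map": "Image",
    "SatelliteImage": "Image",
    "Video": "DataSet",
    "Document": "DataSet",
    "WordPerfectDocument": "Document",
}


def is_a(dataset_type, ancestor):
    """True if dataset_type equals ancestor or is a (transitive) subclass of it."""
    current = dataset_type
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def filter_by_type(results, wanted):
    """Attribute query such as 'retrieve only images'."""
    return [r for r in results if is_a(r["type"], wanted)]


def top_level_type(dataset_type):
    """Walk up the hierarchy to the type directly below the root 'DataSet'."""
    current = dataset_type
    while SUBCLASS_OF.get(current) not in (None, "DataSet"):
        current = SUBCLASS_OF[current]
    return current


results = [
    {"name": "Georgia relief map", "type": "Map"},
    {"name": "Eruption footage", "type": "Video"},
    {"name": "Field report", "type": "WordPerfectDocument"},
    {"name": "Landsat scene", "type": "SatelliteImage"},
]
images = filter_by_type(results, "Image")
groups = {}
for r in results:
    groups.setdefault(top_level_type(r["type"]), []).append(r["name"])
```

Here `images` holds the map and the satellite scene, and `groups` arranges all results under the top-level types, which is the kind of grouping a display agent could use.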

We have chosen the Resource Description Framework (RDF) Schema for defining ontologies. A graphical ontology designer tool (already created by a third party) may be provided to the domain experts so that they can design their own ontologies.

Agent Architecture

This project uses a multi-agent system to process an Iscape. The agent types involved in this task are the User Agent, one or more Broker Agents, an Ontology Agent, a Query-Planning Agent, and possibly many Resource Agents.
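One way these agents could cooperate is sketched below. The message flow (User Agent to Broker Agent, then the Ontology Agent, the Query-Planning Agent, and the Resource Agents), the method names, and the in-memory data are all assumptions made for illustration; the text above names the agent types but does not specify how they interact.

```python
class OntologyAgent:
    """Expands a term with related terms (hypothetical synonym table)."""

    def __init__(self, synonyms):
        self.synonyms = synonyms

    def expand(self, term):
        return [term] + self.synonyms.get(term, [])


class ResourceAgent:
    """Wraps one information source; here just a list of record titles."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def search(self, terms):
        return [r for r in self.records
                if any(t in r.lower() for t in terms)]


class QueryPlanningAgent:
    """Decides which resource receives which sub-query (trivially: all of them)."""

    def plan(self, terms, resources):
        return [(resource, terms) for resource in resources]


class BrokerAgent:
    """Coordinates the other agents on behalf of the User Agent."""

    def __init__(self, ontology, planner, resources):
        self.ontology = ontology
        self.planner = planner
        self.resources = resources

    def handle(self, term):
        terms = self.ontology.expand(term)
        results = []
        for resource, sub_terms in self.planner.plan(terms, self.resources):
            results.extend(resource.search(sub_terms))
        return results


class UserAgent:
    """Entry point that forwards the user's request to a broker."""

    def __init__(self, broker):
        self.broker = broker

    def ask(self, term):
        return self.broker.handle(term)


broker = BrokerAgent(
    OntologyAgent({"volcano": ["eruption"]}),
    QueryPlanningAgent(),
    [ResourceAgent("A", ["Mount St. Helens eruption map", "Georgia land cover"]),
     ResourceAgent("B", ["Volcano video archive"])],
)
answer = UserAgent(broker).ask("volcano")
```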