With an increasingly pervasive global information infrastructure, we continue to face greater quantity, distribution, autonomy, and heterogeneity among the accessible information, information sources, and users. We need to deal with more heterogeneous information consisting not only of a broader variety of digital data, but also of operations and computations (such as simulations) that can create new data and information.
The scale of the problem has changed from a few databases to millions of information resources. New resources are added independently to the accessible set, while other resources change rapidly or disappear. Currently favored strategies that depend on keyword-based access, or that involve only representational or structural components of data, usually yield poor-quality results, and their lack of precision increases information overload. We fully expect increasing standardization and interoperability at the system, syntactic, and structural levels to address many issues. However, the key challenges are at the semantic level, where people will increasingly expect information systems to help them not at the data level, but at the information and, increasingly, knowledge levels. The InfoQuilt project builds on the progress in system, syntactic, and structural interoperability while developing new solutions to achieve semantic interoperability. Its contributions so far are an information brokering architecture, the specification and processing of media-independent information correlations (called MREFs), and techniques to exploit multiple, pre-existing ontologies.
We believe that semantic interoperability is the key to progress towards our vision of the Infocosm: a society whose members will have information anywhere, any time, and in many forms, for knowledge creation and use, effective decision-making, better learning, and more fun.
InfoQuilt investigates three enablers and capabilities to achieve semantic interoperability:
Terminology (and language) transparency: This will allow a user to choose an ontology of his or her choice (e.g., one based on LCC for querying bibliographic data or FGDC for geospatial data), while allowing the information source to subscribe to a related but different ontology (e.g., one based on DDC or UDK, respectively). UDK recognizes some overlap between geospatial data sets and environmental data sets, and their respective modeling.
Context-sensitive information processing: The information system will recognize or understand the context of an information need and use it to limit information overload, both by formulating more precise queries used for searching information sources and by filtering and transforming the information before presenting it to the user.
Semantic correlation: This allows the user or a third party to represent semantically related information regardless of distribution and heterogeneity (including various forms of media), and to use these representations for obtaining all forms of relevant information anywhere. We have proposed the concept of a Metadata REFerence link (MREF) to represent such correlations and support the corresponding information processing.
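As an illustration of the context-sensitive processing described above, the following minimal sketch shows the two places where a context can reduce overload: when the query is formulated and when results are filtered. The function names, the dictionary layout, and the sample data are assumptions made for this example; they are not part of InfoQuilt.

```python
def refine_query(keywords, context):
    """Combine user keywords with context constraints into one query.

    `context` is a hypothetical mapping of metadata attribute -> required value.
    """
    return {"keywords": list(keywords), "constraints": dict(context)}


def filter_results(results, context):
    """Keep only results whose metadata agrees with every context constraint."""
    return [
        r for r in results
        if all(r.get("metadata", {}).get(k) == v for k, v in context.items())
    ]


# Invented sample data for illustration only.
context = {"domain": "geospatial"}
results = [
    {"title": "Land cover map", "metadata": {"domain": "geospatial"}},
    {"title": "Library catalog", "metadata": {"domain": "bibliographic"}},
]

query = refine_query(["land", "cover"], context)
relevant = filter_results(results, context)
```

Here the same context object is used twice, once to make the outgoing query more precise and once to discard results that do not match before they reach the user.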
The InfoQuilt system uses an information brokering architecture, which adapts and extends the concepts of (1) federated environments (Heimbigner and McLeod 1985; Sheth and Larson 1990), in which resources, metadata, and ontologies are created, administered, and enhanced independently; and (2) mediator architectures (Wiederhold 1992), which decouple information creators and providers from information users and provide better semantic-level services and interoperability. Three key components of our approach are metadata (especially domain-specific and content-based metadata), contexts, and ontologies. We characterize InfoQuilt as a third-generation information integration system, or a second-generation global information system.
A vertical slice of InfoQuilt that supports video assets of any type anywhere has been implemented in the VideoAnywhere project.
The current InfoQuilt system builds upon our earlier work on MIDAS and OBSERVER.
MIDAS: Media-Independent DomAin Specific information correlation
The MIDAS system represents our early work on supporting the correlation of information stored as image data and as structured data. The set of objects satisfying constraints on both the image and the structured representations may be considered a logical collection, in InfoHarness terminology.
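The idea of a collection defined by constraints on two representations can be sketched as follows. This is only an illustration under assumed names: the record layout, the attribute names, and the sample values are invented here and do not reflect the actual MIDAS data model.

```python
# Hypothetical objects, each with structured attributes and
# metadata assumed to have been extracted from an associated image.
objects = [
    {"id": "region_a", "structured": {"population": 120000}, "image": {"land_cover": "forest"}},
    {"id": "region_b", "structured": {"population": 8000}, "image": {"land_cover": "forest"}},
    {"id": "region_c", "structured": {"population": 250000}, "image": {"land_cover": "urban"}},
]


def logical_collection(objs, structured_pred, image_pred):
    """Return ids of objects that satisfy BOTH the structured and the image constraint."""
    return [
        o["id"] for o in objs
        if structured_pred(o["structured"]) and image_pred(o["image"])
    ]


collection = logical_collection(
    objects,
    structured_pred=lambda s: s["population"] > 100000,
    image_pred=lambda i: i["land_cover"] == "forest",
)
```

Only objects passing both predicates enter the collection, so the result is the intersection of the two constraint sets rather than their union.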
OBSERVER: Ontology Based System Enhanced with Relationships for Vocabulary hEterogeneity Resolution
OBSERVER supports multiple pre-existing ontologies to access heterogeneous, distributed, and independently developed data repositories. The content of each data repository is described by one or more ontologies expressed in a system based on Description Logics (DLs). Each data repository is viewed at the level of the relevant semantic concepts. Information requests in OBSERVER are specified using concepts in a domain ontology chosen by the user. OBSERVER uses ontological inferences to determine the relevant data repositories and translates DL expressions into the local query languages of those repositories. Query processing allows controlled expansion of a user's query to involve other ontologies, using an extended set of relationships including synonyms, homonyms, and hypernyms. We have also addressed the crucial issue of estimating the possible loss of information when using relationships other than synonyms.
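A minimal sketch of translating a query term into another ontology through inter-ontology relationships is shown below. It is not OBSERVER's implementation: the `Link` class, the `translate` function, the ontology names, and the terms are all hypothetical, and the boolean "exact" flag is a deliberately crude stand-in for the loss-of-information estimate mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical inter-ontology relationship between two terms."""
    source_term: str
    relation: str          # e.g. "synonym" or "hypernym"
    target_ontology: str
    target_term: str


# Invented relationships for illustration only.
LINKS = [
    Link("book", "hypernym", "library_b", "document"),
    Link("book", "synonym", "library_b", "monograph"),
    Link("book", "hypernym", "library_c", "publication"),
]


def translate(term, target_ontology, links=LINKS):
    """Translate `term` into `target_ontology`.

    Returns (translated_term, exact), or None if no relationship is known.
    `exact` is True only for synonyms; any other relationship may change
    the set of answers, so the caller should treat the result as lossy.
    """
    candidates = [
        l for l in links
        if l.source_term == term and l.target_ontology == target_ontology
    ]
    if not candidates:
        return None
    # Prefer a synonym when one exists, since it loses no information.
    best = min(candidates, key=lambda l: l.relation != "synonym")
    return best.target_term, best.relation == "synonym"
```

For example, `translate("book", "library_b")` picks the synonym "monograph" over the hypernym "document", while `translate("book", "library_c")` can only fall back to a hypernym and flags the translation as inexact.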
Members of the InfoQuilt project include Amit Sheth, Vipul Kashyap, Kshitij Shah, Clemens Bertram, Krishnan Parasuraman, and Tarcisio Lima.
©2005 LSDIS and the University of Georgia. All rights reserved.