InfoHarness and VisualHarness

InfoHarness (1993-1996)

Organizations have accumulated enormous amounts of heterogeneous information. It has become quite easy to create new information, but knowledge about the existence, location, and means of retrieval of that information has become confusing, which is a significant deterrent to effective use of the available data. The InfoHarness system aims to provide rapid access to huge amounts of heterogeneous information from World-Wide Web browsers, without reformatting, restructuring, or relocating the data. InfoHarness thus provides uniform access to files and textual or semi-structured data of a very large variety (e.g., newsgroups, MS Word, Frame, PostScript, source code, AP news). The data may be stored in a repository or be accessible online (e.g., through an NNTP server).

Some of the research in this project was commercialized as the AdaptX Harness platform, system, and related services from Bellcore (Bell Communications Research Inc.). InfoHarness is a trademark of Bellcore.

Capabilities of the InfoHarness System 

  • Use of multiple third-party indexing engines.
  • Logical structuring of the information space without restructuring, reformatting, or relocating the original information, enabling access to information by logical units of interest and the use of independent indexing technologies.
  • Attribute-based access to document collections.
  • Scalability of content-based querying by combining results from independently indexed collections.
  • Scalability through the use of multiple InfoHarness servers, with CORBA-based access to data available through remote InfoHarness servers.
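
Combining results from independently indexed collections can be illustrated with a minimal Python sketch. It is not the InfoHarness implementation: the `Hit` type, the `normalize` and `merge_results` functions, and the min-max rescaling of scores are all assumptions made for illustration, since scores from different indexing engines are generally not directly comparable. The documents themselves stay where they are; only their locations and scores are merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical type, not from InfoHarness)."""
    location: str  # where the original document lives; it is never moved
    score: float   # engine-specific relevance score, higher is better


def normalize(hits):
    """Rescale one engine's scores to [0, 1] (assumed min-max scheme)."""
    if not hits:
        return []
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    if span == 0:
        # All scores are equal, so treat every hit as equally relevant.
        return [Hit(h.location, 1.0) for h in hits]
    return [Hit(h.location, (h.score - lo) / span) for h in hits]


def merge_results(*result_lists, limit=10):
    """Merge ranked lists from independently indexed collections."""
    best = {}
    for hits in result_lists:
        for h in normalize(hits):
            # If a document shows up in several collections, keep its best score.
            if h.score > best.get(h.location, -1.0):
                best[h.location] = h.score
    # Sort by descending score; break ties by location for a stable order.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [Hit(loc, score) for loc, score in ranked[:limit]]


if __name__ == "__main__":
    # Two collections indexed by different (hypothetical) engines.
    news_hits = [Hit("nntp://news.example/comp.db/101", 12.0),
                 Hit("nntp://news.example/comp.db/87", 4.0)]
    file_hits = [Hit("file:///docs/design.ps", 0.9),
                 Hit("file:///docs/notes.doc", 0.3),
                 Hit("file:///docs/old.txt", 0.6)]
    for hit in merge_results(news_hits, file_hits, limit=3):
        print(f"{hit.score:.2f}  {hit.location}")
```

Because each collection is normalized on its own, a new collection with a different indexing engine can be added without re-indexing the existing ones, which is the property the scalability claim relies on.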

One extension of the InfoHarness system, funded by private industry (and hence unpublished), provides Web-based browsing, querying, and logical restructuring of data stored in relational databases, in addition to access to heterogeneous textual data sources.

VisualHarness (1995-1997)

The VisualHarness system adds the ability to access image data repositories to the InfoHarness system. It does so by extending the InfoHarness system with the ZEBRA image access system. The ZEBRA system demonstrates three features.

  • It is customizable: ZEBRA supports keyword-, attribute-, and content-based searching, where the user can specify the relative weights of the three search strategies as well as of features within the content (e.g., the image features of color, composition, texture, and structure).
  • It is extensible: New metadata can be added and the corresponding access can be supported. Also, different Visual Information Retrieval (VIR) engines and third-party indexing technologies can be hooked into the system.
  • It is federated: It supports access to distributed, autonomous, and heterogeneous data repositories.

A key idea behind the ZEBRA system is to combine these three access strategies to achieve better-quality results. One interesting technique developed is the black box approach, which extracts the required information from the VIR engine in a form we can use (distance metrics) without knowing the internals of the VIR engine. We have shown the validity of this black box approach by comparing the results obtained with it against the results obtained from a VIR engine.
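
The combination of the three access strategies can be sketched in Python as follows. This is an illustrative sketch, not the ZEBRA scoring method: the linear weighted sum, the conversion of distances to similarities, and every function and field name (`keyword_score`, `attribute_score`, `content_score`, `rank`, `vir_distance`) are assumptions. The VIR engine is modeled as an opaque callable that returns only a distance per image, in the spirit of the black box approach. Per-feature weights within the content score (color, composition, texture, structure) are not modeled here.

```python
def keyword_score(query_terms, image_keywords):
    """Fraction of query terms found among the image's keywords."""
    terms = set(query_terms)
    if not terms:
        return 0.0
    return len(terms & set(image_keywords)) / len(terms)


def attribute_score(constraints, attributes):
    """Fraction of attribute constraints the image satisfies."""
    if not constraints:
        return 0.0
    matched = sum(1 for key, value in constraints.items()
                  if attributes.get(key) == value)
    return matched / len(constraints)


def content_score(distance, max_distance):
    """Map a black-box distance (smaller is closer) to a similarity in [0, 1]."""
    if max_distance <= 0:
        return 1.0  # every image is at distance zero from the query
    return 1.0 - min(distance, max_distance) / max_distance


def rank(images, query_terms, constraints, vir_distance, weights):
    """Rank images by a weighted sum of the three strategy scores.

    vir_distance is any callable mapping an image id to a distance; its
    internals are never inspected (hypothetical stand-in for a VIR engine).
    weights maps "keyword", "attribute", and "content" to relative weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    distances = {image_id: vir_distance(image_id) for image_id in images}
    max_distance = max(distances.values(), default=0.0)
    scored = []
    for image_id, meta in images.items():
        combined = (
            weights["keyword"] * keyword_score(query_terms, meta["keywords"])
            + weights["attribute"] * attribute_score(constraints, meta["attributes"])
            + weights["content"] * content_score(distances[image_id], max_distance)
        ) / total
        scored.append((image_id, combined))
    # Highest combined score first; ties broken by image id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    images = {
        "a.gif": {"keywords": {"sunset", "beach"}, "attributes": {"format": "gif"}},
        "b.gif": {"keywords": {"city"}, "attributes": {"format": "gif"}},
        "c.jpg": {"keywords": {"sunset"}, "attributes": {"format": "jpeg"}},
    }
    # Made-up distances standing in for what a VIR engine might return.
    fake_distances = {"a.gif": 2.0, "b.gif": 0.0, "c.jpg": 4.0}
    equal = {"keyword": 1.0, "attribute": 1.0, "content": 1.0}
    for image_id, score in rank(images, ["sunset", "beach"], {"format": "gif"},
                                fake_distances.get, equal):
        print(f"{score:.3f}  {image_id}")
```

Changing the weights changes the ranking without touching the engines: with all weight on `"content"`, the image closest to the query under the black-box distance ranks first regardless of its keywords or attributes.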

jHarness (1998)

This is a Java-based version of the ZEBRA system with our own server (replacing Bellcore's InfoHarness server).

Additional Information

Research on InfoHarness and VisualHarness at the LSDIS has been funded in part by the Massive Digital Data Systems initiative as part of the project "InfoHarness: A System for Scalable Search of Heterogeneous Information."

The InfoHarness team members include Yana Kane-Esrig (Bellcore), Vipul Kashyap, William Leblanc, Srilekha Mudumbai, Kshitij Shah, Amit Sheth (PI), Leon Shklar (Bellcore), and Satish Thatte (Proj. Mgr., Bellcore). Haym Hirsh (Rutgers University) was a consultant for the project.

The VisualHarness team members include Srilekha Mudumbai, Krishnan Parasuraman, Kshitij Shah and Amit Sheth (PI).

jHarness contributors are Krishnan Parasuraman, Amit Sheth and Kshitij Shah.