The Web Of Belief (WOB) framework maintains trust and provenance for SWETO. It was presented by Li Ding, Pranam Kolari, Anupam Joshi, Timothy Finin, and Yelena Yesha (University of Maryland at Baltimore County) at the Trust on the Web Track at Developers Day.

The following is a copy of the abstract of the SWETO presentation at Developers Day [Developers Day Home, 2004 World Wide Web Conference].

The emergent Semantic Web community [SW] needs a common infrastructure for evaluating new techniques and software that use machine-processable data. Ontologies are a centerpiece of most approaches. We therefore believe that, to evaluate and compare tools for quality, scalability, and performance, and to develop benchmarks for different classes of semantic technologies and applications, the Semantic Web community needs an open and freely available ontology with a large knowledge base (or description base) populated with real facts or data, reflecting the real-world heterogeneity of knowledge sources. If the tools are to be used for advanced semantic applications, such as those in business intelligence and national security, then the instances in the knowledge base should be highly interconnected. Thus, we present and describe a Semantic Web Technology evaluation Ontology (SWETO) test-bed [SWETO]. In particular, we address the requirements of a test-bed to support research in semantic analytics, as well as the steps in its development, including ontology creation, semi-automatic extraction, and entity disambiguation. SWETO has been developed as part of an NSF-funded project using Freedom [Semagix], a commercial product from Semagix based in part on earlier academic research [Sheth et al 2002], and is being made available openly for any non-commercial use.

Initially, SWETO was developed to be a large-scale dataset for testing algorithms for the discovery of semantic associations. The schema component of the ontology reflects the types of entities and relationships available explicitly (and implicitly) in Web sources. Because we had Semagix Freedom available, we narrowed the selection of Web sources to open, trusted sources whose metadata has a (semi-)structured layout, which makes crawling and extraction viable. With the Freedom toolkit, we created knowledge extractors by specifying regular expressions that extract entities from data sources. As the sources are 'scraped' and analyzed by the extractors, the extracted entities are stored in the appropriate classes of the ontology. Because we extracted semantic metadata from a variety of heterogeneous data sources, including Web pages, XML feed documents, intranet data repositories, etc., entity disambiguation is a crucial step. Freedom's disambiguation techniques automatically resolved entity ambiguities in 99% of the cases, leaving less than 1% (about 200 cases) for human disambiguation.
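To make the extraction and disambiguation steps concrete, the following is a minimal Python sketch of the general idea only. It is not Freedom's API or rule format, neither of which is shown here. The patterns in EXTRACTORS, the normalize and extract helpers, and the sample documents are all hypothetical, and the name normalization is a deliberately crude stand-in for real disambiguation techniques.

```python
import re
from collections import defaultdict

# Hypothetical extractor rules: ontology class name -> regular expression.
# Each pattern captures the entity's surface form in a group called "name".
EXTRACTORS = {
    "Person": re.compile(r"Author:\s*(?P<name>[^\n]+)"),
    "Organization": re.compile(r"Affiliation:\s*(?P<name>[^\n]+)"),
}


def normalize(name):
    """Crude disambiguation key: drop punctuation, fold case, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", "", name).casefold()
    return " ".join(cleaned.split())


def extract(documents):
    """Return {class name: {normalized key: set of surface forms}}."""
    ontology = defaultdict(lambda: defaultdict(set))
    for text in documents:
        for class_name, pattern in EXTRACTORS.items():
            for match in pattern.finditer(text):
                surface = match.group("name").strip()
                # Surface forms that share a key are treated as one entity.
                ontology[class_name][normalize(surface)].add(surface)
    return ontology


# Made-up input documents, for illustration only.
docs = [
    "Author: Jane Q. Doe\nAffiliation: Example Lab",
    "Author: JANE Q DOE\nAffiliation: example  lab",
]
entities = extract(docs)
for class_name, instances in entities.items():
    for key, forms in instances.items():
        print(class_name, key, sorted(forms))
```

In this sketch the two spellings of each name collapse into a single instance per class. Cases that such a simple key cannot resolve would correspond to the small remainder left for human disambiguation.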

Because SWETO is intended for ontology benchmarking, we continue to populate the ontology from diverse sources, thereby extending it into multiple domains. Version is populated with well over 800,000 entities and over 1.5 million relationships, with the next, larger release due out soon. SWETO can be accessed through browsing and XML serialization, and will soon be available through a Web service. SWETO has been used internally (LSDIS Lab) for discovery and ranking of semantic associations. Externally, our collaborators at UMBC are exploring trust extensions for SWETO, and in industry, Semagix uses it for evaluating fast semantic metadata extraction and enhancement in Marianas SDK.
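As an illustration of what discovering a semantic association over such a knowledge base can look like, the sketch below treats the entities and relationships as a labeled graph and searches for a shortest chain of relationships connecting two entities. This is an assumption-laden toy, not the LSDIS Lab's discovery or ranking algorithm: the triples, the find_association helper, and the "^-1" notation for traversing a relationship backwards are all hypothetical.

```python
from collections import deque

# Made-up (subject, relationship, object) triples, for illustration only.
TRIPLES = [
    ("alice", "authorOf", "paper1"),
    ("bob", "authorOf", "paper1"),
    ("bob", "affiliatedWith", "lab1"),
    ("carol", "affiliatedWith", "lab1"),
]


def build_index(triples):
    """Map each entity to its (relationship label, neighbor) pairs."""
    index = {}
    for subject, relation, obj in triples:
        index.setdefault(subject, []).append((relation, obj))
        # Allow traversal against the direction of the relationship.
        index.setdefault(obj, []).append((relation + "^-1", subject))
    return index


def find_association(triples, start, goal):
    """Breadth-first search for a shortest path of relationships, or None."""
    index = build_index(triples)
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for label, neighbor in index.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [label, neighbor]))
    return None


path = find_association(TRIPLES, "alice", "carol")
print(path)
```

Here the path alternates entities and relationship labels, linking alice to carol through a shared paper and a shared lab. A ranking step, which this sketch omits, would then order several such paths by relevance.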

SWETO is an effort of the SemDIS team, with significant effort in using Freedom by Gowtham Sannapareddy. It is partially funded by NSF-ITR-IDM Award # 0325464 and NSF-ITR-IDM Award # 0219649.



This material is based upon work supported by the National Science Foundation under Grant No. IIS-0325464 titled "SemDis: Discovering Complex Relationships in Semantic Web". Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.