Research
My dissertation research investigated the problem of enabling link-based analysis in semantic graph databases. It focused on developing solutions to three key problems: How can link analysis queries be expressed on Semantic Web databases? What type of disk-based data storage model will support efficient evaluation of link analysis queries? How can the most important (top-K) answers to such queries be identified and efficiently computed?
Title: Supporting Link Analysis Using Advanced Querying Methods In Semantic Web Databases
ABSTRACT
There is an increasing demand for technologies that can help organizations unearth actionable knowledge from their data assets. This demand continues to drive a flurry of activity in data mining research, where the emphasis is on technologies that can identify patterns in data. However, in addition to the patterns view of data, other data and knowledge perspectives are required to support the broad range of complex analytical tasks found in contemporary applications. For example, in some applications in homeland security, bioinformatics, business, and other investigative domains, many tasks are focused on connecting the dots. For this genre of applications, support for identifying, revealing, and analyzing links or relationships between groups of entities (link analysis) is crucial. Currently, mainstream database systems do not provide support for such analyses, and current solutions rely on exporting data from databases into custom applications to be analyzed. This approach incurs additional overhead and precludes exploiting other mature technologies offered by today's database systems. This thesis argues for database support for link analysis by providing an appropriate interpretation for such information requests in a graph database model. It addresses several key database issues with respect to supporting such queries. First, it identifies a number of querying constructs that are crucial to supporting link analysis applications and proposes a formal query language called SPARQ2L that allows their expression. A formal semantics and a characterization of the computational complexity of SPARQ2L's query constructs are also presented. Second, it proposes a database storage model that supports efficient processing of queries while being tolerant of data persistence.
The storage model combines a graph linearization strategy rooted in algebraic techniques for solving path problems with a set of heuristics for node and edge clustering that aim to minimize external path lengths. Third, it proposes a novel relevance model, SemRank, which exploits the machine-processable semantics of data in ascribing relative importance to query results and offers a flexible or modulative ranking model enabling serendipitous knowledge discovery.
INDEX WORDS: Link Analysis, Semantic Associations, Semantic Web Databases, Semantic Query Languages, RDF, SPARQL, SPARQ2L