Semantic Discovery: Discovering Complex Relationships in Semantic Web
An NSF Medium ITR project
Detecting Conflict of Interest (COI) using Semantic Associations
Collaborative work UGA & UMBC
Description:
The goal is to detect potential conflicts of interest by analyzing semantic associations.
Using a subset of DBLP and FOAF
Take One:
We used a subset of DBLP and a subset of FOAF. The two networks were integrated using an entity reconciliation algorithm.
Live demo of Conflict of Interest Detection
Take Two:
We demonstrated scalability by using all of the DBLP data and a much larger FOAF dataset (one order of magnitude larger).
We improved the COI detection algorithm by using more robust collaboration strength measures and by considering more relationships (e.g., same-affiliation, co-editorship).
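As an illustration of what a collaboration strength measure can look like, the following is a minimal sketch and not the project's actual code. It assumes a co-authorship weight in which each shared publication with n authors contributes 1/(n-1), so papers with many authors count less than papers written by two people. The class name, method name, and the choice of this particular weighting are assumptions made for this example only; this page does not specify the measure that was used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a co-authorship strength measure (not the project's code). */
class CollaborationStrength {

    /**
     * Sums 1/(n-1) over every publication that lists both authors,
     * where n is the number of authors of that publication.
     * The 1/(n-1) weighting is an assumption made for illustration.
     */
    static double strength(String a, String b, List<Set<String>> publications) {
        double total = 0.0;
        for (Set<String> authors : publications) {
            int n = authors.size();
            // Single-author publications cannot contribute to a collaboration.
            if (n > 1 && authors.contains(a) && authors.contains(b)) {
                total += 1.0 / (n - 1);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<String>> pubs = new ArrayList<>();
        pubs.add(new HashSet<>(Arrays.asList("alice", "bob")));          // two authors
        pubs.add(new HashSet<>(Arrays.asList("alice", "bob", "carol"))); // three authors
        pubs.add(new HashSet<>(Arrays.asList("alice", "dave")));         // bob is absent
        System.out.println(strength("alice", "bob", pubs));
    }
}
```

A threshold on such a score could then separate strong from weak collaboration, and additional relationships such as same-affiliation or co-editorship could be checked alongside it.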
Data Sources:
-
DBLP data is used by means of SwetoDblp, which is an RDF version of DBLP data that incorporates additional relationships such as affiliation and publisher.
We used the March 2007 version of SwetoDblp (over 500K person entities and over 800K publication entities).
-
FOAF data comes from the crawled collection of Swoogle.
We filtered data from Swoogle by incrementally expanding upon person names matching those of a subset of DBLP.
The resulting FOAF dataset we used is available.
However, we removed foaf:mbox values to avoid making plain email addresses readily available to spammers (in the few cases where an email was used as a URI, we removed part of the email's domain name).
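The foaf:mbox removal described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the FOAF data has been serialized as N-Triples with one triple per line, and the class and method names are hypothetical. It does not cover the separate case where an email address appears inside a URI.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: drops foaf:mbox triples from N-Triples lines. */
class MboxFilter {

    // URI of the foaf:mbox property in the FOAF namespace.
    static final String FOAF_MBOX = "<http://xmlns.com/foaf/0.1/mbox>";

    /** Returns the input lines minus any triple whose predicate is foaf:mbox. */
    static List<String> stripMbox(List<String> ntriplesLines) {
        List<String> kept = new ArrayList<>();
        for (String line : ntriplesLines) {
            // An N-Triples line is: subject, predicate, object, separated by whitespace.
            String[] parts = line.trim().split("\\s+", 3);
            boolean isMbox = parts.length == 3 && parts[1].equals(FOAF_MBOX);
            if (!isMbox) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "<http://example.org/p1> <http://xmlns.com/foaf/0.1/name> \"Alice\" .",
            "<http://example.org/p1> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@example.org> .");
        for (String line : stripMbox(input)) {
            System.out.println(line);
        }
    }
}
```

Matching the predicate exactly, rather than searching the whole line for "mbox", leaves other properties such as foaf:mbox_sha1sum untouched.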
Source Code:
The source code is in Java.
We used the Java bindings of BRAHMS to load all the files (about 1GB).
We claim that processing data at this scale is possible on an average laptop (and we were probably the first to use BRAHMS on OS X).
Earlier prototyping was done using the main-memory implementation of the SemDis API. The change to BRAHMS was quite easy because its Java bindings implement that API.
The source code is available:
-
Code for COI detection, zipped and organized as an ant project: coicode.zip
-
Code for Entity Disambiguation will be posted here shortly.
The main-memory implementation of the SemDis API uses Jena's ARP (RDF parser).
Hence, some jar files are required and should be obtained from their respective distributions, as indicated in the following list of jars:
-
Brahms_bindings_semdisAPI_v0_3.jar - comes in Brahms Java-bindings distribution
-
commons-lang-2.1.jar - from apache's commons-lang
-
commons-logging.jar - from apache
-
icu4j_3_0.jar - ICU4J v3.0 from IBM
-
jena_v2_3.jar - Jena's jena.jar version 2.3 (yes, we renamed it)
-
rhosearchAPI_v2.jar - our own interfaces for rho-search
-
samjanik.jar - our own implementation of a rho-search algorithm (provided by Maciej Janik), available as an ant project (zipped)
-
semdisAPI_v0_3.jar - from SemDis API
-
semdisImpl_v0_6.jar - from SemDis API (version 0.5 also ok)
-
xercesImpl.jar - from xerces
Evaluation Datasets:
Our evaluation datasets consist of sets of accepted papers in several conference tracks (of WWW2006) and their respective Program Committee members.
We ran our COI detection over these and manually verified a sample of the results to adjust our method.
There were relatively few relationships passing through the FOAF part of the dataset and then back to DBLP entities.
Hence, we took a sample of 200 foaf:Person entities that have at least one foaf:knows relationship to verify that the detection of COI worked properly with FOAF data.
These datasets are available in this list.
Note: All tracks are from the 2006 World Wide Web Conference, which is one of the conferences that separate Program Committee (PC) members across tracks.
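The FOAF sampling step mentioned above (200 foaf:Person entities with at least one foaf:knows relationship) could be sketched as below. This is a hypothetical illustration, not the procedure actually used: it assumes the foaf:knows relationships are already available as a map from a person identifier to the set of persons they know, and it assumes a uniform random sample with a fixed seed for repeatability. All names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: samples persons that have at least one foaf:knows link. */
class FoafSampler {

    /**
     * Returns up to sampleSize person identifiers, chosen uniformly at random
     * among those with a non-empty set of foaf:knows targets.
     */
    static List<String> samplePersonsWithKnows(Map<String, Set<String>> knows,
                                               int sampleSize, long seed) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : knows.entrySet()) {
            if (!entry.getValue().isEmpty()) {
                candidates.add(entry.getKey());
            }
        }
        // Sort first so the shuffle depends only on the seed, not on map ordering.
        Collections.sort(candidates);
        Collections.shuffle(candidates, new Random(seed));
        int size = Math.min(sampleSize, candidates.size());
        return new ArrayList<>(candidates.subList(0, size));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> knows = new HashMap<>();
        knows.put("person:a", new HashSet<>(Collections.singletonList("person:b")));
        knows.put("person:b", new HashSet<>(Collections.singletonList("person:a")));
        knows.put("person:c", new HashSet<>()); // no foaf:knows, never sampled
        System.out.println(samplePersonsWithKnows(knows, 200, 42L));
    }
}
```

Each sampled person could then be passed to the COI detection step to check that relationships through the FOAF part of the data are handled as expected.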
Publications:
Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection
(15th International World Wide Web Conference, Edinburgh, Scotland, May 23-26, 2006)
The contact person for details/problems/questions/etc on this page is
Boanerges Aleman-Meza (baleman uga.edu)