
Semantic Discovery: Discovering Complex Relationships in Semantic Web

An NSF Medium ITR project

Test Ontology Generation Tool

Contact person: Matt Perry (mperry@cs.uga.edu)

This tool can be used to generate large, high-quality data sets for testing semantic web applications. It has been implemented as a Protégé plugin. After creating an ontology schema in Protégé, the tool allows you to specify parameters controlling the relative distributions of instances of each class and property type in addition to the total number of class and property instances desired. The tool then generates an RDF instance graph with the specified characteristics.

Download:

Requirements:

  1. Protégé 3.0 or higher
  2. Java 1.5 or higher

Installation:

  1. Download and unzip the file "edu.uga.cs.lsdis.semdis.graphGeneration.plugin.zip"
  2. Copy the folder "edu.uga.cs.lsdis.semdis.graphGeneration.plugin" into the plugins directory of Protégé.
  3. In Protégé, open your project, go to Project->Configure, and select "GraphGenerator Tab". The tab should then appear.

Usage:

Properties:

The parameters used for a given ontology can be saved to a file for quick loading later. These files should use the extension ".properties".

  1. An existing .properties file can be loaded with the "Load Existing Properties" button
  2. The current settings can be saved with the "Save Current Properties" button

General Parameters:

  1. Namespace specifies the namespace for the project, for example "http://lsids.cs.uga.edu/semdis/business"
  2. Abbreviation specifies the abbreviation for this namespace, for example "business"
  3. Number of Nodes specifies the total number of nodes to be generated
  4. Number of Edges specifies the total number of edges to be generated

Class Probability Parameters:

  1. An integer value is associated with each class. The corresponding probability for generating an instance of this class will be its value divided by the sum of the values for all such classes.
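The weighting rule above amounts to P(class) = weight / (sum of all class weights), so with a "Number of Nodes" of N each class receives about N × P(class) instances. The Java sketch below shows the probability calculation and one common way to draw a class in proportion to its weight. It is not the plugin's code: the class names Person and Company and the methods probabilities and sample are hypothetical, weights are assumed to be non-negative, and the tool may assign exact counts rather than sampling each node independently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Sketch of weight-proportional class selection (hypothetical, not the plugin's code). */
public class ClassSampler {

    /** Converts non-negative integer weights into probabilities: weight / sum of weights. */
    static Map<String, Double> probabilities(Map<String, Integer> weights) {
        int total = 0;
        for (int w : weights.values()) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        Map<String, Double> probs = new LinkedHashMap<String, Double>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            probs.put(e.getKey(), (double) e.getValue() / total);
        }
        return probs;
    }

    /** Draws one class name with probability proportional to its weight. */
    static String sample(Map<String, Integer> weights, Random rng) {
        int total = 0;
        for (int w : weights.values()) {
            total += w;
        }
        int r = rng.nextInt(total); // uniform in [0, total); throws if total <= 0
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            r -= e.getValue(); // zero-weight classes never push r below 0, so they are never chosen
            if (r < 0) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("unreachable when weights are non-negative");
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<String, Integer>();
        weights.put("Person", 3);  // hypothetical class
        weights.put("Company", 1); // hypothetical class
        System.out.println(probabilities(weights)); // {Person=0.75, Company=0.25}

        int numberOfNodes = 1000;
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        Random rng = new Random(42);
        for (int i = 0; i < numberOfNodes; i++) {
            String c = sample(weights, rng);
            Integer old = counts.get(c);
            counts.put(c, old == null ? 1 : old + 1);
        }
        System.out.println(counts); // roughly 750 Person and 250 Company
    }
}
```

With weights 3 and 1, a Person instance is three times as likely as a Company instance; only the ratio matters, so weights 30 and 10 would behave identically.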

Property Probability Parameters:

Relationship Probabilities:

  1. An integer value is associated with each property type. The corresponding probability for generating an instance of this property will be its value divided by the sum of the values for all such properties.
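Relationship weights follow the same rule as class weights. The Java sketch below shows one way to turn them into edges: pick a property type by weight, then pick the subject and object uniformly from the instances of its domain and range classes, matching the uniform participation described in the alpha-version notes. It assumes each property has a single domain and a single range class; PropertyType, generate, and the Person/Company/worksFor/knows ontology are hypothetical, and the plugin's own generator may differ (for example in how it treats subclasses or literal properties).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class EdgeGenerator {

    /** Hypothetical model of a property type: one domain class, one range class, an integer weight. */
    static class PropertyType {
        final String name;
        final String domain;
        final String range;
        final int weight;

        PropertyType(String name, String domain, String range, int weight) {
            this.name = name;
            this.domain = domain;
            this.range = range;
            this.weight = weight;
        }
    }

    /** Returns numberOfEdges triples of the form {subject, property, object}. */
    static List<String[]> generate(List<PropertyType> properties,
                                   Map<String, List<String>> instancesByClass,
                                   int numberOfEdges, Random rng) {
        // Only properties with a positive weight and instances at both ends can be used.
        List<PropertyType> usable = new ArrayList<PropertyType>();
        int total = 0;
        for (PropertyType p : properties) {
            List<String> subjects = instancesByClass.get(p.domain);
            List<String> objects = instancesByClass.get(p.range);
            if (p.weight > 0 && subjects != null && !subjects.isEmpty()
                    && objects != null && !objects.isEmpty()) {
                usable.add(p);
                total += p.weight;
            }
        }
        List<String[]> triples = new ArrayList<String[]>();
        if (total == 0) {
            return triples; // nothing can be generated
        }
        for (int i = 0; i < numberOfEdges; i++) {
            // Step 1: choose a property type with probability weight / total.
            int r = rng.nextInt(total);
            PropertyType chosen = null;
            for (PropertyType p : usable) {
                r -= p.weight;
                if (r < 0) {
                    chosen = p;
                    break;
                }
            }
            // Step 2: choose subject and object uniformly from the domain and range instances.
            List<String> subjects = instancesByClass.get(chosen.domain);
            List<String> objects = instancesByClass.get(chosen.range);
            String s = subjects.get(rng.nextInt(subjects.size()));
            String o = objects.get(rng.nextInt(objects.size()));
            triples.add(new String[] {s, chosen.name, o});
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, List<String>> instances = new HashMap<String, List<String>>();
        instances.put("Person", Arrays.asList("person1", "person2", "person3"));
        instances.put("Company", Arrays.asList("company1"));
        List<PropertyType> properties = Arrays.asList(
                new PropertyType("worksFor", "Person", "Company", 4),
                new PropertyType("knows", "Person", "Person", 1));
        for (String[] t : generate(properties, instances, 5, new Random(7))) {
            System.out.println(t[0] + " " + t[1] + " " + t[2]);
        }
    }
}
```

As written, this sketch can emit duplicate triples and, for a property such as knows, self-loops; a generator that must avoid them would need to reject and redraw such edges.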

Literal Probabilities:

  1. The type of a literal property can be either exact or probabilistic. If it is exact, exactly one instance of this literal property is associated with each instance of the corresponding domain classes. If it is probabilistic, the property is given an integer value, and its instances are generated in the same way as for relationships.
  2. If "Use File" is selected, literal values will be taken from the file specified in the "File Location" field. This should be a text file with one value per line. If this is not selected, every instance will have the literal "literal value".
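For the exact case, the behaviour described above can be sketched in Java as follows: every domain instance gets one literal triple, with values read from a one-value-per-line text file or, when no file is used, the fixed string "literal value". The method names readValues and exactLiterals and the example property name are hypothetical. Skipping blank lines and reusing the file's values in order are assumptions made for this sketch; the tool may pick values differently. Probabilistic literal properties would reuse the weighted selection used for relationships and are not repeated here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LiteralFiller {

    /** Value used when "Use File" is not selected. */
    static final String DEFAULT_VALUE = "literal value";

    /** Reads one literal value per line, skipping blank lines (an assumption). */
    static List<String> readValues(BufferedReader in) throws IOException {
        List<String> values = new ArrayList<String>();
        String line;
        while ((line = in.readLine()) != null) {
            String v = line.trim();
            if (v.length() > 0) {
                values.add(v);
            }
        }
        return values;
    }

    /**
     * Exact mode: one literal triple per domain instance.
     * If values is null or empty, every instance gets DEFAULT_VALUE;
     * otherwise the values are reused in order (an assumption).
     */
    static List<String[]> exactLiterals(String property, List<String> domainInstances,
                                        List<String> values) {
        List<String[]> triples = new ArrayList<String[]>();
        for (int i = 0; i < domainInstances.size(); i++) {
            String value = (values == null || values.isEmpty())
                    ? DEFAULT_VALUE
                    : values.get(i % values.size());
            triples.add(new String[] {domainInstances.get(i), property, value});
        }
        return triples;
    }

    public static void main(String[] args) throws IOException {
        // In practice the reader would wrap a FileReader on the "File Location" path.
        List<String> names = readValues(new BufferedReader(new StringReader("Alice\nBob\n")));
        List<String> people = Arrays.asList("person1", "person2", "person3");
        for (String[] t : exactLiterals("name", people, names)) {
            System.out.println(t[0] + " " + t[1] + " \"" + t[2] + "\"");
        }
    }
}
```

Running main prints `person1 name "Alice"`, `person2 name "Bob"`, and `person3 name "Alice"`: with only two values in the file, the third instance reuses the first value.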

Data File Location:

  1. If "Use Disk" is selected, the application stores the graph on disk as it is generated. Use this option to generate very large graphs that will not fit completely in memory. "Temporary File Directory" specifies the directory in which to store the graph while it is being generated.
  2. "Generated Graph Location" specifies the location and name of the file in which to store the generated RDF graph. For example, "myGraph.rdf".

Generate Graph:

  1. Click the "Generate Graph" button to generate the graph. The progress of the application will be indicated by the progress bar in this panel.

Notes About the Alpha Version:

  1. The application does not support adding the same "slot" to multiple classes.
  2. Changes made to the ontology are not reflected in the GraphGenerator Tab. To pick them up, either save and reopen the project, or remove the GraphGenerator Tab and add it again through the Project->Configure menu. This is in the process of being fixed.
  3. For a given relationship, the participation of class instances in the relationship follows a uniform distribution. In the future, the tool will allow specification of other distributions.
  4. None of the generated class instances are multi-classified into two or more classes; this functionality is also a planned addition.

This material is based upon work supported by the National Science Foundation under Grant No. IIS-0325464 titled "SemDis: Discovering Complex Relationships in Semantic Web". Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.