Matt Perry (mperry@cs.uga.edu)
This tool can be used to generate large, high-quality
data sets for testing semantic web applications. It has been
implemented as a Protégé plugin. After creating an ontology schema
in Protégé, the tool allows you to specify parameters controlling
the relative distributions of instances of each class and property
type in addition to the total number of class and property instances
desired. The tool then generates an RDF instance graph with the
specified characteristics.
Requirements:
- Protégé 3.0 or higher
- Java 1.5 or higher
Installation:
- Download and unzip the file "edu.uga.cs.lsdis.semdis.graphGeneration.plugin.zip"
- Copy the folder "edu.uga.cs.lsdis.semdis.graphGeneration.plugin"
into the plugins directory of Protégé.
- In Protégé, open your project, go to Project->Configure and
select "GraphGenerator Tab". The tab should then appear.
The parameters used for a given ontology can be saved to a file
for quick loading later. These files should be named with the
.properties extension.
- An existing .properties file can be loaded with the "Load
Existing Properties" button
- The current settings can be saved with the "Save Current
- Namespace specifies the namespace for the project.
- Abbreviation specifies the abbreviation for this namespace,
for example "business"
- Number of Nodes specifies the total number of nodes to be
generated.
- Number of Edges specifies the total number of edges to be
generated.
Class Probability Parameters:
- An integer value is associated with each class. The
corresponding probability for generating an instance of this class
will be its value divided by the sum of the values for all such
classes.
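The class weighting rule can be illustrated with a short Python sketch. This is not the plugin's code: the function names, the example classes `Company` and `Employee`, and the assumption that each node's class is drawn independently are all hypothetical.

```python
import random
from collections import Counter


def class_probabilities(weights):
    """Map each class to its integer value divided by the sum of all values."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one class needs a positive value")
    return {cls: value / total for cls, value in weights.items()}


def assign_classes(weights, num_nodes, seed=None):
    """Hypothetical helper: draw a class for each of num_nodes instances.

    Assumes every instance is drawn independently with the probabilities above.
    """
    rng = random.Random(seed)
    classes = list(weights)
    picks = rng.choices(classes, weights=[weights[c] for c in classes], k=num_nodes)
    return Counter(picks)


if __name__ == "__main__":
    weights = {"Company": 1, "Employee": 3}  # hypothetical classes
    print(class_probabilities(weights))  # {'Company': 0.25, 'Employee': 0.75}
    print(assign_classes(weights, 1000, seed=42))
```

With values 1 and 3, roughly a quarter of 1000 nodes would become `Company` instances. Independent draws make the exact counts vary between runs; allocating proportional counts up front would be an alternative design.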
Property Probability Parameters:
- An integer value is associated with each property type. The
corresponding probability for generating an instance of this
property will be its value divided by the sum of the values for
all such properties.
- The type of a literal property can be either exact or
probabilistic. If it is exact, exactly one instance of this
literal property is associated with each instance of the
corresponding domain classes. If it is probabilistic, instances
of this literal property are generated according to the
property's integer value, in the same way as for relationships.
- If "Use File" is selected, literal values will be taken from
the file specified in the "File Location" field. This should be a
text file with one value per line. If this is not selected, every
instance will have the literal "literal value".
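The two literal types and the "Use File" option can be sketched as follows. This is a hypothetical illustration, not the tool's implementation: the function names and example URIs are made up, and picking file values at random is an assumption (the tool may choose them differently). In probabilistic mode, `count` stands for the number of edges the property received from the weighted draw.

```python
import random
from pathlib import Path

DEFAULT_LITERAL = "literal value"  # used when "Use File" is not selected


def load_literal_values(path=None):
    """Return candidate literals: one per non-blank line of the file, else the default."""
    if path is None:
        return [DEFAULT_LITERAL]
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    values = [line.strip() for line in lines if line.strip()]
    return values or [DEFAULT_LITERAL]


def literal_triples(prop, mode, domain_instances, values, count=0, rng=None):
    """Generate (subject, property, literal) triples for one literal property.

    mode "exact": exactly one triple per domain instance (count is ignored).
    mode "probabilistic": `count` triples with uniformly chosen subjects.
    Literal values are picked at random from `values` (an assumption).
    """
    if rng is None:
        rng = random.Random()
    if mode == "exact":
        subjects = list(domain_instances)
    elif mode == "probabilistic":
        subjects = [rng.choice(domain_instances) for _ in range(count)]
    else:
        raise ValueError(f"unknown literal type: {mode!r}")
    return [(s, prop, rng.choice(values)) for s in subjects]


if __name__ == "__main__":
    employees = ["business:Employee_1", "business:Employee_2"]  # hypothetical URIs
    print(literal_triples("business:name", "exact", employees, load_literal_values()))
```

Without a file, the exact-mode call above yields one `"literal value"` triple for each of the two employees.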
Data File Location:
- If "Use Disk" is selected, the application stores the
graph on disk as it is generated. This option should be used to
generate very large graphs that will not fit completely in
memory. "Temporary File Directory" specifies the directory in
which to store the graph as it is being generated.
- "Generated Graph Location" specifies the location and name of
the file in which to store the generated RDF graph. For example, "myGraph.rdf".
- Click the "Generate Graph" button to generate the graph. The
progress of the application will be indicated by the progress bar
in this panel.
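The idea behind "Use Disk" (writing the graph out while it is generated instead of holding it in memory) can be sketched in Python. The tool's actual on-disk format and internals are not described here; this sketch assumes N-Triples output with URI-only triples, and the function names and example namespace are hypothetical.

```python
import os
import shutil
import tempfile


def write_graph_streaming(triples, output_path, temp_dir=None):
    """Write (subject, predicate, object) URI triples as N-Triples, one line at a time.

    `triples` may be a generator, so the full graph is never held in memory.
    The file is built in `temp_dir` (the system default if None) and moved to
    `output_path` once complete. Returns the number of triples written.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".nt", dir=temp_dir)
    count = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for s, p, o in triples:
                out.write(f"<{s}> <{p}> <{o}> .\n")
                count += 1
        shutil.move(tmp_path, output_path)
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return count


def example_edges(n):
    """Hypothetical lazy edge source; a real generator would sample edges by weight."""
    ns = "http://example.org/business#"
    for i in range(n):
        yield (f"{ns}Employee_{i}", f"{ns}worksFor", f"{ns}Company_{i % 10}")


if __name__ == "__main__":
    print(write_graph_streaming(example_edges(100_000), "myGraph.nt"))
```

Because `example_edges` is a generator, memory use stays roughly constant regardless of `n`; only the output file grows.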
Notes About Alpha Version:
- The application does not support adding the same "slot" to
- Changes made to the ontology are not reflected in the
GraphGenerator Tab. To pick them up, either save and reopen the
project, or remove the GraphGenerator Tab and add it again through
the Project->Configure menu. This is in the process of being fixed.
- For a given relationship, the participation of class instances
in the relationship follows a uniform distribution. In the future,
the tool will allow specification of other distributions.
- None of the generated class instances will be multi-classified
into two or more classes; this functionality is also a planned
feature.
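Edge generation with uniform participation might look like the following Python sketch. It is an illustration under stated assumptions, not the plugin's code: the data layout (relationship name mapped to domain class, range class and integer value; class name mapped to a list of instance URIs) and all names are hypothetical. Each instance appears in exactly one class list, matching the note that instances are not multi-classified.

```python
import random


def generate_edges(relationships, instances, num_edges, seed=None):
    """Generate num_edges (subject, relationship, object) edges.

    relationships: name -> (domain_class, range_class, integer value)
    instances: class name -> list of instance URIs
    A relationship type is picked with probability value / sum(values);
    subject and object are then picked uniformly from the domain and range
    instances (uniform participation).
    """
    rng = random.Random(seed)
    names = list(relationships)
    values = [relationships[name][2] for name in names]
    edges = []
    for name in rng.choices(names, weights=values, k=num_edges):
        domain_cls, range_cls, _ = relationships[name]
        subject = rng.choice(instances[domain_cls])  # every domain instance equally likely
        obj = rng.choice(instances[range_cls])  # every range instance equally likely
        edges.append((subject, name, obj))
    return edges


if __name__ == "__main__":
    rels = {"worksFor": ("Employee", "Company", 1)}  # hypothetical relationship
    inst = {"Employee": ["e1", "e2", "e3"], "Company": ["c1", "c2"]}
    for edge in generate_edges(rels, inst, 5, seed=7):
        print(edge)
```

Supporting other participation distributions would amount to replacing the two `rng.choice` calls with a different sampling rule.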