Nelson Rushton describes NewsExtract in his assignment number 1.
According to Rushton, the developer of this system, Erik T. Mueller, claims
that XMLNews is not sufficient.
XMLNews is a subset of NITF, a text standard that uses HTML to
provide some metadata.
The review uses an example to explain why XMLNews is not sufficient.
It becomes clear that many details cannot be expressed in a
machine-understandable way: there simply aren't enough tags. NewsExtract provides an ontology
that introduces many more concepts.
Some examples of these are given in the review.
According to the review, NewsExtract also includes a natural
language parser. Thus the system also works with the data and not just the
metadata.
Looking at the web page on
which Erik T. Mueller describes NewsExtract, it became clear to me that
NewsExtract uses the ontology to tag text files automatically by means of natural
language processing techniques.
Nelson Rushton mentions that human proofreading of the automatically created file is still required, but that the author of the system claims that NewsExtract offers a great speed-up.