The objective of this assignment is to develop a search engine, based on the lectures
on XML parsing and search engines.
We will use the NewsML Toolkit from Reuters.
You will download NewsML feeds from Reuters. Download them here.
The tasks are the following:
1. Put all the NewsML feeds in one folder.
2. Index them by crawling all documents in this folder, using the headline tag. You can use a parser to grab all headlines. A pull parser is recommended.
3. Use Lucene to perform the above indexing.
4. Write a command-line application that takes a search string as a parameter and returns the name of an HTML file. This file should contain the search results ordered by time, with the latest one first.
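
As a rough illustration of tasks 2 and 4, the sketch below grabs headlines with a pull parser and renders results newest first as HTML. It assumes the JDK's built-in StAX pull parser (javax.xml.stream) rather than any particular toolkit, and it assumes the headline element is named "headline" in any letter case and contains only text. The class NewsSearchSketch, the Hit holder, and the sample feed and timestamp are hypothetical names and data used only for illustration. The Lucene indexing step is deliberately left out.

```java
import java.io.Reader;
import java.io.StringReader;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class NewsSearchSketch {

    /** Hypothetical holder for one search result: a headline and its time. */
    static final class Hit {
        final String headline;
        final Instant time;

        Hit(String headline, Instant time) {
            this.headline = headline;
            this.time = time;
        }
    }

    /**
     * Pulls events from the stream and collects the text of every element
     * whose local name is "headline" (case-insensitive, an assumption).
     * getElementText() throws if a headline contains child elements.
     */
    static List<String> extractHeadlines(Reader xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Feeds are external input, so DTDs and external entities are not processed.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        List<String> headlines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "headline".equalsIgnoreCase(reader.getLocalName())) {
                    headlines.add(reader.getElementText().trim());
                }
            }
        } finally {
            reader.close();
        }
        return headlines;
    }

    /** Renders the hits as an HTML list, latest one first. */
    static String renderHtml(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits); // leave the caller's list untouched
        sorted.sort(Comparator.comparing((Hit h) -> h.time).reversed());
        StringBuilder html = new StringBuilder("<html><body><ol>\n");
        for (Hit h : sorted) {
            html.append("<li>").append(escape(h.headline))
                .append(" (").append(h.time).append(")</li>\n");
        }
        return html.append("</ol></body></html>\n").toString();
    }

    /** Minimal escaping so headline text cannot break the markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample feed and timestamp, for demonstration only.
        String xml = "<NewsML><NewsLines><HeadLine>Markets rally</HeadLine></NewsLines></NewsML>";
        List<Hit> hits = new ArrayList<>();
        for (String headline : extractHeadlines(new StringReader(xml))) {
            hits.add(new Hit(headline, Instant.parse("2004-01-01T00:00:00Z")));
        }
        System.out.print(renderHtml(hits));
    }
}
```

In a full solution the headlines and timestamps would come from the Lucene index instead of being built in main, and the HTML string would be written to a file whose name the application prints.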
What to submit?
1. Source code. Remember to submit the Java source files, along with an Ant script to compile them.
2. Compiled class files.
3. A text document listing all dependencies.
4. Dependency .jar files.
5. If your source code does not compile, but your class files work with the same classpath
settings, you will have to explain how this happens.