SearchEngine: Overview
This chapter explains how the SearchEngine works, and describes the iterative process of generating, and later regenerating, the final applet word database.

The purpose of the SearchEngine
The SearchEngine reads one or more HTML files, parses the words within
the markup tags, and then parses all linked HTML files. Each word
is checked for word removal and word reduction, and the resulting word list for
the HTML file is stored internally. When all the linked HTML
files have been parsed, the word database is constructed, together with the applet
tag for the HTML applet search page.
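
The parsing step described above can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions, not the actual ruptools.SearchEngine code: the class name HtmlWordSketch and its methods are hypothetical, tags are stripped with a regular expression rather than a real HTML parser, character entities are not decoded, and word removal and word reduction are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; the class and method names are hypothetical.
final class HtmlWordSketch {
    // Any markup tag: '<' up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // The double-quoted href value of an anchor tag.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    // A word is taken to be a run of ASCII letters or digits (an assumption).
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Returns the words found in the text outside the markup tags.
    static List<String> words(String html) {
        String text = TAG.matcher(html).replaceAll(" ");
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Returns the link targets, i.e. the files a crawler would parse next.
    static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String page = "<p>Search the <a href=\"tools.htm\">Tools</a> index</p>";
        System.out.println(words(page));
        System.out.println(links(page));
    }
}
```

A full crawler would call links() on each parsed file, skip the files it has already visited, and repeat until no unparsed linked HTML files remain.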
Though in theory this could be achieved on the first pass, in practice it is usually an iterative process. When compiling a new database, the parser may report HTML syntax errors, which you may want to correct. Some linked files may not be text files, and the SearchEngine should be told not to parse them; sections of linked HTML documents may also need to be excluded. Finally, there may be filenames, acronyms, or other words that you do not want to appear in the database. To run the command-line application, type:
java ruptools.SearchEngine -r search.response -gw search

The SearchEngine options
The SearchEngine has a rather lengthy, but necessary list of options:
Options are separated by white space, so if a filename or URL contains a white space character, you must enclose that parameter in double quotes.
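
To illustrate the quoting rule, the following Java sketch splits a line of options on white space while keeping double-quoted text together as one parameter. How the SearchEngine itself tokenizes its input is not documented here, so treat this as an assumption; the class OptionTokenizerSketch and the filename "my search.response" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the SearchEngine's own tokenizer.
final class OptionTokenizerSketch {
    // Splits a line on white space; text inside double quotes stays in one parameter.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;   // the quote itself is not part of the parameter
                hasToken = true;        // an empty "" still counts as a parameter
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "my search.response" is a hypothetical filename containing a space.
        System.out.println(tokenize("-r \"my search.response\" -gw search"));
    }
}
```

Without the quotes, the same line would be split into five parameters, and "my" would be taken as the response file name.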
Dependency options
The resulting dependency list can be output to a file using:
The intermediate parsed data files are stored in the directory specified by:
If this argument is not specified, the current working directory is used. These options are further explained in the chapter Building the dependency list.

Word elimination options
The resulting word list can be output to a file using:
These options are further explained in the chapter Eliminating words.

Applet generation options
These options are explained in the chapter Building the applet database.

Since the SearchEngine acts on a series of options, these options can be placed, for convenience, in one or more text files, called response files. In addition to reducing keystrokes, these files can also contain comments. The following is an extract from the response file used to build the database for this manual:
The SearchEngine parses a response file, ignoring all lines whose first non-white space character is not a hyphen. Any valid SearchEngine option can appear in a response file; invalid or illegal options produce an error message. The -r filename option itself can also appear in a response file, so that, for example, you can create standard dependency or word file exclusion filters and reuse them to generate multiple databases. Each option and its associated parameters must appear on a single, separate line of the response file.

The SearchEngine can generate several output files, and also writes HTML syntax error messages to the standard output device.

Command line errors
The dependency list
The word list
The applet HTML file
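
The response file rules described earlier (lines that do not start with a hyphen are ignored, one option per line, and -r filename may be nested) can be sketched as follows. This is a minimal illustration, not the SearchEngine's implementation: the class ResponseFileSketch is hypothetical, the files are held in an in-memory map rather than read from disk, quoted filenames are not handled, and the file name common.response is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and the in-memory "files" map are assumptions.
final class ResponseFileSketch {
    // Collects the option lines of a response file, expanding nested "-r filename" lines.
    // Note: no guard against files that include each other in a cycle.
    static List<String> expand(String name, Map<String, List<String>> files) {
        List<String> lines = files.get(name);
        if (lines == null) {
            throw new IllegalArgumentException("Unknown response file: " + name);
        }
        List<String> options = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("-")) {
                continue; // comment or blank line: first non-white space char is not a hyphen
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts[0].equals("-r") && parts.length == 2) {
                options.addAll(expand(parts[1].trim(), files)); // nested response file
            } else {
                options.add(trimmed); // one option and its parameters per line
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new HashMap<>();
        files.put("search.response",
                Arrays.asList("This line is a comment", "", "  -r common.response"));
        files.put("common.response",
                Arrays.asList("Shared options for several databases", "-gw search"));
        System.out.println(expand("search.response", files));
    }
}
```

Because comment lines need no special marker, any explanatory text can sit between options as long as it does not begin with a hyphen.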