java.lang.Object > java.lang.Thread > bdd.search.spider.Indexer
Indexer | public class Indexer extends Thread | | Written by Tim Macinta, 1997.
Distributed under the GNU General Public License
(a copy of which is enclosed with the source).
The Indexer is a thread that indexes URLs that have been
cached using the URLStatus class. Use the queueURL() method
to add cached URLs to the Indexer's queue. Once the
start() method is called, the Indexer begins processing the
URLs in its queue. More URLs can still be added after calling
start(); in fact, this may be the best way to use the Indexer.
Calling the stopWhenDone() method causes the Indexer
thread to stop as soon as its queue empties.
|
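The queue-then-start workflow described above can be sketched as a worker thread draining a blocking queue. This is a minimal sketch under stated assumptions, not the Indexer's actual code: the class QueueWorker, its queueItem() method, and the indexed list are hypothetical, and plain Strings stand in for URLStatus objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for Indexer: queued items are Strings, not URLStatus objects.
class QueueWorker extends Thread {
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private volatile boolean stopRequested = false;
    final List<String> indexed = Collections.synchronizedList(new ArrayList<>());

    // Plays the role of queueURL(); safe to call before or after start().
    void queueItem(String item) {
        queue.add(item);
    }

    // Plays the role of stopWhenDone(): run() returns once the queue is empty.
    void stopWhenDone() {
        stopRequested = true;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // Read the flag before polling, so an empty poll after a stop
                // request means every item queued before the request was handled.
                boolean done = stopRequested;
                // Poll with a timeout so the flag is rechecked periodically.
                String item = queue.poll(50, TimeUnit.MILLISECONDS);
                if (item != null) {
                    indexed.add(item); // placeholder for the real indexing step
                } else if (done) {
                    return;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status and exit
        }
    }
}
```

Typical use mirrors the description: queue a few items, call start(), keep queueing, then call stopWhenDone() and join() the thread. Items queued after stopWhenDone() may be dropped, which matches "stop as soon as its queue empties".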
Constructor Summary | |
public | Indexer(File working_dir, Crawler crawler, EnginePrefs prefs) "working_dir" should be a directory that only this
Indexer and a given Crawler will be
accessing. |
Method Summary | |
void | addNewURLs(LinkExtractor urls) Adds new URLs to the crawler's queue. |
void | cleanUp() Removes all the ".db" and ".tmp" files in the directory "working_dir". |
void | merge(File file1, File file2, File target) Takes two search databases, "file1" and "file2", and merges their contents, placing the results in "target". |
void | mergeDatabases(File temporary) Repeatedly attempts to merge "temporary" with other temporary databases that have been merged the same number of times. |
void | pipe(InputStream in, OutputStream out) Pipes "in" to "out" until "in" is exhausted, then closes "in". |
public void | queueURL(URLStatus url) Use this method to add a cached URL to the Indexer. |
void | replaceMainIndex() Completes the merging of all temporary databases and replaces the main database with the final product. |
public void | run() This is where the actual indexing is done. |
public void | start() Starts the Indexer. |
public void | stopWhenDone(boolean exit_when_done) Causes this Indexer to stop once it finishes indexing the URLs in its queue. |
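The summary describes pipe() as copying "in" to "out" until "in" is exhausted and then closing "in". A minimal sketch of that contract with plain java.io streams follows; the class name StreamPipe and the 8 KB buffer size are arbitrary choices, and since the summary says nothing about "out", this version leaves it open for the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamPipe {
    // Copies all bytes from in to out, then closes in; out is left open.
    static void pipe(InputStream in, OutputStream out) throws IOException {
        try {
            byte[] buffer = new byte[8192]; // arbitrary buffer size
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close(); // close the input even if a write fails
        }
    }
}
```

On Java 9 and later, InputStream.transferTo(out) performs the same copy loop, though it does not close either stream.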
exit_when_done | boolean exit_when_done | | |
total_bytes | long total_bytes | | |
Indexer | public Indexer(File working_dir, Crawler crawler, EnginePrefs prefs) | | "working_dir" should be a directory that only this
Indexer and a given Crawler will be
accessing. This means that if several Indexers are running
simultaneously, each should be given a different "working_dir"
directory. Also, no other threads should write to this
directory (except for the selected Crawler).
|
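Because each Indexer needs a "working_dir" that no other Indexer writes to, one way to run several in parallel is to give each its own freshly created directory. The sketch below only covers allocating the directories; Files.createTempDirectory is standard NIO, while the WorkingDirs class and the "indexer-" prefix are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class WorkingDirs {
    // Creates n distinct, empty directories, one per Indexer/Crawler pair.
    static List<Path> allocate(int n) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            dirs.add(Files.createTempDirectory("indexer-" + i + "-"));
        }
        return dirs;
    }
}
```

Since the constructor takes a File, each path would be passed as dirs.get(i).toFile(), together with the Crawler that shares that directory.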
cleanUp | void cleanUp() | | Removes all the ".db" and ".tmp" files in the directory "working_dir".
|
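A minimal sketch of the cleanUp() behavior using java.io.File, matching the File-based API above. The DirCleaner class is hypothetical, and unlike the documented void method it returns the number of files deleted so the result is easy to check.

```java
import java.io.File;

class DirCleaner {
    // Deletes every regular file in dir whose name ends in ".db" or ".tmp".
    // Returns the number of files actually deleted.
    static int cleanUp(File dir) {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0; // not a directory, or an I/O error occurred
        }
        int deleted = 0;
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && (name.endsWith(".db") || name.endsWith(".tmp")) && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```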
merge | void merge(File file1, File file2, File target) throws IOException | | Takes two search databases, "file1" and "file2", and merges their
contents, placing the results in "target". "file2" must
exist, but "file1" need not. If "file1" does not exist,
"file2" is copied to "target".
|
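The on-disk format of a search database is not documented here, so the sketch below assumes a hypothetical format in which each database is a UTF-8 text file of lines in sorted order. It shows the documented control flow (copy "file2" when "file1" is missing) and a standard two-way merge of sorted inputs; a real index merge would presumably also combine entries that share a key, which this sketch does not attempt.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class SortedMerge {
    // Hypothetical format: each database is a UTF-8 text file of sorted lines.
    static void merge(File file1, File file2, File target) throws IOException {
        if (!file1.exists()) {
            // Documented behavior: with no file1, file2 is simply copied to target.
            Files.copy(file2.toPath(), target.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return;
        }
        try (BufferedReader a = Files.newBufferedReader(file1.toPath(), StandardCharsets.UTF_8);
             BufferedReader b = Files.newBufferedReader(file2.toPath(), StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target.toPath(), StandardCharsets.UTF_8)) {
            String x = a.readLine();
            String y = b.readLine();
            // Two-way merge: always emit the smaller of the two current lines.
            while (x != null || y != null) {
                if (y == null || (x != null && x.compareTo(y) <= 0)) {
                    out.write(x);
                    x = a.readLine();
                } else {
                    out.write(y);
                    y = b.readLine();
                }
                out.newLine();
            }
        }
    }
}
```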
mergeDatabases | void mergeDatabases(File temporary) throws IOException | | Repeatedly attempts to merge "temporary" with other temporary
databases that have been merged the same number of times. In other
words, this method first tries to merge "temporary" with any
database that has not been merged yet. If that succeeds, the result
is then merged with any database that has been merged once. If that
succeeds, the result is then merged with any database that has been
merged twice, and so on.
Databases are named after the number of times they have been
merged. E.g., a file called "6.db" has been merged six times, while
a file called "9.db" has been merged nine times. The "temporary"
file is assumed not to have been merged at all.
|
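The naming scheme above behaves like a binary counter: a new database "carries" upward through "0.db", "1.db", "2.db", ... until it reaches a level with no partner. The sketch below illustrates that loop under stated assumptions: the LevelMerger class, the MergeFn callback (standing in for merge(file1, file2, target)), and the "merging-N.tmp" scratch names are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class LevelMerger {
    // Hypothetical merge callback; in the Indexer this role is played by merge().
    interface MergeFn {
        void merge(File file1, File file2, File target) throws IOException;
    }

    // Carries "temporary" (merged 0 times) upward: while "<level>.db" exists,
    // merge with it and move up one level. Returns the file's final location.
    static File mergeDatabases(File dir, File temporary, MergeFn fn) throws IOException {
        File current = temporary;
        int level = 0;
        while (true) {
            File existing = new File(dir, level + ".db");
            if (!existing.exists()) {
                // No partner at this level: the file becomes "<level>.db".
                Files.move(current.toPath(), existing.toPath());
                return existing;
            }
            File merged = new File(dir, "merging-" + level + ".tmp"); // hypothetical scratch name
            fn.merge(existing, current, merged);
            Files.delete(existing.toPath());
            Files.delete(current.toPath());
            current = merged;
            level++;
        }
    }
}
```

After three calls, the directory holds "0.db" and "1.db", just as 3 is 11 in binary. Like incrementing a binary counter, each temporary database takes part in roughly log2(n) merges over n additions.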
queueURL | public void queueURL(URLStatus url) | | Use this method to add a cached URL to the Indexer.
|
replaceMainIndex | void replaceMainIndex() throws IOException | | Completes the merging of all temporary databases and replaces the
main database with the final product.
|
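One plausible way to "complete the merging" is to fold every remaining "<n>.db" file into a single database and then move it over the main index. This is a sketch under assumptions, not the documented algorithm: the MainIndexReplacer class, the Merger callback (standing in for merge()), the "final-N.tmp" scratch names, and the choice to leave the main index untouched when nothing was indexed are all hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Comparator;

class MainIndexReplacer {
    // Hypothetical merge callback, standing in for Indexer.merge(file1, file2, target).
    interface Merger {
        void merge(File file1, File file2, File target) throws IOException;
    }

    // Merges every remaining "<n>.db" in dir into one file, then moves it over mainIndex.
    static void replaceMainIndex(File dir, File mainIndex, Merger merger) throws IOException {
        File[] levels = dir.listFiles((parent, name) -> name.matches("\\d+\\.db"));
        if (levels == null || levels.length == 0) {
            return; // nothing was indexed, so keep the current main index
        }
        // Merge in level order, starting from the least-merged database.
        Arrays.sort(levels, Comparator.comparingInt(
                (File f) -> Integer.parseInt(f.getName().substring(0, f.getName().length() - 3))));
        File acc = levels[0];
        for (int i = 1; i < levels.length; i++) {
            File out = new File(dir, "final-" + i + ".tmp"); // hypothetical scratch name
            merger.merge(acc, levels[i], out);
            Files.delete(acc.toPath());
            Files.delete(levels[i].toPath());
            acc = out;
        }
        // Swap the finished database in for the old main index.
        Files.move(acc.toPath(), mainIndex.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}
```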
run | public void run() | | This is where the actual indexing is done.
|
start | public void start() | | Starts the Indexer.
|
stopWhenDone | public void stopWhenDone(boolean exit_when_done) | | Causes this Indexer to stop once it finishes indexing the URLs
in its queue.
|