org.archive.crawler.selftest package
Provides the client-side half of the Heritrix integration self test.
The selftest webapp holds the server side of the
integration test. The integration self test is run from the command
line. Invoking it makes the crawler crawl itself by trawling the
selftest webapp. When the crawl finishes, its output (ARC and log files) is
analyzed by code in this package to decide whether each test passed or failed. The integration
self test is an aggregation of individual tests, each exercising a
particular aspect of the crawler. For example, the Robots test validates
the crawler's parsing of robots.txt. Each test comprises two parts: a directory
under the selftest webapp, named for the test, that holds the
server pages expressing the scenario under test; and a class in this
package named for the test's webapp directory with a SelfTest suffix.
The selftest class verifies test success. Each selftest class subclasses
org.archive.crawler.selftest.SelfTestCase, which is itself
a subclass of junit.framework.TestCase. All tests need to be
registered with the {@link org.archive.crawler.selftest.AllSelfTestCases}
class and must live in the org.archive.crawler.selftest package. The class
{@link org.archive.crawler.selftest.SelfTestCrawlJobHandler}
manages the running of selftest.
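
The naming convention described above can be illustrated with a minimal sketch. The helper class SelfTestNaming and its method classNameFor are hypothetical and are not part of Heritrix; the sketch also assumes the webapp directory name is used verbatim, with no change of case, as the class-name prefix.

```java
// Hypothetical helper for illustration only; not part of Heritrix.
class SelfTestNaming {
    // Package in which all selftest classes must live (from the text above).
    static final String SELFTEST_PACKAGE = "org.archive.crawler.selftest";
    // Suffix appended to the webapp directory name (from the text above).
    static final String SUFFIX = "SelfTest";

    /**
     * Maps a selftest webapp directory name to the fully qualified name
     * of the class expected to verify that test.
     * Assumption: the directory name is used as-is as the class prefix.
     */
    static String classNameFor(String webappDirectory) {
        if (webappDirectory == null || webappDirectory.isEmpty()) {
            throw new IllegalArgumentException("directory name is required");
        }
        return SELFTEST_PACKAGE + "." + webappDirectory + SUFFIX;
    }

    public static void main(String[] args) {
        // Under the stated assumption, a "Robots" directory pairs with
        // a class named RobotsSelfTest in the selftest package.
        System.out.println(classNameFor("Robots"));
    }
}
```
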
To run a single test, pass its name as the option value of the
selftest argument.
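
As a rough illustration of selecting a single test by name, the sketch below filters a list of registered test names. It does not reproduce how Heritrix implements selection: the class SelfTestSelection, the method select, the sample test names, and the choice to throw on an unknown name are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the Heritrix implementation.
class SelfTestSelection {
    /**
     * Returns the names of the tests to run: every registered test when
     * no name is requested, otherwise only the test matching the name.
     * Assumption: an unknown name is treated as an error.
     */
    static List<String> select(List<String> registered, String requested) {
        if (requested == null || requested.isEmpty()) {
            return new ArrayList<>(registered);
        }
        List<String> selected = new ArrayList<>();
        for (String name : registered) {
            if (name.equals(requested)) {
                selected.add(name);
            }
        }
        if (selected.isEmpty()) {
            throw new IllegalArgumentException("No such selftest: " + requested);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sample names are placeholders for illustration.
        List<String> registered = Arrays.asList("RobotsSelfTest", "ExampleSelfTest");
        System.out.println(select(registered, "RobotsSelfTest"));
    }
}
```
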
The first crop of self tests is
derived from tests developed by Parker Thompson <pt at archive dot org>.
See Tests.
These tests in turn look to have been derived from Testing Search Indexing
Systems [1].

Adding a Self Test

TODO

Related Documentation

TODO