net.matuschek.spider |
Java Source File Name | Type | Comment |
DefaultRobotExceptionHandler.java | Class | |
HashedMemoryTaskList.java | Class | Memory based implementation of the TaskList interface. |
InterruptProcessingRobotExceptionHandler.java | Class | A concrete exception handler that interrupts the processing robot by throwing a runtime exception. |
MemoryTaskList.java | Class | Memory based implementation of the TaskList interface. |
NoRobots.java | Class | Implements the Robot Exclusion Standard. The basic idea of the Robot Exclusion Standard is that each web server can set up a single file called "/robots.txt" which contains pathnames that robots should not look at. |
RegExpRule.java | Class | |
RegExpURLCheck.java | Class | This URLChecker checks a URL against a list of regular expressions, each of which either allows or denies matching URLs. |
RobotExceptionHandler.java | Interface | |
RobotTask.java | Class | The RobotTask implements a simple object that represents a task for the web robot. |
TaskList.java | Interface | The TaskList is a generic interface for storing RobotTask objects. |
URLCheck.java | Interface | This interface defines a simple method that allows another object (usually a WebRobot) to test whether a given URL is acceptable (e.g. |
WebRobot.java | Class | |
WebRobotCallback.java | Interface | |
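
The NoRobots entry above describes the Robot Exclusion Standard: a server publishes "/robots.txt" listing pathnames that robots should not visit. The following is a minimal sketch of that idea, not the actual NoRobots implementation. The class name RobotsTxtSketch, its constructor, the isAllowed method, and the robot name "ExampleBot" are all hypothetical and chosen only for illustration. As a simplification, the sketch merges the rules of every matching record instead of preferring the most specific one, and it supports only plain path prefixes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a Robot Exclusion Standard check.
 * This is NOT the API of net.matuschek.spider.NoRobots.
 */
class RobotsTxtSketch {

    /** Path prefixes that the robot must not visit. */
    private final List<String> disallowed = new ArrayList<>();

    /**
     * Parses the text of a robots.txt file and keeps the Disallow rules of
     * every record whose User-agent is "*" or occurs in robotName
     * (compared case-insensitively).
     */
    RobotsTxtSketch(String robotsTxt, String robotName) {
        String name = robotName.toLowerCase();
        boolean applies = false;      // does the current record apply to us?
        boolean inAgentLines = false; // true while reading consecutive User-agent lines

        for (String raw : robotsTxt.split("\\r?\\n")) {
            // Strip comments, then surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();

            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();

            if (field.equals("user-agent")) {
                if (!inAgentLines) {
                    // A User-agent line after a rule line starts a new record.
                    applies = false;
                    inAgentLines = true;
                }
                if (value.equals("*")
                        || (!value.isEmpty() && name.contains(value.toLowerCase()))) {
                    applies = true;
                }
            } else {
                inAgentLines = false;
                // An empty Disallow value means "nothing is disallowed".
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** Returns true if no stored Disallow prefix matches the given path. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robotsTxt = String.join("\n",
                "# example file",
                "User-agent: *",
                "Disallow: /cgi-bin/",
                "Disallow: /private",
                "",
                "User-agent: OtherBot",
                "Disallow: /");

        RobotsTxtSketch rules = new RobotsTxtSketch(robotsTxt, "ExampleBot");
        System.out.println(rules.isAllowed("/index.html"));     // true
        System.out.println(rules.isAllowed("/cgi-bin/search")); // false
    }
}
```

A robot would typically fetch "/robots.txt" once per host, build such a rule set, and consult it before queuing each URL for that host. How NoRobots caches or applies its rules is not stated in the listing above.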