org.archive.crawler.datamodel |
|
Java Source File Name | Type | Comment |
CandidateURI.java | Class | A URI, discovered or passed-in, that may be scheduled.
When scheduled, a CandidateURI becomes a
CrawlURI made with the data contained herein. |
CandidateURITest.java | Class | Test CandidateURI serialization. |
Checkpoint.java | Class | Record of a specific checkpoint on disk. |
CoreAttributeConstants.java | Interface | CrawlURI attribute keys used by the core crawler
classes. |
CrawlHost.java | Class | Represents a single remote "host".
A host is a name for which there is a DNS record or an IP address. |
CrawlOrder.java | Class | Represents the 'root' of the settings hierarchy. |
CrawlServer.java | Class | Represents a single remote "server".
A server is a service on a host. |
CrawlSubstats.java | Class | Collector of statistics for a 'subset' of a crawl,
such as a server (host:port), host, or frontier group
(e.g. queue). |
CrawlURI.java | Class | Represents a candidate URI and the associated state it
collects as it is crawled.
Core state is in instance variables but a flexible
attribute list is also available. |
CrawlURITest.java | Class | |
CredentialStore.java | Class | Front door to the credential store. |
CredentialStoreTest.java | Class | Test add, edit, delete from credential store. |
FetchStatusCodes.java | Interface | Constant flag codes to be used, in lieu of per-protocol
codes (like HTTP's 200, 404, etc.), when network/internal/
out-of-band conditions occur. |
InstancePerThread.java | Interface | |
LocalizedError.java | Class | |
RobotsExclusionPolicy.java | Class | RobotsExclusionPolicy represents the actual policy adopted with
respect to a specific remote server, usually constructed from
consulting the robots.txt, if any, the server provided. |
RobotsHonoringPolicy.java | Class | RobotsHonoringPolicy represents the strategy used by the crawler
for determining how robots.txt files will be honored. |
Robotstxt.java | Class | Utility class for parsing 'robots.txt' format directives into a list
of named user-agents and a map from user-agents to disallowed paths. |
RobotstxtTest.java | Class | |
ServerCache.java | Class | Server and Host cache. |
ServerCacheTest.java | Class | |
UriUniqFilter.java | Interface | A UriUniqFilter passes URI objects to a destination
(receiver) if the passed URI object has not been previously seen. |
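
The pass-if-unseen behavior described for UriUniqFilter can be illustrated with a minimal sketch. This is not the actual UriUniqFilter interface: the class name SeenFilterSketch, the add method, the use of plain strings for URIs, and the Consumer-based receiver are all assumptions made for illustration. An in-memory HashSet stands in for whatever "previously seen" store a real crawler would use.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: forwards each URI to a receiver only the first time it is seen.
class SeenFilterSketch {
    // Assumption: an in-memory set is enough to remember seen URIs.
    private final Set<String> seen = new HashSet<>();
    // Assumption: the destination (receiver) is modeled as a Consumer of URI strings.
    private final Consumer<String> receiver;

    SeenFilterSketch(Consumer<String> receiver) {
        this.receiver = receiver;
    }

    // Returns true if the URI was new and was passed to the receiver.
    boolean add(String uri) {
        if (seen.add(uri)) {      // Set.add returns false for duplicates
            receiver.accept(uri);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> scheduled = new ArrayList<>();
        SeenFilterSketch filter = new SeenFilterSketch(scheduled::add);
        filter.add("http://example.com/");
        filter.add("http://example.com/a");
        filter.add("http://example.com/");   // duplicate, not forwarded
        System.out.println(scheduled);
    }
}
```

In this sketch the receiver sees each distinct URI once, in first-seen order; a production filter would likely need canonicalization and a disk-backed or probabilistic store, which are out of scope here.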