| org.archive.crawler.datamodel.UriUniqFilter
All known Subclasses: org.archive.crawler.util.SetBasedUriUniqFilter, org.archive.crawler.util.BdbUriUniqFilterTest, org.archive.crawler.util.BloomUriUniqFilterTest, org.archive.crawler.util.BenchmarkUriUniqFilters, org.archive.crawler.util.FPUriUniqFilterTest, org.archive.crawler.util.FPMergeUriUniqFilter
UriUniqFilter | public interface UriUniqFilter (Code) | | A UriUniqFilter passes URI objects to a destination
(receiver) if the passed URI object has not been previously seen.
If already seen, the passed URI object is dropped.
For efficiency in comparison against a large history of
seen URIs, URI objects may not be passed immediately, unless
addNow() is used or a flush is forced.
author: gojomo version: $Date: 2005-12-16 03:10:54 +0000 (Fri, 16 Dec 2005) $, $Revision: 4036 $ |
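The pass-or-drop behaviour described above can be illustrated with a minimal in-memory sketch. It does not use the Heritrix classes: `SketchReceiver` and `SketchUniqFilter` are hypothetical stand-ins, and a plain `String` replaces `CandidateURI` so that the example compiles on its own. Real implementations may also defer delivery, which this sketch does not.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for HasUriReceiver; String replaces CandidateURI.
interface SketchReceiver {
    void receive(String item);
}

// Illustrative filter: forwards an item only the first time its key is seen.
class SketchUniqFilter {
    private final Set<String> seen = new HashSet<>();
    private SketchReceiver receiver;

    void setDestination(SketchReceiver receiver) {
        this.receiver = receiver;
    }

    void add(String key, String value) {
        // Set.add returns true only when the key was not already present.
        if (seen.add(key)) {
            receiver.receive(value);
        }
    }

    long count() {
        return seen.size();
    }
}

class SketchUniqFilterDemo {
    static List<String> run() {
        List<String> delivered = new ArrayList<>();
        SketchUniqFilter filter = new SketchUniqFilter();
        filter.setDestination(delivered::add);
        filter.add("http://example.com/", "http://example.com/");
        filter.add("http://example.com/", "http://example.com/"); // duplicate key, dropped
        filter.add("http://example.com/a", "http://example.com/a");
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the first and third calls reach the receiver; the second is dropped because its key is already in the seen set.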
Inner Class :public interface HasUriReceiver | |
Method Summary | |
public void | add(String key, CandidateURI value) Add given uri, if not already present. |
public void | addForce(String key, CandidateURI value) Add given uri, all the way through to underlying destination, even if already present. (Sometimes a URI must be fetched, or refetched, for example when DNS or robots info expires or the operator forces a refetch.) |
public void | addNow(String key, CandidateURI value) Immediately add uri. |
public void | close() Close down any allocated resources. |
public long | count() |
public void | forget(String key, CandidateURI value) |
public void | note(String key) Note item as seen, without passing through to receiver. |
public long | pending() Count of items added, but not yet filtered in or out. |
public long | requestFlush() Request that any pending items be added/dropped. |
public void | setDestination(HasUriReceiver receiver) Receiver of unique URIs. Items that have not been seen before are passed through to this object. Parameters: receiver - Object that will be passed items. |
public void | setProfileLog(File logfile) Set a File to receive a log for replay profiling. |
add | public void add(String key, CandidateURI value)(Code) | | Add given uri, if not already present.
Parameters: key - Usually a canonicalized version of value. This is the key used for lookups, forgets and insertions on the already-included list. Parameters: value - item to add. |
addForce | public void addForce(String key, CandidateURI value)(Code) | | Add given uri, all the way through to underlying destination, even
if already present.
(Sometimes a URI must be fetched, or refetched, for example when
DNS or robots info expires or the operator forces a refetch. A
normal add() or addNow() would drop the URI without forwarding
on once it is determined to already be in the filter.)
Parameters: key - Usually a canonicalized version of uri. This is the key used for lookups, forgets and insertions on the already-included list. Parameters: value - item to add. |
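The difference between add() and addForce() can be sketched as follows. This is an illustrative stand-in, not the Heritrix implementation: `ForceSketchFilter` is a hypothetical class, `String` replaces `CandidateURI`, and an internal list plays the role of the receiver.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in showing add() versus addForce() semantics.
class ForceSketchFilter {
    private final Set<String> seen = new HashSet<>();
    private final List<String> delivered = new ArrayList<>();

    // Normal add: forward only if the key has not been seen before.
    void add(String key, String value) {
        if (seen.add(key)) {
            delivered.add(value);
        }
    }

    // Forced add: record the key and forward regardless of prior sightings.
    void addForce(String key, String value) {
        seen.add(key);
        delivered.add(value);
    }

    List<String> delivered() {
        return delivered;
    }

    long count() {
        return seen.size();
    }
}

class ForceSketchDemo {
    public static void main(String[] args) {
        ForceSketchFilter filter = new ForceSketchFilter();
        String robots = "http://example.com/robots.txt";
        filter.add(robots, robots);      // first sighting, forwarded
        filter.add(robots, robots);      // already seen, dropped
        filter.addForce(robots, robots); // e.g. robots info expired, forwarded again
        System.out.println(filter.delivered().size() + " delivered, " + filter.count() + " seen");
    }
}
```

In this sketch the forced add does not increase the seen count, because the key was already recorded by the first add().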
addNow | public void addNow(String key, CandidateURI value)(Code) | | Immediately add uri.
Parameters: key - Usually a canonicalized version of uri. This is the key used for lookups, forgets and insertions on the already-included list. Parameters: value - item to add. |
close | public void close()(Code) | | Close down any allocated resources.
It makes sense to call this when checkpointing.
|
count | public long count()(Code) | | Count of already seen URIs. |
forget | public void forget(String key, CandidateURI value)(Code) | | Forget that the item was seen.
Parameters: key - Usually a canonicalized version of a URI. This is the key used for lookups, forgets and insertions on the already-included list. Parameters: value - item to forget. |
note | public void note(String key)(Code) | | Note item as seen, without passing through to receiver.
Parameters: key - Usually a canonicalized version of a URI. This is the key used for lookups, forgets and insertions on the already-included list. |
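How forget() and note() affect later add() calls can be sketched with a small stand-in. `NoteForgetSketchFilter` is a hypothetical class written for this illustration; it keeps the seen set in memory, uses `String` in place of `CandidateURI`, and collects forwarded items in a list instead of calling a receiver.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in showing how note() and forget() change the seen set.
class NoteForgetSketchFilter {
    private final Set<String> seen = new HashSet<>();
    private final List<String> delivered = new ArrayList<>();

    void add(String key, String value) {
        if (seen.add(key)) {
            delivered.add(value);
        }
    }

    // Mark the key as seen without forwarding anything.
    void note(String key) {
        seen.add(key);
    }

    // Remove the key so that a later add() is forwarded again.
    // The value is unused here; it is kept only to mirror the documented signature.
    void forget(String key, String value) {
        seen.remove(key);
    }

    List<String> delivered() {
        return delivered;
    }

    long count() {
        return seen.size();
    }
}

class NoteForgetSketchDemo {
    public static void main(String[] args) {
        NoteForgetSketchFilter filter = new NoteForgetSketchFilter();
        filter.note("k1");
        filter.add("k1", "v1");    // dropped: k1 was noted as seen
        filter.add("k2", "v2");    // forwarded
        filter.forget("k2", "v2");
        filter.add("k2", "v2");    // forwarded again after forget
        System.out.println(filter.delivered());
    }
}
```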
pending | public long pending()(Code) | | Count of items added, but not yet filtered in or out.
Some implementations may buffer up large numbers of pending
items to be evaluated in a later large batch/scan/merge with
disk files.
Returns: count of items added but not yet evaluated. |
requestFlush | public long requestFlush()(Code) | | Request that any pending items be added/dropped. Implementors
may ignore the request if a flush would be too expensive/too
soon.
Returns: number added. |
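The interplay of add(), addNow(), pending() and requestFlush() in a buffering implementation can be sketched as below. This is an assumption about how a batching filter might behave, not the code of any Heritrix class: `BufferedSketchFilter` is hypothetical, always honours the flush request, uses `String` in place of `CandidateURI`, and collects forwarded items in a list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical buffering filter: add() defers the uniqueness check until a flush.
class BufferedSketchFilter {
    private final Set<String> seen = new HashSet<>();
    private final List<String[]> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>();

    // Deferred add: queue the key/value pair for a later batch check.
    void add(String key, String value) {
        buffer.add(new String[] {key, value});
    }

    // Immediate add: check and forward right away, bypassing the buffer.
    void addNow(String key, String value) {
        if (seen.add(key)) {
            delivered.add(value);
        }
    }

    // Items queued but not yet filtered in or out.
    long pending() {
        return buffer.size();
    }

    // Evaluate every queued item; return how many were actually forwarded.
    long requestFlush() {
        long added = 0;
        for (String[] entry : buffer) {
            if (seen.add(entry[0])) {
                delivered.add(entry[1]);
                added++;
            }
        }
        buffer.clear();
        return added;
    }

    List<String> delivered() {
        return delivered;
    }
}

class BufferedSketchDemo {
    public static void main(String[] args) {
        BufferedSketchFilter filter = new BufferedSketchFilter();
        filter.add("a", "a");
        filter.add("a", "a"); // duplicate, only detected at flush time
        filter.add("b", "b");
        System.out.println("pending before flush: " + filter.pending());
        System.out.println("added by flush: " + filter.requestFlush());
        filter.addNow("c", "c");
        System.out.println("delivered: " + filter.delivered());
    }
}
```

A real implementation may decline the flush request when it would be too expensive or too soon, in which case pending() would remain non-zero.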
setDestination | public void setDestination(HasUriReceiver receiver)(Code) | | Receiver of unique URIs.
Items that have not been seen before are passed through to this object.
Parameters: receiver - Object that will be passed items. Must implement the HasUriReceiver interface. |
setProfileLog | public void setProfileLog(File logfile)(Code) | | Set a File to receive a log for replay profiling.
|