| org.archive.util.SurtPrefixSet
SurtPrefixSet | public class SurtPrefixSet extends TreeSet (Code) | | Specialized TreeSet for keeping a set of String prefixes.
Redundant prefixes (those that are themselves prefixed
by other set entries) are eliminated.
author: gojomo |
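The redundancy rule can be kept on every insertion using a sorted set's neighbour lookups. The sketch below is a hypothetical standalone class, `PrefixTreeSet`, not the Heritrix implementation. It assumes the set never holds redundant entries. Under that invariant, the only entry that can prefix a new string is its `floor`, and every entry the new string would make redundant sorts directly after it.

```java
import java.util.TreeSet;

// Hypothetical sketch (not the Heritrix class): a TreeSet<String> that keeps
// only the shortest prefixes, dropping any entry covered by another entry.
class PrefixTreeSet extends TreeSet<String> {

    @Override
    public boolean add(String prefix) {
        // If an existing entry already prefixes the new one, the new one is
        // redundant. With no redundant entries stored, floor() is the only candidate.
        String floor = floor(prefix);
        if (floor != null && prefix.startsWith(floor)) {
            return false;
        }
        // Entries that start with the new prefix form a contiguous run right
        // after it in sort order; remove them, since they are now redundant.
        String next = higher(prefix);
        while (next != null && next.startsWith(prefix)) {
            remove(next);
            next = higher(prefix);
        }
        return super.add(prefix);
    }
}
```

Adding `http://(org,example,` to a set holding `http://(org,example,www,)/a` and `http://(org,example,www,)/b` leaves only the shorter prefix.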
containsPrefixOf | public boolean containsPrefixOf(String s)(Code) | | Test whether the given String is prefixed by one
of this set's entries.
Parameters: s - String to test. Returns: true if some entry of this set is a prefix of s. |
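Because redundant prefixes are eliminated, this test needs only a single sorted lookup. Here is a minimal sketch, assuming the set contains no redundant entries; `PrefixLookup` is a hypothetical helper, not the class's own code. The greatest entry less than or equal to `s` is the only entry that can be a prefix of `s`.

```java
import java.util.TreeSet;

// Hypothetical helper; assumes 'prefixes' holds no redundant entries.
class PrefixLookup {

    // True if some entry of 'prefixes' is a prefix of 's'. Any prefix of s
    // sorts <= s, and no other entry can sit between it and s, so checking
    // floor(s) alone is sufficient.
    static boolean containsPrefixOf(TreeSet<String> prefixes, String s) {
        String candidate = prefixes.floor(s);
        return candidate != null && s.startsWith(candidate);
    }
}
```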
convertAllPrefixesToDomains | public void convertAllPrefixesToDomains()(Code) | | Changes all prefixes so that they only enforce a general
domain (allowing subdomains). For prefixes that don't include
a ')', no change is necessary. For others, truncate everything
from the ')' onward. Additionally, strip "www," if it
appears.
|
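The rule above can be sketched for a single entry as follows. `DomainPrefix.toDomain` is a hypothetical helper; the real method rewrites every entry of the set in place. The sketch assumes that "www," is removed only when it is the last host label left after truncation, and that prefixes without ')' are returned unchanged.

```java
// Hypothetical per-entry helper illustrating the domain conversion rule.
class DomainPrefix {

    static String toDomain(String prefix) {
        int close = prefix.indexOf(')');
        if (close < 0) {
            return prefix; // already an open domain prefix: no change
        }
        // Truncate everything from ')' onward, reopening the host to subdomains.
        String domain = prefix.substring(0, close);
        // Assumption: "www," is stripped only as the trailing host label.
        if (domain.endsWith("www,")) {
            domain = domain.substring(0, domain.length() - "www,".length());
        }
        return domain;
    }
}
```

For example, `http://(org,example,www,)/about` becomes `http://(org,example,`.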
convertAllPrefixesToHosts | public void convertAllPrefixesToHosts()(Code) | | Changes all prefixes so that they enforce an exact host. For
prefixes that already include a ')', this means discarding
anything after ')' (path info). For prefixes that don't include
a ')' -- domain prefixes open to subdomains -- add the closing
')' (or ",)").
|
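Here is a per-entry sketch of the host conversion. `HostPrefix.toHost` is a hypothetical helper, not the class's API. It assumes that a prefix ending in ',' needs only ')' appended and that any other open prefix receives ",)".

```java
// Hypothetical per-entry helper illustrating the exact-host conversion rule.
class HostPrefix {

    static String toHost(String prefix) {
        int close = prefix.indexOf(')');
        if (close >= 0) {
            // Keep the ')' and discard the path info after it.
            return prefix.substring(0, close + 1);
        }
        // Open domain prefix: close it, adding the comma if it is missing.
        return prefix.endsWith(",") ? prefix + ")" : prefix + ",)";
    }
}
```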
getCandidateSurt | public static String getCandidateSurt(Object object)(Code) | | Calculate the SURT-form URI to use as a candidate against prefixes
from the given Object (CandidateURI or UURI).
Parameters: object - CandidateURI or UURI. Returns: SURT form of URI for evaluation, or null if unavailable |
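To show what a SURT-form candidate looks like, here is a simplified conversion built on `java.net.URI`. It is a hypothetical stand-in for `getCandidateSurt`, which works on CandidateURI/UURI objects and handles cases this sketch does not. The sketch reverses the lowercased host labels inside "(...)" and keeps the path and query. Ports and userinfo are ignored.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical, simplified SURT conversion (ports and userinfo ignored).
class SurtSketch {

    static String toSurt(String uri) {
        URI u;
        try {
            u = new URI(uri);
        } catch (URISyntaxException e) {
            return null; // mirrors "null if unavailable"
        }
        String host = u.getHost();
        if (u.getScheme() == null || host == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(u.getScheme()).append("://(");
        // Reverse the host labels: www.example.com -> com,example,www,
        String[] labels = host.toLowerCase().split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        sb.append(')');
        String path = u.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        return sb.toString();
    }
}
```

For example, `http://www.Example.com/a/b?x=1` maps to `http://(com,example,www,)/a/b?x=1`. A prefix such as `http://(com,example,` then matches it by plain string comparison.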
importFrom | public void importFrom(Reader r)(Code) | | Read a set of SURT prefixes from a reader source, keeping them sorted
with redundant entries removed.
Parameters: r - reader over a file of SURT-format strings throws: IOException - if reading from r fails |
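Here is a minimal sketch of the import: one prefix per line, with redundancy removed in a single pass over the sorted lines. `SurtFileImport` is hypothetical. Skipping blank lines and lines starting with '#' is an assumption about the file format, not something the source documents. The pass works because, in sorted order, every entry covered by a shorter prefix comes right after it, so comparing against the last kept entry is enough.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of reading SURT prefixes and dropping redundant ones.
class SurtFileImport {

    static List<String> importFrom(Reader r) throws IOException {
        TreeSet<String> sorted = new TreeSet<>();
        BufferedReader in = new BufferedReader(r);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            // Assumption: blank lines and '#' comment lines are skipped.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            sorted.add(line);
        }
        // Keep an entry only if the last kept entry is not its prefix.
        List<String> kept = new ArrayList<>();
        for (String p : sorted) {
            if (kept.isEmpty() || !p.startsWith(kept.get(kept.size() - 1))) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```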
importFromMixed | public void importFromMixed(Reader r, boolean deduceFromSeeds)(Code) | | Import SURT prefixes from a reader whose lines mix plain URIs and
SURT prefixes.
Parameters: r - the reader to import the prefixes from Parameters: deduceFromSeeds - true to also import SURT prefixes implied from normal URIs/hostname seeds |
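Here is a sketch of how mixed input might be split. `MixedImport` is hypothetical. Its line convention is assumed for illustration: a leading '+' marks an explicit SURT prefix, and any other non-blank line is a plain URI or hostname seed. The URI-to-prefix conversion is passed in as a function so that the sketch does not claim any particular deduction rule. Redundancy elimination is omitted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch; the '+' marker convention is an assumption.
class MixedImport {

    static TreeSet<String> importFromMixed(Reader r, boolean deduceFromSeeds,
                                           Function<String, String> deduce) throws IOException {
        TreeSet<String> prefixes = new TreeSet<>();
        BufferedReader in = new BufferedReader(r);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            if (line.startsWith("+")) {
                // Explicit SURT prefix: always imported.
                prefixes.add(line.substring(1).trim());
            } else if (deduceFromSeeds) {
                // Plain URI/hostname seed: imported only when deducing.
                String implied = deduce.apply(line);
                if (implied != null) {
                    prefixes.add(implied);
                }
            }
        }
        return prefixes; // redundancy elimination omitted for brevity
    }
}
```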
importFromUris | public void importFromUris(Reader r)(Code) | | Parameters: r - Where to read from. |
main | public static void main(String[] args) throws IOException(Code) | | Allows the class to be used as a command-line tool for converting
URL lists (or bare host or host/path fragments, implied
to be HTTP URLs) to implied SURT prefix form.
Reads from stdin or the first file argument; writes to stdout.
Parameters: args - command-line arguments; may include an input file throws: IOException - if reading or writing fails |
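The I/O pattern the tool describes (first file argument, otherwise stdin, output to stdout) can be sketched as follows. `SurtTool` and its converter `hostToPrefix` are hypothetical. The converter handles bare hostnames only and stands in for the real URL-to-prefix deduction. Conversion is kept separate from `main` so it can run on in-memory readers and writers.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical command-line sketch; the converter handles bare hosts only.
class SurtTool {

    // Writes convert(line) for each non-blank input line.
    static void convertAll(Reader r, Writer w, Function<String, String> convert) throws IOException {
        BufferedReader in = new BufferedReader(r);
        PrintWriter out = new PrintWriter(w);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                out.println(convert.apply(line));
            }
        }
        out.flush();
    }

    // Hypothetical: "www.example.com" -> open prefix "http://(com,example,www,".
    static String hostToPrefix(String host) {
        String[] labels = host.toLowerCase().split("\\.");
        StringBuilder sb = new StringBuilder("http://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Read from the first file argument if given, otherwise from stdin.
        Reader in = args.length > 0
            ? new InputStreamReader(new FileInputStream(args[0]), StandardCharsets.UTF_8)
            : new InputStreamReader(System.in, StandardCharsets.UTF_8);
        try (Reader r = in) {
            convertAll(r, new OutputStreamWriter(System.out, StandardCharsets.UTF_8), SurtTool::hostToPrefix);
        }
    }
}
```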
prefixFromPlain | public static String prefixFromPlain(String u)(Code) | | Given a plain URI or hostname/hostname+path, deduce an implied SURT
prefix from it. Results may be unpredictable on strings that cannot
be interpreted as URIs.
UURI 'fixup' is applied to the URI that is built.
Parameters: u - URI or almost-URI to consider. Returns: implied SURT prefix form |
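Here is a hypothetical, simplified version of this deduction. The rules are assumptions for illustration and are not confirmed to match the class. The sketch adds "http://" when no scheme is present, writes the host in SURT form, and keeps the path only up to its last '/'. Lowercasing the scheme and host stands in for UURI fixup. Unlike the documented method, it returns null for input that `java.net.URI` cannot parse.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical simplification of deducing an implied SURT prefix.
class ImpliedPrefix {

    static String prefixFromPlain(String u) {
        // Bare hosts and host/path fragments are treated as HTTP URLs.
        String withScheme = u.contains("://") ? u : "http://" + u;
        URI uri;
        try {
            uri = new URI(withScheme);
        } catch (URISyntaxException e) {
            return null; // the documented method may instead give unpredictable results
        }
        if (uri.getScheme() == null || uri.getHost() == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(uri.getScheme().toLowerCase()).append("://(");
        String[] labels = uri.getHost().toLowerCase().split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        // Keep the path only through its last '/', so sibling pages also match.
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        path = path.substring(0, path.lastIndexOf('/') + 1);
        return sb.append(')').append(path).toString();
    }
}
```

Under these assumed rules, `http://Example.org/docs/index.html` yields `http://(org,example,)/docs/`.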