| java.lang.Object org.archive.util.SURT
SURT | public class SURT | | Sort-friendly URI Reordering Transform.
Converts URIs of the form:
scheme://userinfo@domain.tld:port/path?query#fragment
...into...
scheme://(tld,domain,:port@userinfo)/path?query#fragment
The '(' ')' characters serve as an unambiguous notice that the so-called
'authority' portion of the URI ([userinfo@]host[:port] in http URIs) has
been transformed; the commas prevent confusion with regular hostnames.
This remedies the 'problem' with standard URIs that the host portion of a
regular URI, with its dotted-domains, is actually in reverse order from
the natural hierarchy that's usually helpful for grouping and sorting.
The value of respecting URI case variance is considered negligible: it
is vanishingly rare for case-variance to be meaningful, while URI case-
variance often arises from people's confusion or sloppiness, and they
only correct it insofar as necessary to avoid blatant problems. Thus
the usual SURT form is considered to be flattened to all lowercase, and
not completely reversible.
author: gojomo |
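The reordering described above can be illustrated with a small, self-contained sketch. This is not the class's actual implementation, which the documentation says is optimized to reduce object creation. The names `SurtSketch` and `toSurt` and the regular expression are hypothetical, chosen only for illustration. The sketch assumes that "flattened to all lowercase" applies to the whole result, and it does not handle bracketed IPv6 hosts.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of org.archive.util.
class SurtSketch {
    // Groups: 1 = scheme://, 2 = userinfo (optional), 3 = host,
    // 4 = :port (optional), 5 = everything after the authority.
    private static final Pattern URI_PATTERN = Pattern.compile(
        "^([A-Za-z][A-Za-z0-9+.-]*://)(?:([^@/?#]*)@)?([^:/?#]+)(:\\d+)?(.*)$");

    static String toSurt(String uri, boolean preserveCase) {
        Matcher m = URI_PATTERN.matcher(uri);
        if (!m.matches()) {
            return uri; // not of the documented shape: leave it unchanged
        }
        StringBuilder sb = new StringBuilder(uri.length() + 4);
        sb.append(m.group(1)).append('(');
        // Emit the host labels in reverse order, each followed by a comma.
        String[] labels = m.group(3).split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        if (m.group(4) != null) {
            sb.append(m.group(4)); // ":port"
        }
        if (m.group(2) != null) {
            sb.append('@').append(m.group(2)); // "@userinfo"
        }
        sb.append(')').append(m.group(5));
        String out = sb.toString();
        // Assumption: lowercasing covers the entire result, not just the host.
        return preserveCase ? out : out.toLowerCase(Locale.ROOT);
    }
}
```

Under these assumptions, `SurtSketch.toSurt("http://www.archive.org/index.html", false)` yields `http://(org,archive,www,)/index.html`, so URIs from the same domain sort next to each other.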
Method Summary | |
public static String | fromURI(String s) Utility method for creating the SURT form of the URI in the given String. By default, does not preserve casing. |
public static String | fromURI(String s, boolean preserveCase) Utility method for creating the SURT form of the URI in the given String. |
public static void | main(String[] args) Allow class to be used as a command-line tool for converting URL lists (or naked host or host/path fragments implied to be HTTP URLs) to SURT form. |
BEGIN_TRANSFORMED_AUTHORITY | static String BEGIN_TRANSFORMED_AUTHORITY | | |
END_TRANSFORMED_AUTHORITY | static String END_TRANSFORMED_AUTHORITY | | |
TRANSFORMED_HOST_DELIM | static String TRANSFORMED_HOST_DELIM | | |
fromURI | public static String fromURI(String s) | | Utility method for creating the SURT form of the URI in the
given String.
By default, does not preserve casing.
Parameters: s - String URI to be converted to SURT form
Returns: SURT form |
fromURI | public static String fromURI(String s, boolean preserveCase) | | Utility method for creating the SURT form of the URI in the
given String.
If its approach appears a bit convoluted, note that it was
optimized to minimize object creation after allocation-site profiling
indicated this method was a top source of garbage in long-running crawls.
Assumes that the String URI has already been cleaned/fixed (e.g.
by UURI fixup) in ways that put it in its crawlable form for
evaluation.
Parameters: s - String URI to be converted to SURT form
Parameters: preserveCase - whether original case should be preserved
Returns: SURT form |
main | public static void main(String[] args) throws IOException | | Allow class to be used as a command-line tool for converting
URL lists (or naked host or host/path fragments implied
to be HTTP URLs) to SURT form. Lines that cannot be converted
are returned unchanged.
Reads from stdin or from the file named by the first argument. Writes to
stdout or to the file named by the second argument.
Parameters: args - cmd-line arguments
throws: IOException - |
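The line-by-line behaviour described for `main` can be sketched as follows. This is a minimal illustration rather than the tool's actual code. The names `SurtLineFilterSketch`, `convertLine`, and `convertLines` are hypothetical, and the converter is passed in as a parameter. The sketch assumes the converter returns its input unchanged when it cannot convert it, and that a line without `://` is a naked host or host/path to be treated as an HTTP URL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.function.UnaryOperator;

// Hypothetical illustration class; not part of org.archive.util.
class SurtLineFilterSketch {
    // Converts one input line, or returns it unchanged if conversion fails.
    static String convertLine(String line, UnaryOperator<String> toSurt) {
        String trimmed = line.trim();
        // Assumption: no "://" means a naked host or host/path, implied HTTP.
        String candidate = trimmed.contains("://") ? trimmed : "http://" + trimmed;
        String surt = toSurt.apply(candidate);
        // Assumption: the converter signals failure by returning its input.
        return surt.equals(candidate) ? line : surt;
    }

    // Reads lines until end of input and writes one result per line.
    static void convertLines(BufferedReader in, PrintWriter out,
                             UnaryOperator<String> toSurt) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.println(convertLine(line, toSurt));
        }
        out.flush();
    }
}
```

A caller would wrap stdin or the first file argument in a `BufferedReader`, and stdout or the second file argument in a `PrintWriter`, then pass both to `convertLines` with whichever converter is in use.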