| java.lang.Object org.archive.crawler.extractor.Link
Link | public class Link implements Serializable(Code) | | Link represents one discovered "edge" of the web graph: the source
URI, the destination URI, and the type of reference (represented by the
context in which it was found).
As such, it is a suitably generic item to be returned from generic
link-extraction utility code.
author: gojomo |
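Because a Link is just a source, a destination, a context and a hop type, extraction code can return a plain list of such edges. The following is a minimal sketch of that idea, not the Heritrix implementation. Edge and ToyExtractor are hypothetical names, the regular expression is a toy rather than a real HTML parser, and the hop code 'L' for navigation links is an assumed value.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for Link: one discovered edge of the web graph. */
final class Edge implements Serializable {
    private static final long serialVersionUID = 1L;
    final String source;      // URI on which the link was found
    final String destination; // URI the link points to
    final String context;     // where it was found, e.g. "A/@HREF"
    final char hopType;       // kind of reference ('L' is an assumed navlink code)

    Edge(String source, String destination, String context, char hopType) {
        this.source = source;
        this.destination = destination;
        this.context = context;
        this.hopType = hopType;
    }
}

/** Toy regex-based extractor; NOT a real HTML parser. */
final class ToyExtractor {
    // Captures the quoted href value of an <a ...> tag, case-insensitively.
    private static final Pattern A_HREF = Pattern.compile(
        "<a\\s[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns one Edge per A/@HREF found in the given HTML text. */
    static List<Edge> extract(String sourceUri, String html) {
        List<Edge> edges = new ArrayList<>();
        Matcher m = A_HREF.matcher(html);
        while (m.find()) {
            edges.add(new Edge(sourceUri, m.group(1), "A/@HREF", 'L'));
        }
        return edges;
    }
}
```

Callers that consume the list do not need to know how each edge was found; the context and hop type carry that information along with the URIs.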
EMBED_HOP | final public static char EMBED_HOP(Code) | | embedded links necessary to render the page, like IMG/@SRC
|
EMBED_MISC | final public static String EMBED_MISC(Code) | | stand-in value for embeds without other context
|
JS_MISC | final public static String JS_MISC(Code) | | stand-in value for js-discovered urls without other context
|
NAVLINK_HOP | final public static char NAVLINK_HOP(Code) | | navigation links, like A/@HREF
|
NAVLINK_MISC | final public static String NAVLINK_MISC(Code) | | stand-in value for navlink urls without other context
|
PREREQ_HOP | final public static char PREREQ_HOP(Code) | | implied prerequisite links, like dns or robots
|
PREREQ_MISC | final public static String PREREQ_MISC(Code) | | stand-in value for prerequisite urls without other context
|
REFER_HOP | final public static char REFER_HOP(Code) | | referral/redirect links, like header 'Location:' on a 301/302 response
|
SPECULATIVE_HOP | final public static char SPECULATIVE_HOP(Code) | | speculative/aggressively extracted links, perhaps embed or nav, as in javascript
|
SPECULATIVE_MISC | final public static String SPECULATIVE_MISC(Code) | | stand-in value for speculative/aggressively extracted urls without other context
|
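The *_HOP constants classify a reference, and the *_MISC constants supply a context when nothing more specific is known. The sketch below shows one way such constants might be used together. HopSketch, hopFor and miscContextFor are hypothetical names, and every char and String value is an assumption, since the actual constant values are not listed here. Only A/@HREF (navigation) and IMG/@SRC (embed) come from the descriptions above; treating unknown pairs as speculative is a choice made for this sketch.

```java
import java.util.Locale;

/** Hypothetical mirror of Link's hop-type and stand-in constants (values assumed). */
final class HopSketch {
    static final char NAVLINK_HOP = 'L';
    static final char EMBED_HOP = 'E';
    static final char PREREQ_HOP = 'P';
    static final char REFER_HOP = 'R';
    static final char SPECULATIVE_HOP = 'X';

    static final String NAVLINK_MISC = "=NAVLINK_MISC";
    static final String EMBED_MISC = "=EMBED_MISC";
    static final String PREREQ_MISC = "=PREREQ_MISC";
    static final String SPECULATIVE_MISC = "=SPECULATIVE_MISC";

    /** Classifies an HTML element/attribute pair; unknown pairs count as speculative. */
    static char hopFor(String element, String attribute) {
        String e = element.toUpperCase(Locale.ROOT);
        String a = attribute.toUpperCase(Locale.ROOT);
        if (e.equals("A") && a.equals("HREF")) return NAVLINK_HOP; // navigation link
        if (e.equals("IMG") && a.equals("SRC")) return EMBED_HOP;  // needed to render
        return SPECULATIVE_HOP;                                     // a guess
    }

    /** Stand-in context for a hop type when no element/attribute context exists. */
    static String miscContextFor(char hop) {
        switch (hop) {
            case NAVLINK_HOP: return NAVLINK_MISC;
            case EMBED_HOP: return EMBED_MISC;
            case PREREQ_HOP: return PREREQ_MISC;
            case SPECULATIVE_HOP: return SPECULATIVE_MISC;
            default: // e.g. REFER_HOP: no *_MISC stand-in is listed for it
                throw new IllegalArgumentException("no stand-in for hop " + hop);
        }
    }
}
```

JS_MISC is left out of this sketch because it describes where a URL came from (JavaScript) rather than mapping to a single hop type.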
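Link | public Link(CharSequence source, CharSequence destination, CharSequence context, char hopType)(Code) | | Create a Link with the given fields.
Parameters:
source - the URI on which the link was discovered
destination - the URI the link points to
context - the context in which the reference was found, such as A/@HREF
hopType - the type of reference, one of the *_HOP constants |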
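The constructor takes its three text fields as CharSequence, so callers can pass a String, a StringBuilder or any other CharSequence. The sketch below mirrors the documented four-argument signature in a hypothetical LinkMirror class. Its getSource, getDestination and getContext accessors and its toString format are assumptions for illustration (only getHopType is documented here), it omits Serializable for brevity, and the hop code 'E' is an assumed value.

```java
/** Hypothetical mirror of Link's documented four-argument constructor. */
final class LinkMirror {
    private final CharSequence source;      // page the link was found on
    private final CharSequence destination; // target of the link
    private final CharSequence context;     // e.g. "IMG/@SRC" or a *_MISC stand-in
    private final char hopType;             // one of the *_HOP codes

    LinkMirror(CharSequence source, CharSequence destination,
               CharSequence context, char hopType) {
        this.source = source;
        this.destination = destination;
        this.context = context;
        this.hopType = hopType;
    }

    CharSequence getSource() { return source; }           // assumed accessor
    CharSequence getDestination() { return destination; } // assumed accessor
    CharSequence getContext() { return context; }         // assumed accessor
    char getHopType() { return hopType; }

    @Override
    public String toString() {
        // Format chosen for this sketch only: "source -> destination [context, hop]"
        return source + " -> " + destination + " [" + context + ", " + hopType + "]";
    }
}
```

Because the fields are stored as given, a mutable CharSequence such as a StringBuilder would still be shared with the caller; copying with toString() in the constructor is an option if that matters.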
elementContext | public static CharSequence elementContext(CharSequence element, CharSequence attribute)(Code) | | Create a suitable XPath-like context, such as IMG/@SRC, from an element name and an optional
attribute name.
Parameters:
element - the element name
attribute - the attribute name (optional)
Returns: CharSequence context |
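The contexts quoted elsewhere on this page (A/@HREF, IMG/@SRC) follow XPath's element/@attribute step. The sketch below builds contexts in that shape. ContextSketch is a hypothetical class, and returning the bare element name when no attribute is given is an assumption: the behavior of the real method for a missing attribute is not stated here.

```java
/** Hypothetical re-implementation of an XPath-like element/attribute context. */
final class ContextSketch {
    static CharSequence elementContext(CharSequence element, CharSequence attribute) {
        if (element == null) {
            throw new IllegalArgumentException("element name is required");
        }
        // Assumption: with no attribute, the context is just the element name.
        if (attribute == null || attribute.length() == 0) {
            return element.toString();
        }
        // "IMG" + "SRC" -> "IMG/@SRC", mirroring XPath's element/@attribute form.
        return element + "/@" + attribute;
    }
}
```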
getHopType | public char getHopType()(Code) | | Returns: char hopType, the type of reference this link represents (see the *_HOP constants) |
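A caller can use the hop type to treat kinds of links differently, for example keeping only embeds needed to render a page. The sketch below filters a list by hop type. HopLink and HopFilter are hypothetical minimal classes, not the Heritrix API, and the codes 'L' (navigation) and 'E' (embed) used in the test are assumed values.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical minimal link exposing only a destination and a hop type. */
final class HopLink {
    private final String destination;
    private final char hopType;

    HopLink(String destination, char hopType) {
        this.destination = destination;
        this.hopType = hopType;
    }

    String getDestination() { return destination; }
    char getHopType() { return hopType; }
}

final class HopFilter {
    /** Destinations of the links whose hop type equals {@code wanted}, in input order. */
    static List<String> destinationsWithHop(List<HopLink> links, char wanted) {
        return links.stream()
            .filter(link -> link.getHopType() == wanted)
            .map(HopLink::getDestination)
            .collect(Collectors.toList());
    }
}
```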