| java.lang.Object websphinx.Region websphinx.Element websphinx.Link
All known Subclasses: websphinx.FormButton, websphinx.Form,
Field Summary | |
final public static int | GET Use the HTTP GET method to download this link. |
final public static int | POST Use the HTTP POST method to access this link. |
protected URL | url |
Constructor Summary | |
public | Link(Tag startTag, Tag endTag, URL base) Make a Link from a start tag and end tag and a base URL (for relative references). |
public | Link(URL url) Make a Link from a URL. |
public | Link(File file) Make a Link from a File. |
public | Link(String href) Make a Link from a string URL. |
Method Summary | |
public static URL | FileToURL(File file) Convert a local filename to a URL. |
public static File | URLToFile(URL url) Convert a file: URL to a filename appropriate to the current system platform. |
public void | discardContent() Eliminate all references to page content. |
public void | disconnect() Disconnect this link from its downloaded page (throwing away the page). |
public int | getDepth() Get depth of link in crawl. |
public String | getDirectory() Get the directory part of the link, like "/home/dir/". |
public URL | getDirectoryURL() Get the URL of a page's directory. |
public static URL | getDirectoryURL(URL url) Get the URL of a page's directory. |
public DownloadParameters | getDownloadParameters() Get the download parameters used for this link. |
public String | getFile() Get the filename part of the link, which includes the pathname and query but not the anchor reference. |
public String | getFilename() Get the filename part of the link, like "index.html". |
public String | getHost() Get the hostname of the link, like "www.cs.cmu.edu". |
public int | getMethod() Get the method used to access this link. |
public Page | getPage() Get the downloaded page to which the link points. |
public URL | getPageURL() Get the URL of a page, omitting any anchor reference (like #ref). |
public static URL | getPageURL(URL url) Get the URL of a page, omitting any anchor reference (like #ref). |
public URL | getParentURL() Get the URL of a page's parent directory. |
public static URL | getParentURL(URL url) Get the URL of a page's parent directory. |
public int | getPort() Get the port number of the link. |
public float | getPriority() Get the priority of the link in the crawl. |
public String | getProtocol() Get the network protocol of the link, like "ftp" or "http". |
public String | getQuery() Get the query part of the link. |
public String | getRef() Get the anchor reference of the link, like "#ref". |
public URL | getServiceURL() Get the URL of a Web service, omitting any query or anchor reference. |
public static URL | getServiceURL(URL url) Get the URL of a Web service, omitting any query or anchor reference. |
public int | getStatus() Get the status of the link. |
public URL | getURL() Get the URL. |
public static String | relativeTo(URL here, URL there) |
public static String | relativeTo(URL here, String there) |
public Tag | replaceHref(String newHref) Copy the link's start tag, replacing the URL. |
public void | setDownloadParameters(DownloadParameters dp) Set the download parameters used for this link. |
public void | setPage(Page page) Set the page corresponding to this link. |
public void | setPriority(float priority) Set the priority of the link in the crawl. |
public void | setStatus(int event) Set the status of the link. |
public void | setText(String text) Set the tagless-text representation of this region. |
public String | toDescription() Generate a human-readable description of the link. |
public String | toText() Convert the region to tagless text. |
public String | toURL() |
public static String | toURLDelimiters(String path) |
protected URL | urlFromHref(Tag tag, URL base) Construct the URL for a link element, from its start tag and a base URL (for relative references). |
GET | final public static int GET | Use the HTTP GET method to download this link. |
POST | final public static int POST | Use the HTTP POST method to access this link. |
Link | public Link(Tag startTag, Tag endTag, URL base) throws MalformedURLException | Make a Link from a start tag and end tag and a base URL (for relative references).
The tags must be on the same page.
Parameters: startTag - Start tag of element
Parameters: endTag - End tag of element
Parameters: base - Base URL used for relative references |
Link | public Link(URL url) | Make a Link from a URL. |
FileToURL | public static URL FileToURL(File file) throws MalformedURLException | Convert a local filename to a URL.
For example, if the filename is "C:\FOO\BAR\BAZ", the resulting URL is "file:/C:/FOO/BAR/BAZ".
Parameters: file - File to convert
Returns: URL corresponding to file |
URLToFile | public static File URLToFile(URL url) throws MalformedURLException | Convert a file: URL to a filename appropriate to the current system platform.
For example, on MS Windows, if the URL is "file:/FOO/BAR/BAZ", the resulting filename is "\FOO\BAR\BAZ".
Parameters: url - URL to convert
Returns: File corresponding to url
Throws: MalformedURLException - if url is not a file: URL. |
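The two conversions above can be illustrated with plain string rewriting. This is a minimal sketch that reproduces only the two documented examples; it is not the websphinx implementation. The class `FileUrlSketch` and both helper names are hypothetical, and the sketch takes a `java.net.URI` and an explicit separator character (rather than a `URL` and the platform default) so that it behaves the same on every system. Percent-encoding and UNC paths are deliberately ignored.

```java
import java.net.MalformedURLException;
import java.net.URI;

final class FileUrlSketch {
    /** Hypothetical helper: rewrite a platform path as a file: URL string. */
    static String pathToFileUrl(String path) {
        String p = path.replace('\\', '/');   // Windows separators become '/'
        if (!p.startsWith("/")) {
            p = "/" + p;                      // "C:/FOO" becomes "/C:/FOO"
        }
        return "file:" + p;
    }

    /** Hypothetical helper: rewrite a file: URL as a path using the given separator. */
    static String fileUrlToPath(URI url, char separator) throws MalformedURLException {
        if (!"file".equalsIgnoreCase(url.getScheme()) || url.getPath() == null) {
            throw new MalformedURLException("not a file: URL: " + url);
        }
        return url.getPath().replace('/', separator);
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints file:/C:/FOO/BAR/BAZ
        System.out.println(pathToFileUrl("C:\\FOO\\BAR\\BAZ"));
        // prints \FOO\BAR\BAZ
        System.out.println(fileUrlToPath(URI.create("file:/FOO/BAR/BAZ"), '\\'));
    }
}
```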
discardContent | public void discardContent() | Eliminate all references to page content. |
disconnect | public void disconnect() | Disconnect this link from its downloaded page (throwing away the page). |
getDepth | public int getDepth() | Get depth of link in crawl.
Returns: depth of link from root (depth of roots is 0) |
getDirectory | public String getDirectory() | Get the directory part of the link, like "/home/dir/".
Always starts and ends with '/'.
Returns: the directory portion of the link's URL |
getDirectoryURL | public URL getDirectoryURL() | Get the URL of a page's directory.
Returns: the URL sans filename, query and anchor reference |
getDirectoryURL | public static URL getDirectoryURL(URL url) | Get the URL of a page's directory.
Returns: the URL sans filename, query and anchor reference |
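One way to obtain "the URL sans filename, query and anchor reference" with the standard library is to resolve the relative reference "." against the page address. The sketch below assumes `java.net.URI` (websphinx itself works with `java.net.URL`), and the class and method names are hypothetical; it shows the idea rather than how `getDirectoryURL` is actually written.

```java
import java.net.URI;

final class DirectoryUrlSketch {
    /** Hypothetical helper: drop the filename, query and fragment of a page URI. */
    static URI directoryOf(URI page) {
        // Resolving "." keeps scheme, host, port and the path up to the last '/'.
        return page.resolve(".");
    }

    public static void main(String[] args) {
        URI page = URI.create("http://www.cs.cmu.edu/home/dir/index.html?q=1#ref");
        // prints http://www.cs.cmu.edu/home/dir/
        System.out.println(directoryOf(page));
    }
}
```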
getDownloadParameters | public DownloadParameters getDownloadParameters() | Get the download parameters used for this link. Default is null. |
getFile | public String getFile() | Get the filename part of the link, which includes the pathname and query but not the anchor reference.
Equivalent to getURL().getFile().
Returns: the filename portion of the link's URL |
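Because `getFile()` is documented as equivalent to `getURL().getFile()`, its result can be previewed with `java.net.URL` alone. The wrapper class and method below are hypothetical and exist only to keep the example self-contained.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class GetFileSketch {
    /** Hypothetical helper: the "file" part of a URL string, as java.net.URL reports it. */
    static String fileOf(String spec) {
        try {
            URL url = URI.create(spec).toURL();
            return url.getFile();   // path plus "?query", never the "#ref"
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints /home/dir/index.html?q=1
        System.out.println(fileOf("http://www.cs.cmu.edu/home/dir/index.html?q=1#ref"));
    }
}
```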
getFilename | public String getFilename() | Get the filename part of the link, like "index.html".
Never contains '/'; may be the empty string.
Returns: the filename portion of the link's URL |
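The filename described here and the directory part returned by `getDirectory()` can be read as the two halves of the URL path, split at its last '/'. The following sketch works on a bare path string under that assumption; the class and method names are hypothetical, and an empty path is treated as the root directory so that the directory always starts and ends with '/'.

```java
final class PathSplitSketch {
    /** Hypothetical helper: everything up to and including the last '/'. */
    static String directory(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "/" : path.substring(0, slash + 1);
    }

    /** Hypothetical helper: everything after the last '/'; may be empty. */
    static String filename(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        System.out.println(directory("/home/dir/index.html")); // prints /home/dir/
        System.out.println(filename("/home/dir/index.html"));  // prints index.html
        System.out.println(filename("/home/dir/").isEmpty());  // prints true
    }
}
```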
getHost | public String getHost() | Get the hostname of the link, like "www.cs.cmu.edu".
Returns: the hostname portion of the link's URL |
getMethod | public int getMethod() | Get the method used to access this link.
Returns: GET or POST. |
getPage | public Page getPage() | Get the downloaded page to which the link points.
Returns: the Page object, or null if the page hasn't been downloaded. |
getPageURL | public URL getPageURL() | Get the URL of a page, omitting any anchor reference (like #ref).
Returns: the URL sans anchor reference |
getPageURL | public static URL getPageURL(URL url) | Get the URL of a page, omitting any anchor reference (like #ref).
Returns: the URL sans anchor reference |
getParentURL | public URL getParentURL() | Get the URL of a page's parent directory.
Returns: the URL sans filename, query and anchor reference |
getParentURL | public static URL getParentURL(URL url) | Get the URL of a page's parent directory.
Returns: the URL sans filename, query and anchor reference |
getPort | public int getPort() | Get the port number of the link.
Returns: the port number of the link's URL, or -1 if no port number is explicitly specified in the URL |
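Since -1 means "no explicit port", callers that need a usable port number have to supply the protocol default themselves. The sketch below shows one way to do that with `java.net.URI`, whose `getPort()` follows the same -1 convention; the class, the helper names and the small table of defaults are assumptions for illustration, not part of websphinx.

```java
import java.net.URI;

final class PortSketch {
    /** Hypothetical helper: the port written in the URL, or -1 if none. */
    static int explicitPort(String spec) {
        return URI.create(spec).getPort();
    }

    /** Hypothetical helper: explicit port, else a well-known default, else -1. */
    static int effectivePort(String spec) {
        URI uri = URI.create(spec);
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase();
        switch (scheme) {
            case "http":  return 80;
            case "https": return 443;
            case "ftp":   return 21;
            default:      return -1;   // unknown protocol: no default assumed
        }
    }

    public static void main(String[] args) {
        System.out.println(explicitPort("http://www.cs.cmu.edu/"));      // prints -1
        System.out.println(explicitPort("http://www.cs.cmu.edu:8080/")); // prints 8080
        System.out.println(effectivePort("http://www.cs.cmu.edu/"));     // prints 80
    }
}
```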
getPriority | public float getPriority() | Get the priority of the link in the crawl. |
getProtocol | public String getProtocol() | Get the network protocol of the link, like "ftp" or "http".
Returns: the protocol portion of the link's URL |
getQuery | public String getQuery() | Get the query part of the link.
Either starts with a '?', or is empty.
Returns: the query portion of the link's URL |
getRef | public String getRef() | Get the anchor reference of the link, like "#ref".
Either starts with '#', or is empty.
Returns: the anchor reference portion of the link's URL |
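The "starts with a delimiter, or is empty" convention for `getQuery()` and `getRef()` differs from the standard library, where the delimiter is omitted and a missing part is null. A sketch of the conversion, assuming `java.net.URI` as the parser and hypothetical helper names:

```java
import java.net.URI;

final class QueryRefSketch {
    /** Hypothetical helper: "?query", or "" when the URI has no query. */
    static String query(URI uri) {
        String q = uri.getRawQuery();
        return q == null ? "" : "?" + q;
    }

    /** Hypothetical helper: "#fragment", or "" when the URI has no fragment. */
    static String ref(URI uri) {
        String f = uri.getRawFragment();
        return f == null ? "" : "#" + f;
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.cs.cmu.edu/search?q=links#top");
        System.out.println(query(uri)); // prints ?q=links
        System.out.println(ref(uri));   // prints #top
    }
}
```

With this convention the parts can be concatenated back together without null checks or re-inserting delimiters.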
getServiceURL | public URL getServiceURL() | Get the URL of a Web service, omitting any query or anchor reference.
Returns: the URL sans query and anchor reference |
getServiceURL | public static URL getServiceURL(URL url) | Get the URL of a Web service, omitting any query or anchor reference.
Returns: the URL sans query and anchor reference |
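The page URL and the service URL differ only in how much of the tail is cut off: the page URL drops the anchor reference, and the service URL additionally drops the query. The sketch below illustrates that relationship by truncating the string form of a `java.net.URI`; the class and method names are hypothetical, and websphinx may well build these URLs differently.

```java
import java.net.URI;

final class StripUrlSketch {
    /** Hypothetical helper: the URI without its "#ref" part. */
    static URI pageOf(URI uri) {
        return cutAt(uri, '#');
    }

    /** Hypothetical helper: the URI without its "?query" and "#ref" parts. */
    static URI serviceOf(URI uri) {
        // Strip the fragment first, since a fragment may itself contain '?'.
        return cutAt(pageOf(uri), '?');
    }

    private static URI cutAt(URI uri, char delimiter) {
        String s = uri.toString();
        int i = s.indexOf(delimiter);
        return i < 0 ? uri : URI.create(s.substring(0, i));
    }

    public static void main(String[] args) {
        URI uri = URI.create("http://www.cs.cmu.edu/cgi/search?q=1#ref");
        System.out.println(pageOf(uri));    // prints http://www.cs.cmu.edu/cgi/search?q=1
        System.out.println(serviceOf(uri)); // prints http://www.cs.cmu.edu/cgi/search
    }
}
```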
getStatus | public int getStatus() | Get the status of the link. Possible values are defined in LinkEvent.
Returns: last event that happened to this link |
getURL | public URL getURL() | Get the URL.
Returns: the URL of the link |
replaceHref | public Tag replaceHref(String newHref) | Copy the link's start tag, replacing the URL. Note that the name of the attribute containing the URL varies from tag to tag: sometimes it is called HREF, sometimes SRC, sometimes CODE, etc.
This method changes the appropriate attribute for this tag.
Parameters: newHref - New URL or relative reference; e.g. "http://www.cs.cmu.edu/" or "/foo/index.html".
Returns: copy of this link's start tag with its URL attribute replaced. The copy is a region of a fresh page containing only the tag. |
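The remark that the URL attribute "varies from tag to tag" can be pictured as a lookup from tag name to attribute name. The table below is an assumption assembled from common HTML usage (A/HREF, IMG/SRC, APPLET/CODE and so on); it is not taken from websphinx, and the class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Map;

final class UrlAttributeSketch {
    // Assumed mapping from HTML tag name to the attribute that carries its URL.
    private static final Map<String, String> URL_ATTRIBUTE = Map.of(
            "a", "href",
            "img", "src",
            "frame", "src",
            "applet", "code",
            "form", "action");

    /** Hypothetical helper: the URL-bearing attribute for a tag, or null if unknown. */
    static String urlAttributeFor(String tagName) {
        return URL_ATTRIBUTE.get(tagName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(urlAttributeFor("A"));      // prints href
        System.out.println(urlAttributeFor("IMG"));    // prints src
        System.out.println(urlAttributeFor("APPLET")); // prints code
    }
}
```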
setDownloadParameters | public void setDownloadParameters(DownloadParameters dp) | Set the download parameters used for this link. |
setPage | public void setPage(Page page) | Set the page corresponding to this link.
Parameters: page - Page to which this link points |
setPriority | public void setPriority(float priority) | Set the priority of the link in the crawl. |
setStatus | public void setStatus(int event) | Set the status of the link. Possible values are defined in LinkEvent.
Parameters: event - the event that just happened to this link |
setText | public void setText(String text) | Set the tagless-text representation of this region.
Parameters: text - a string consisting of the text in the page contained by this region |
toDescription | public String toDescription() | Generate a human-readable description of the link.
Returns: a description of the link, in the form "[url]". |
toText | public String toText() | Convert the region to tagless text.
Returns: a string consisting of the text in the page contained by this region |
toURL | public String toURL() | Convert the link's URL to a String.
Returns: the URL represented as a string |
urlFromHref | protected URL urlFromHref(Tag tag, URL base) throws MalformedURLException | Construct the URL for a link element, from its start tag and a base URL (for relative references).
Parameters: tag - Start tag of link, such as <A HREF="/foo/index.html">.
Parameters: base - Base URL used for relative references
Returns: URL to which the link points |
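The role of the base URL is easiest to see by resolving a few HREF values against it. The sketch below uses `java.net.URI.resolve` as a stand-in for whatever `urlFromHref` does internally; the class name, the helper and the sample page address are assumptions for illustration only.

```java
import java.net.URI;

final class ResolveHrefSketch {
    /** Hypothetical helper: resolve an HREF value against the page's base address. */
    static URI resolve(URI base, String href) {
        return base.resolve(href);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.cs.cmu.edu/home/dir/page.html");
        // Host-relative reference: prints http://www.cs.cmu.edu/foo/index.html
        System.out.println(resolve(base, "/foo/index.html"));
        // Directory-relative reference: prints http://www.cs.cmu.edu/home/dir/other.html
        System.out.println(resolve(base, "other.html"));
        // Absolute reference ignores the base: prints http://example.org/
        System.out.println(resolve(base, "http://example.org/"));
    }
}
```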