| Standard classifier, installed in every crawler by default.
This classifier sets the following label on the page as a whole:
- root: page is the root page of a Web site. For instance,
"http://www.digital.com/" and "http://www.digital.com/index.html" are both
marked as root, but "http://www.digital.com/about" is not.
It also sets one or more of the following labels on every link:
- hyperlink: link is a hyperlink (A, AREA, or FRAME tag) to another page on the Web (using the http, file, ftp, or gopher protocol).
- image: link is an inline image (IMG tag).
- form: link is a form (FORM tag). A form generally requires some parameters to use.
- code: link points to code (APPLET, EMBED, or SCRIPT tag).
- remote: link points to a different Web server.
- local: link points to the same Web server.
- same-page: link points to the same page (e.g., by an anchor reference like "#top").
- sibling: a local link that points to a page in the same directory (e.g., "sibling.html").
- descendent: a local link that points downwards in the directory structure (e.g., "deep/deeper/deepest.html").
- ancestor: a link that points upwards in the directory structure (e.g., "../..").
|
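
The root label and the location labels (remote, local, same-page, sibling, descendent, ancestor) can be derived from URLs alone. The following is a minimal Python sketch of one way to compute them, not the classifier's actual implementation. The names `is_root`, `directory`, and `relation_labels` are hypothetical, and the sketch assumes that "same Web server" means an identical host:port, that only "index.html" counts as a root file name, and that a same-page link receives no directory label.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: only these file names mark a root page (taken from the
# "index.html" example in the description; a real crawler may accept more).
ROOT_FILES = {"", "index.html"}


def is_root(page_url):
    """Return True if page_url looks like the root page of a Web site."""
    path = urlsplit(page_url).path
    return path.lstrip("/") in ROOT_FILES


def directory(path):
    """Return the directory part of a URL path, always ending in '/'."""
    return path[: path.rfind("/") + 1] or "/"


def relation_labels(page_url, href):
    """Return the location labels for a link `href` found on `page_url`."""
    page = urlsplit(page_url)
    # Resolve relative references such as "../.." against the page URL.
    target = urlsplit(urljoin(page_url, href))

    # Assumption: "same Web server" means the same host:port, ignoring case.
    if target.netloc.lower() != page.netloc.lower():
        return {"remote"}

    labels = {"local"}
    same_path = (target.path or "/") == (page.path or "/")
    if same_path and target.query == page.query:
        # Only the fragment (if anything) differs, e.g. "#top".
        labels.add("same-page")
        return labels

    page_dir = directory(page.path)
    target_dir = directory(target.path)
    if target_dir == page_dir:
        labels.add("sibling")
    elif target_dir.startswith(page_dir):
        labels.add("descendent")
    elif page_dir.startswith(target_dir):
        labels.add("ancestor")
    return labels
```

For a page at "http://www.digital.com/a/b/page.html", this sketch labels "sibling.html" as local and sibling, "deep/deeper/deepest.html" as local and descendent, and "../.." as local and ancestor. A link to a directory that is neither above nor below the page's directory (e.g., "/other/page.html") receives only the local label.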