| org.archive.crawler.extractor.Extractor org.archive.crawler.extractor.ExtractorCSS
ExtractorCSS | public class ExtractorCSS extends Extractor implements CoreAttributeConstants | | This extractor parses URIs from CSS files.
The format of a CSS URL value is 'url(' followed by optional white space
followed by an optional single quote (') or double quote (") character
followed by the URL itself followed by an optional single quote (') or
double quote (") character followed by optional white space followed by ')'.
Parentheses, commas, white space characters, single quotes (') and double
quotes (") appearing in a URL must be escaped with a backslash:
'\(', '\)', '\,'. Partial URLs are interpreted relative to the source of
the style sheet, not relative to the document.
Source: www.w3.org
author: Igor Ranitovic |
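The URL format described above can be illustrated with a small regular expression. The following is only a minimal sketch: the class name CssUrlSketch, both patterns, and the method extractUrls are hypothetical and are not the actual values of CSS_URI_EXTRACTOR or CSS_BACKSLASH_ESCAPE. It is also more lenient than the quoted rule, since it accepts unescaped commas inside a URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CssUrlSketch {
    // Hypothetical pattern (an assumption, not the library's constant):
    // 'url(' + optional white space + optional quote + URL body
    // + the same optional quote + optional white space + ')'.
    // The body is a run of backslash-escaped characters or characters
    // other than backslash, quotes, parentheses and white space.
    private static final Pattern CSS_URL = Pattern.compile(
        "(?i)url\\(\\s*([\"']?)((?:\\\\.|[^\\\\\"'()\\s])+)\\1\\s*\\)");

    // Hypothetical pattern: a backslash followed by the character it escapes.
    private static final Pattern BACKSLASH_ESCAPE = Pattern.compile("\\\\(.)");

    // Returns every URL value found in the given CSS text, with
    // backslash escapes such as '\(' or '\,' removed.
    public static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = CSS_URL.matcher(css);
        while (m.find()) {
            String raw = m.group(2);
            urls.add(BACKSLASH_ESCAPE.matcher(raw).replaceAll("$1"));
        }
        return urls;
    }

    public static void main(String[] args) {
        String css = "body { background: url( \"images/bg.png\" ); }\n"
                   + "a { cursor: url(a\\,b.png); }";
        for (String url : extractUrls(css)) {
            System.out.println(url);
        }
    }
}
```

The backreference to the first group requires the closing quote to match the opening one, so a value such as url("x.css') is skipped rather than guessed at.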
CSS_BACKSLASH_ESCAPE | final static String CSS_BACKSLASH_ESCAPE | | |
CSS_URI_EXTRACTOR | final static String CSS_URI_EXTRACTOR | | CSS URL extractor pattern.
This pattern extracts URIs from CSS files.
|
ExtractorCSS | public ExtractorCSS(String name) | | Parameters: name - |
extract | public void extract(CrawlURI curi) | | Parameters: curi - Crawl URI to process. |
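Because partial URLs in a style sheet are interpreted relative to the style sheet itself, any URL found while processing a CSS resource has to be resolved against that resource's own location. The sketch below shows this step with java.net.URI only. The class StyleSheetLinkSketch and the method resolveAll are hypothetical names chosen for illustration; this is not how extract(CrawlURI) is implemented, and it does not use the CrawlURI API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StyleSheetLinkSketch {
    // Resolves each URL string found in a style sheet against the
    // location of that style sheet. Strings that are not valid URIs
    // are skipped instead of aborting the whole run.
    public static List<URI> resolveAll(URI styleSheet, List<String> found) {
        List<URI> resolved = new ArrayList<>();
        for (String candidate : found) {
            try {
                resolved.add(styleSheet.resolve(candidate.trim()));
            } catch (IllegalArgumentException e) {
                // Malformed value (for example, an unescaped space): ignore it.
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Example location, assumed for illustration only.
        URI sheet = URI.create("http://example.com/css/site.css");
        List<String> found = Arrays.asList("images/bg.png", "../img/a.png", "/abs.png");
        for (URI uri : resolveAll(sheet, found)) {
            System.out.println(uri);
        }
    }
}
```

Resolving against the style sheet rather than the referring HTML page matters whenever the two live in different directories, which is the common case for shared style sheets.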
|
|