| de.anomic.plasma.parser.Parser
All known Subclasses: de.anomic.plasma.parser.gzip.gzipParser, de.anomic.plasma.parser.bzip.bzipParser, de.anomic.plasma.parser.swf.swfParser, de.anomic.plasma.parser.ppt.pptParser, de.anomic.plasma.parser.sevenzip.sevenzipParser, de.anomic.plasma.parser.vcf.vcfParser, de.anomic.plasma.parser.AbstractParser, de.anomic.plasma.parser.odt.odtParser, de.anomic.plasma.parser.doc.docParser, de.anomic.plasma.parser.mimeType.mimeTypeParser, de.anomic.plasma.parser.ps.psParser, de.anomic.plasma.parser.rss.rssParser, de.anomic.plasma.parser.xls.xlsParser, de.anomic.plasma.parser.tar.tarParser, de.anomic.plasma.parser.rtf.rtfParser, de.anomic.plasma.parser.zip.zipParser, de.anomic.plasma.parser.pdf.pdfParser, de.anomic.plasma.parser.rpm.rpmParser
Parser | public interface Parser | | This interface defines the methods that must be implemented by each content parser class.
author: Martin Thelian version: $LastChangedRevision$ / $LastChangedDate$ |
Method Summary | |
public String[] | getLibxDependences() |
public String | getName() |
public Hashtable<String, String> | getSupportedMimeTypes() |
public String | getVersion() |
public plasmaParserDocument | parse(yacyURL location, String mimeType, String charset, byte[] source) |
public plasmaParserDocument | parse(yacyURL location, String mimeType, String charset, File sourceFile) |
public plasmaParserDocument | parse(yacyURL location, String mimeType, String charset, InputStream source) |
public void | reset() This function should be called before reusing the parser object. |
public void | setContentLength(long length) |
public void | setLogger(serverLog log) |
MAX_KEEP_IN_MEMORY_SIZE | public static long MAX_KEEP_IN_MEMORY_SIZE | | |
getLibxDependences | public String[] getLibxDependences() | | Returns a list of the library names that this parser needs. |
getName | public String getName() | | Returns the name of the parser.
Returns: the parser name |
getSupportedMimeTypes | public Hashtable<String, String> getSupportedMimeTypes() | | Can be used to determine the MIME type(s) supported by the parser.
Returns: a Hashtable containing the MIME types that are supported by the parser |
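A caller would typically consult this table before handing a document to a parser. The following is a minimal sketch of such a lookup, not code from the library: the class `MimeTypeLookupSketch`, its sample entries, and the assumption that the table is keyed by lowercase MIME type (with some descriptive string as the value) are all hypothetical, since the documentation above does not state what the keys and values contain.

```java
import java.util.Hashtable;
import java.util.Locale;

// Hypothetical helper; not part of de.anomic.plasma.parser.
final class MimeTypeLookupSketch {

    // Stand-in for the table a parser might return from getSupportedMimeTypes().
    // ASSUMPTION: keys are lowercase MIME types; the values here are made-up
    // file extensions, because the source does not document the value format.
    static Hashtable<String, String> exampleSupportedMimeTypes() {
        Hashtable<String, String> types = new Hashtable<String, String>();
        types.put("application/pdf", "pdf");
        types.put("application/x-gzip", "gz,tgz");
        return types;
    }

    // Returns true if the normalized MIME type is a key of the table.
    static boolean supports(Hashtable<String, String> supported, String mimeType) {
        if (mimeType == null) {
            return false; // Hashtable does not accept null keys
        }
        String normalized = mimeType.trim().toLowerCase(Locale.ROOT);
        return supported.containsKey(normalized);
    }

    public static void main(String[] args) {
        Hashtable<String, String> supported = exampleSupportedMimeTypes();
        System.out.println("application/pdf supported: " + supports(supported, "application/pdf"));
        System.out.println("text/html supported: " + supports(supported, "text/html"));
    }
}
```

Normalizing the MIME type before the lookup is a design choice of this sketch; whether the real parsers expect normalized keys is not stated here.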
getVersion | public String getVersion() | | Returns the version number of the current parser.
Returns: the parser version number |
parse | public plasmaParserDocument parse(yacyURL location, String mimeType, String charset, byte[] source) throws ParserException, InterruptedException | | Parses a document that is available as a byte array.
Parameters: location - the origin of the document
Parameters: mimeType - the MIME type of the document
Parameters: charset - the supposed charset of the document, or null if unknown
Parameters: source - the content byte array
Returns: a plasmaParserDocument containing the extracted plain text of the document and some additional metadata
throws: ParserException - if the content could not be parsed properly |
parse | public plasmaParserDocument parse(yacyURL location, String mimeType, String charset, File sourceFile) throws ParserException, InterruptedException | | Parses a document that is stored in a File.
Parameters: location - the origin of the document
Parameters: mimeType - the MIME type of the document
Parameters: charset - the supposed charset of the document, or null if unknown
Parameters: sourceFile - the file containing the content of the document
Returns: a plasmaParserDocument containing the extracted plain text of the document and some additional metadata
throws: ParserException - if the content could not be parsed properly |
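The relationship between the byte-array and File variants of parse can be illustrated with a small, self-contained sketch. It deliberately does not use the real YaCy types: `SketchDocument` and `SketchParserException` are hypothetical stand-ins for plasmaParserDocument and ParserException, the location is a plain String instead of a yacyURL, and falling back to UTF-8 when the charset is null is an assumption of this sketch rather than documented behavior.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for ParserException (the real class is not shown here).
class SketchParserException extends Exception {
    SketchParserException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in for plasmaParserDocument: plain text plus minimal metadata.
final class SketchDocument {
    final String location;
    final String mimeType;
    final String text;

    SketchDocument(String location, String mimeType, String text) {
        this.location = location;
        this.mimeType = mimeType;
        this.text = text;
    }
}

// Hypothetical plain-text parser mirroring two of the parse signatures.
final class PlainTextSketchParser {

    // Byte-array variant: decodes the bytes with the given charset.
    SketchDocument parse(String location, String mimeType, String charset, byte[] source)
            throws SketchParserException, InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("parsing aborted");
        }
        Charset cs;
        try {
            // ASSUMPTION: UTF-8 is used when the charset is unknown (null).
            cs = (charset == null) ? StandardCharsets.UTF_8 : Charset.forName(charset);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            throw new SketchParserException("unusable charset: " + charset, e);
        }
        return new SketchDocument(location, mimeType, new String(source, cs));
    }

    // File variant: reads the whole file and delegates to the byte-array variant.
    SketchDocument parse(String location, String mimeType, String charset, File sourceFile)
            throws SketchParserException, InterruptedException {
        try {
            byte[] content = Files.readAllBytes(sourceFile.toPath());
            return parse(location, mimeType, charset, content);
        } catch (IOException e) {
            throw new SketchParserException("cannot read " + sourceFile, e);
        }
    }
}
```

Reading the whole file into memory keeps the sketch short; a real parser handling large files would more plausibly stream the content.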
reset | public void reset() | | This function should be called before reusing the parser object. |
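Why reset matters can be shown with a toy example. The class below is a hypothetical stateful parser, not one of the YaCy implementations: it keeps a byte counter and a content length between calls, and the use of -1 to mean "content length not set" is an assumption of this sketch.

```java
// Hypothetical stateful parser used only to illustrate reset-before-reuse.
final class ReusableSketchParser {
    private long contentLength = -1; // ASSUMPTION: -1 means "not set"
    private int parsedBytes = 0;     // state that accumulates across parse calls

    void setContentLength(long length) {
        this.contentLength = length;
    }

    long getContentLength() {
        return contentLength;
    }

    // Returns the number of bytes seen since the last reset.
    int parse(byte[] source) {
        parsedBytes += source.length;
        return parsedBytes;
    }

    // Clears per-document state so the object can be reused safely.
    void reset() {
        contentLength = -1;
        parsedBytes = 0;
    }
}

final class ResetSketchDemo {
    public static void main(String[] args) {
        ReusableSketchParser parser = new ReusableSketchParser();
        byte[][] documents = { new byte[3], new byte[5] };
        for (byte[] doc : documents) {
            parser.reset(); // without this call, state from the previous document leaks in
            parser.setContentLength(doc.length);
            System.out.println("bytes parsed: " + parser.parse(doc));
        }
    }
}
```

If the reset call inside the loop were omitted, the second document would report the byte count of both documents combined.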
setContentLength | public void setContentLength(long length) | | |
setLogger | public void setLogger(serverLog log) | | Can be used to set the logger that the parser module should use.
Parameters: log - the serverLog logger that should be used |