| java.lang.Object org.archive.crawler.extractor.ExtractorTool
ExtractorTool | public class ExtractorTool | | Runs the named extractors against the passed ARC file.
This extractor tool runs suboptimally. It takes each ARC file record,
writes it to a new scratch file, and then runs each listed
extractor against that scratch file. It works this way because
extractors want a CharSequence, so that they can refer to characters
by absolute position, but ARCs are compressed streams. The work
to get a CharSequence over an underlying compressed stream has not
been done. Another issue is the need to set up a CrawlerSetting environment
so that the extractors can run.
author: stack
version: $Date: 2006-09-26 23:47:15 +0000 (Tue, 26 Sep 2006) $, $Revision: 4671 $ |
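
The scratch-file approach described above can be sketched with JDK classes alone. This is an illustration of the idea, not the ExtractorTool source: the class ScratchExtractSketch, the LinkFinder interface, and the findHrefs and gzip helpers are hypothetical names invented for this example, and a plain gzip stream stands in for an ARC record. The sketch also assumes the record fits in memory once it has been decompressed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class ScratchExtractSketch {

    /** Hypothetical stand-in for an extractor; this is not a Heritrix type. */
    interface LinkFinder {
        List<String> find(CharSequence text);
    }

    /** Toy finder: collects the values of href="..." attributes. */
    static List<String> findHrefs(CharSequence text) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("href=\"([^\"]*)\"").matcher(text);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Decompresses one record into a scratch file, then runs every finder
     * against a CharSequence view of that file.
     */
    static List<String> runAgainstScratch(InputStream compressedRecord,
                                          List<LinkFinder> finders) throws IOException {
        Path scratch = Files.createTempFile("record", ".scratch");
        try {
            // A compressed stream offers no random access, so spill it to disk first.
            try (InputStream in = new GZIPInputStream(compressedRecord)) {
                Files.copy(in, scratch, StandardCopyOption.REPLACE_EXISTING);
            }
            // ISO-8859-1 maps one byte to one char, so a char index equals a byte offset.
            // Simplification: the whole scratch file is loaded into memory here.
            CharSequence text =
                new String(Files.readAllBytes(scratch), StandardCharsets.ISO_8859_1);
            List<String> found = new ArrayList<>();
            for (LinkFinder finder : finders) {
                found.addAll(finder.find(text));
            }
            return found;
        } finally {
            Files.deleteIfExists(scratch);
        }
    }

    /** Demo helper: gzip a byte array so the example has compressed input. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = gzip("<a href=\"http://example.org/\">x</a>"
            .getBytes(StandardCharsets.ISO_8859_1));
        LinkFinder hrefs = ScratchExtractSketch::findHrefs;
        System.out.println(runAgainstScratch(
            new ByteArrayInputStream(record), Collections.singletonList(hrefs)));
    }
}
```

The extra copy to disk is the cost the description calls suboptimal: every record is decompressed and written out in full before any extractor sees it, in exchange for giving each extractor positional access to the characters.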