Java Doc for WARCReader.java (Web Crawler » heritrix » org.archive.io.warc)



java.lang.Object
   org.archive.io.ArchiveReader
      org.archive.io.warc.WARCReader

WARCReader
public class WARCReader extends ArchiveReader implements WARCConstants
WARCReader. Obtain instances through WARCReaderFactory.
author:
   stack
version:
   $Date: 2006-11-27 18:03:03 -0800 (Mon, 27 Nov 2006) $ $Version$



Constructor Summary
 WARCReader()
    

Method Summary
protected  WARCRecord createArchiveRecord(InputStream is, long offset)
     Create new WARC record. Encapsulates the housekeeping involved in creating a new record.
Parameters:
  is - InputStream to use.
  offset - Absolute offset into WARC file.
public static  void createCDXIndexFile(String urlOrPath)
     Generate a CDX index file for an ARC file.
public  void dump(boolean compress)
    
public  ArchiveReader getDeleteFileOnCloseReader(File f)
    
public  String getDotFileExtension()
    
public  String getFileExtension()
    
protected  void gotoEOR(ArchiveRecord record)
     Skip over any trailing newlines at the end of the record so the reader is lined up to read the next one.
protected  void initialize(String i)
    
public static  void main(String[] args)
     Command-line interface to WARCReader.
protected static  void output(WARCReader reader, String format)
     Write out the arcfile.
protected  void readExpectedChar(InputStream is, int expected)
    


Constructor Detail
WARCReader
WARCReader()




Method Detail
createArchiveRecord
protected WARCRecord createArchiveRecord(InputStream is, long offset) throws IOException
Create new WARC record. Encapsulates the housekeeping involved in creating a new record.
Parameters:
  is - InputStream to use.
  offset - Absolute offset into WARC file.
Returns:
  A WARCRecord.
throws:
  IOException -



createCDXIndexFile
public static void createCDXIndexFile(String urlOrPath) throws IOException, java.text.ParseException
Generate a CDX index file for an ARC file.
Parameters:
  urlOrPath - The ARC file to generate a CDX index for
throws:
  IOException -
  java.text.ParseException -



dump
public void dump(boolean compress) throws IOException, java.text.ParseException



getDeleteFileOnCloseReader
public ArchiveReader getDeleteFileOnCloseReader(File f)



getDotFileExtension
public String getDotFileExtension()



getFileExtension
public String getFileExtension()



gotoEOR
protected void gotoEOR(ArchiveRecord record) throws IOException
Skip over any trailing newlines at the end of the record so the reader is lined up to read the next one.
Parameters:
  record -
throws:
  IOException -
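
The idea behind gotoEOR can be illustrated with a small standalone sketch. This is not the Heritrix implementation: the class name TrailingNewlineSkipper, the helper skipTrailingNewlines, and the use of PushbackInputStream are all assumptions made for illustration. It consumes consecutive CR and LF bytes and leaves the stream positioned at the first byte of whatever follows.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of org.archive.io.warc.
final class TrailingNewlineSkipper {

    /**
     * Consumes consecutive CR/LF bytes and returns how many were skipped.
     * The first non-newline byte is pushed back so the next read sees it.
     */
    static int skipTrailingNewlines(PushbackInputStream in) throws IOException {
        int skipped = 0;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n' || c == '\r') {
                skipped++;
            } else {
                in.unread(c); // not a newline: hand it back to the stream
                break;
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        // Two CRLF pairs followed by the start of some later content.
        byte[] data = "\r\n\r\nNEXT".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(data));
        int skipped = skipTrailingNewlines(in);
        System.out.println(skipped + " newline bytes skipped, next byte: "
                + (char) in.read());
    }
}
```

A pushback stream is used here only because a plain InputStream cannot return a byte once it has been read; a reader that tracks its own position could use mark and reset instead.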



initialize
protected void initialize(String i)



main
public static void main(String[] args) throws ParseException, IOException, java.text.ParseException
Command-line interface to WARCReader. Here is the command-line interface:
 usage: java org.archive.io.warc.WARCReader [--offset=#] ARCFILE
 -h,--help      Prints this message and exits.
 -o,--offset    Outputs record at this offset into arc file.

Outputs using a pseudo-CDX format as described in the CDX Legend and illustrated in its accompanying Example. The legend used below is: 'CDX b e a m s c V (or v if uncompressed) n g'. The hash is a hard-coded, straight SHA-1 hash of the content.
Parameters:
  args - Command-line arguments.
throws:
  ParseException - Failed parse of the command line.
  IOException -
  java.text.ParseException -
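
The description above says the hash column is a straight SHA-1 of the record content. The following is a minimal sketch of computing such a digest with the standard java.security.MessageDigest class. The class name ContentSha1 and the method sha1Hex are hypothetical, and the hexadecimal encoding is an assumption chosen for readability; the encoding the tool actually writes is not stated here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of org.archive.io.warc.
final class ContentSha1 {

    /** Returns the SHA-1 digest of the given bytes as lowercase hex. */
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes format correctly.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.US_ASCII);
        // Prints a9993e364706816aba3e25717850c26c9cd0d89d (standard test vector).
        System.out.println(sha1Hex(content));
    }
}
```

For large records, feeding the digest incrementally with MessageDigest.update while streaming avoids holding the whole body in memory.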




output
protected static void output(WARCReader reader, String format) throws IOException, java.text.ParseException
Write out the arcfile.
Parameters:
  reader -
  format - Format to use outputting.
throws:
  IOException -
  java.text.ParseException -



readExpectedChar
protected void readExpectedChar(InputStream is, int expected) throws IOException
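
No description is given for readExpectedChar. Going only by its name and signature, one plausible reading is that it reads a single byte and fails when that byte differs from the expected one. The sketch below illustrates that reading; it is an assumption, not the Heritrix code, and the class name ExpectedCharReader and the exception messages are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration, not part of org.archive.io.warc.
final class ExpectedCharReader {

    /**
     * Reads one byte from the stream and throws IOException if it is not
     * the expected value or if the stream has already ended.
     */
    static void readExpectedChar(InputStream is, int expected) throws IOException {
        int c = is.read();
        if (c == -1) {
            throw new IOException("End of stream, expected 0x"
                    + Integer.toHexString(expected));
        }
        if (c != expected) {
            throw new IOException("Unexpected 0x" + Integer.toHexString(c)
                    + ", expected 0x" + Integer.toHexString(expected));
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream(new byte[] {'\r', '\n'});
        readExpectedChar(is, '\r');
        readExpectedChar(is, '\n');
        System.out.println("Both expected bytes were present");
    }
}
```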



Fields inherited from org.archive.io.ArchiveReader
final public static int MAX_ALLOWED_RECOVERABLES

Methods inherited from org.archive.io.ArchiveReader
protected void cdxOutput(boolean toFile) throws IOException
protected void cleanupCurrentRecord() throws IOException
public void close() throws IOException
abstract protected ArchiveRecord createArchiveRecord(InputStream is, long offset) throws IOException
protected ArchiveRecord currentRecord(ArchiveRecord currentRecord)
abstract public void dump(boolean compress) throws IOException, java.text.ParseException
public ArchiveRecord get(long offset) throws IOException
public ArchiveRecord get() throws IOException
protected ArchiveRecord getCurrentRecord()
abstract public ArchiveReader getDeleteFileOnCloseReader(File f)
abstract public String getDotFileExtension()
abstract public String getFileExtension()
public String getFileName()
protected InputStream getIn()
protected InputStream getInputStream(File f, long offset) throws IOException
protected InputStream getInputStream()
protected Logger getLogger()
protected static Options getOptions()
public String getReaderIdentifier()
public String getStrippedFileName()
public static String getStrippedFileName(String name, String dotFileExtension)
protected static boolean getTrueOrFalse(String value)
public String getVersion()
abstract protected void gotoEOR(ArchiveRecord record) throws IOException
protected void initialize(String i)
public boolean isCompressed()
public boolean isDigest()
public boolean isStrict()
public boolean isValid()
public Iterator<ArchiveRecord> iterator()
public void logStdErr(Level level, String message)
protected boolean output(String format) throws IOException, java.text.ParseException
public boolean outputRecord(String format) throws IOException
protected static void outputRecord(ArchiveReader r, String format) throws IOException
protected void rewind() throws IOException
protected void setCompressed(boolean compressed)
public void setDigest(boolean d)
protected void setIn(InputStream in)
protected void setReaderIdentifier(String i)
public void setStrict(boolean s)
protected void setVersion(String version)
protected static String stripExtension(String name, String ext)
public List validate() throws IOException
public List validate(int noRecords) throws IOException

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
