Java Doc for BitStreamHPIndexWriter.java in mg4j (package it.unimi.dsi.mg4j.index)



java.lang.Object
   it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter
      it.unimi.dsi.mg4j.index.BitStreamHPIndexWriter

BitStreamHPIndexWriter
public class BitStreamHPIndexWriter extends AbstractBitStreamIndexWriter implements IndexWriter
Writes a bitstream-based high-performance index. The comments about offsets in the documentation of BitStreamIndexWriter apply here, too.

The difference between indices generated by this class and those generated by BitStreamIndexWriter lies in the level of interleaving. Indices generated by this class store positions in a separate stream (similarly to Lucene) and have a compulsory skip structure (an extension of the one used by BitStreamIndexWriter) that indexes both the main index file and the positions file. This can result in major performance improvements in the resolution of position-based operators (e.g., phrases) and in the evaluation of . Since the overhead due to the additional skip structure and to the separate positions stream is negligible, indices generated by this class are the default in MG4J.
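To make the idea of a skip structure that indexes both streams concrete, here is a minimal sketch. It is not MG4J's tower encoding: the classes QuantumSkipList and SkipEntry are hypothetical, and q is assumed to play the role of a quantum, i.e., one posting out of every q is sampled. Each sample records the document pointer together with the bit offsets at which that posting starts in the main stream and in the positions stream, so a reader looking for the first document greater than or equal to a target can binary-search the samples and jump to the right place in both streams at once.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical skip sample: taken once every q postings. */
final class SkipEntry {
    final int document;             // document pointer of the sampled posting
    final long mainBitOffset;       // where the posting starts in the main (pointer) stream
    final long positionsBitOffset;  // where its positions start in the positions stream

    SkipEntry(int document, long mainBitOffset, long positionsBitOffset) {
        this.document = document;
        this.mainBitOffset = mainBitOffset;
        this.positionsBitOffset = positionsBitOffset;
    }
}

/** Hypothetical, simplified skip list over quanta of q postings (not MG4J's towers). */
final class QuantumSkipList {
    private final int q;                          // quantum: sample one posting out of q
    private final List<SkipEntry> entries = new ArrayList<>();
    private int postings;                         // postings recorded so far

    QuantumSkipList(int q) {
        if (q <= 0) throw new IllegalArgumentException("q must be positive");
        this.q = q;
    }

    /** Called before writing each posting, with the current offsets of both streams. */
    void record(int document, long mainBitOffset, long positionsBitOffset) {
        if (postings % q == 0) entries.add(new SkipEntry(document, mainBitOffset, positionsBitOffset));
        postings++;
    }

    /**
     * Returns the last sample whose document is <= target: scanning forward from it
     * finds the first document >= target. Returns null if the scan must start at the beginning.
     */
    SkipEntry skipTo(int target) {
        int lo = 0, hi = entries.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (entries.get(mid).document <= target) { found = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return found < 0 ? null : entries.get(found);
    }

    int size() { return entries.size(); }
}
```

Because each sample carries an offset into the positions stream as well, a reader that skips over many documents never has to decode the positions of the documents it skipped.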

Presently, indices generated by this class cannot carry payloads: you must use a BitStreamIndexWriter in that case. Moreover, only nonparametric codes can be used for positions (this limitation rules out Coding.GOLOMB, Coding.SKEWED_GOLOMB, and Coding.INTERPOLATIVE).
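A nonparametric code needs no parameter: the codeword of an integer depends only on the integer itself, whereas a Golomb code needs a modulus b chosen before coding. As an illustration, the sketch below takes Elias γ as a representative nonparametric code. It writes x ≥ 1 as floor(log2 x) zeros followed by the binary representation of x. The class EliasGamma is hypothetical and produces strings of '0'/'1' characters rather than using MG4J's OutputBitStream.

```java
/** Hypothetical Elias gamma coder over '0'/'1' strings (illustration only). */
final class EliasGamma {
    /** Encodes x >= 1 as floor(log2 x) zeros followed by x in binary. */
    static String encode(int x) {
        if (x < 1) throw new IllegalArgumentException("gamma codes positive integers only");
        String binary = Integer.toBinaryString(x);   // always starts with the leading 1
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) sb.append('0'); // floor(log2 x) zeros
        return sb.append(binary).toString();
    }

    /** Decodes one codeword starting at bit index pos[0] and advances pos[0] past it. */
    static int decode(String bits, int[] pos) {
        int zeros = 0;
        while (bits.charAt(pos[0]) == '0') { zeros++; pos[0]++; }
        int value = Integer.parseInt(bits.substring(pos[0], pos[0] + zeros + 1), 2);
        pos[0] += zeros + 1;
        return value;
    }
}
```

For example, 1, 2 and 5 become "1", "010" and "00101". Their concatenation can be decoded without knowing any parameter.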
author:
   Sebastiano Vigna
since:
   1.2


Inner Class: public static class TowerData

Field Summary
final protected static int BEFORE_COUNT
     This value of BitStreamHPIndexWriter.state can be assumed only in indices that contain counts; it means that we are positioned just before the count for the current document record.
final protected static int BEFORE_DOCUMENT_RECORD
     This value of BitStreamHPIndexWriter.state means that we are ready to call BitStreamHPIndexWriter.newDocumentRecord().
final protected static int BEFORE_FREQUENCY
     This value of BitStreamHPIndexWriter.state means that we are positioned at the start of an inverted list, and we should call BitStreamHPIndexWriter.writeFrequency(int).
final protected static int BEFORE_INVERTED_LIST
     This value of BitStreamHPIndexWriter.state means that we should call BitStreamHPIndexWriter.newInvertedList().
final protected static int BEFORE_PAYLOAD
     This value of BitStreamHPIndexWriter.state can be assumed only in indices that contain payloads; it means that we are positioned just before the payload for the current document record.
final protected static int BEFORE_POINTER
     This value of BitStreamHPIndexWriter.state means that we just started a new document record, and we should call BitStreamHPIndexWriter.writeDocumentPointer(OutputBitStream,int).
final protected static int BEFORE_POSITIONS
     This value of BitStreamHPIndexWriter.state can be assumed only in indices that contain document positions; it means that we are positioned just before the position list of the current document record.
final public static int DEFAULT_TEMP_BUFFER_SIZE
     The size of the buffer for the temporary file used to build an inverted list.
final protected static int FIRST_UNUSED_STATE
     This is the first unused state.
protected int b
     The parameter b for Golomb coding of pointers.
public long bitsForEntryBitLengths
     The number of bits written for entry lengths.
public long bitsForPositionsOffsets
     The number of bits written for offsets in the file of positions.
public long bitsForPositionsQuantumBitLengths
     The number of bits written for quantum lengths in the positions stream.
public long bitsForQuantumBitLengths
     The number of bits written for quantum lengths.
protected int currentDocument
     The current document pointer.
protected int frequency
     The number of document records that the current inverted list will contain.
protected int lastDocument
     The last document pointer in the current list.
protected int log2b
     The parameter log2b for Golomb coding of pointers; it is the position of the most significant bit of BitStreamHPIndexWriter.b.
public int maxCount
     The maximum number of positions in a document record so far.
public long numberOfBlocks
     The number of written blocks.
protected OutputBitStream obs
     The underlying index OutputBitStream.
protected OutputBitStream positions
     The underlying positions OutputBitStream.
public int prevEntryBitLength
     An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been written for the current inverted list.
public int prevPositionsQuantumBitLength
     An estimate of the number of bits occupied per quantum in the positions stream in the last written cache, or -1 if no cache has been written for the current inverted list.
public int prevQuantumBitLength
     An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been written for the current inverted list.
protected int state
     The current state of the writer.
final public TowerData towerData
     The sum of all tower data computed so far.
protected int writtenDocuments
     The number of document records already written for the current inverted list.

Constructor Summary
public  BitStreamHPIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, int tempBufferSize, Map<Component, Coding> flags, int q, int h)
     Creates a new index writer, with the specified basename.
public  BitStreamHPIndexWriter(OutputBitStream obs, OutputBitStream positions, OutputBitStream offset, int numberOfDocuments, int tempBufferSize, Map<Component, Coding> flags, int q, int h)
     Creates a new index writer using the specified underlying OutputBitStreams.

Method Summary
public void close()
public OutputBitStream newDocumentRecord()
public long newInvertedList()
public void printStats(PrintStream stats)
public Properties properties()
public int writeDocumentPointer(OutputBitStream unused, int pointer)
public int writeDocumentPositions(OutputBitStream unused, int[] occ, int offset, int len, int docSize)
public int writeFrequency(int frequency)
public int writePayload(OutputBitStream out, Payload payload)
public int writePositionCount(OutputBitStream out, int count)
public long writtenBits()
    

Field Detail
BEFORE_COUNT
final protected static int BEFORE_COUNT
This value of BitStreamHPIndexWriter.state can be assumed only in indices that contain counts; it means that we are positioned just before the count for the current document record.



BEFORE_DOCUMENT_RECORD
final protected static int BEFORE_DOCUMENT_RECORD
This value of BitStreamHPIndexWriter.state means that we are ready to call BitStreamHPIndexWriter.newDocumentRecord().



BEFORE_FREQUENCY
final protected static int BEFORE_FREQUENCY
This value of BitStreamHPIndexWriter.state means that we are positioned at the start of an inverted list, and we should call BitStreamHPIndexWriter.writeFrequency(int).



BEFORE_INVERTED_LIST
final protected static int BEFORE_INVERTED_LIST
This value of BitStreamHPIndexWriter.state means that we should call BitStreamHPIndexWriter.newInvertedList().



BEFORE_PAYLOAD
final protected static int BEFORE_PAYLOAD
This value of BitStreamHPIndexWriter.state can be assumed only in indices that contain payloads; it means that we are positioned just before the payload for the current document record.



BEFORE_POINTER
final protected static int BEFORE_POINTER
This value of BitStreamHPIndexWriter.state means that we just started a new document record, and we should call BitStreamHPIndexWriter.writeDocumentPointer(OutputBitStream,int).



BEFORE_POSITIONS
final protected static int BEFORE_POSITIONS
This value of BitStreamHPIndexWriter.state can be assumed only in indices that contain document positions; it means that we are positioned just before the position list of the current document record.



DEFAULT_TEMP_BUFFER_SIZE
final public static int DEFAULT_TEMP_BUFFER_SIZE
The size of the buffer for the temporary file used to build an inverted list. Inverted lists shorter than this number of bytes will be directly rebuilt from the buffer, and never flushed to disk.
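The policy described above keeps short inverted lists entirely in memory and falls back to a temporary file only when a list outgrows the buffer. A minimal sketch of that policy, assuming a byte-oriented buffer, follows. The class SpillingBuffer is hypothetical and is not MG4J's implementation, which works on bit streams.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical buffer: in memory up to capacity bytes, then spilled to a temporary file. */
final class SpillingBuffer implements AutoCloseable {
    private final int capacity;                  // bytes kept in memory before spilling
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path tempFile;                       // non-null once the list has spilled
    private OutputStream disk;

    SpillingBuffer(int capacity) { this.capacity = capacity; }

    void write(byte[] data) throws IOException {
        if (disk == null && memory.size() + data.length > capacity) {
            // The list no longer fits: move what we have so far to a temporary file.
            tempFile = Files.createTempFile("invlist", ".tmp");
            disk = Files.newOutputStream(tempFile);
            memory.writeTo(disk);
            memory.reset();
        }
        if (disk != null) disk.write(data); else memory.write(data);
    }

    boolean spilled() { return disk != null; }

    /** Returns the whole list, rebuilt directly from memory or read back from the temporary file. */
    byte[] contents() throws IOException {
        if (disk == null) return memory.toByteArray();
        disk.flush();
        return Files.readAllBytes(tempFile);
    }

    @Override
    public void close() throws IOException {
        if (disk != null) { disk.close(); Files.deleteIfExists(tempFile); }
    }
}
```

With a capacity of 8 bytes, a 3-byte list never touches the disk, while appending 6 more bytes triggers a spill. The full 9-byte content is still returned by contents().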



FIRST_UNUSED_STATE
final protected static int FIRST_UNUSED_STATE
This is the first unused state. Subclasses may start from this value to define new states.
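The state constants above imply an order in which the writing methods must be called. The sketch below models that protocol as a small state machine. The transitions are inferred from the descriptions of the constants, and the class WriterProtocol is hypothetical, not MG4J code. One inverted list consists of newInvertedList, then writeFrequency, then for each document record newDocumentRecord and writeDocumentPointer, followed by the count and the positions when the index has them. Payloads are left out because this writer does not support them.

```java
/** Hypothetical model of the call order implied by the BEFORE_* states (payloads omitted). */
final class WriterProtocol {
    // Names mirror the BitStreamHPIndexWriter constants; the values are an enum, not ints.
    enum State { BEFORE_INVERTED_LIST, BEFORE_FREQUENCY, BEFORE_DOCUMENT_RECORD,
                 BEFORE_POINTER, BEFORE_COUNT, BEFORE_POSITIONS }

    private final boolean hasCounts, hasPositions;
    private State state = State.BEFORE_INVERTED_LIST;
    private int frequency, writtenDocuments;

    WriterProtocol(boolean hasCounts, boolean hasPositions) {
        this.hasCounts = hasCounts;
        this.hasPositions = hasPositions;
    }

    State state() { return state; }

    private void expect(State expected) {
        if (state != expected) throw new IllegalStateException("expected " + expected + " but in " + state);
    }

    void newInvertedList() { expect(State.BEFORE_INVERTED_LIST); state = State.BEFORE_FREQUENCY; }

    void writeFrequency(int f) {
        expect(State.BEFORE_FREQUENCY);
        if (f <= 0) throw new IllegalArgumentException("frequency must be positive");
        frequency = f;
        writtenDocuments = 0;
        state = State.BEFORE_DOCUMENT_RECORD;
    }

    void newDocumentRecord() { expect(State.BEFORE_DOCUMENT_RECORD); state = State.BEFORE_POINTER; }

    void writeDocumentPointer() {
        expect(State.BEFORE_POINTER);
        if (hasCounts) state = State.BEFORE_COUNT;
        else if (hasPositions) state = State.BEFORE_POSITIONS;
        else endRecord();
    }

    void writePositionCount() {
        expect(State.BEFORE_COUNT);
        if (hasPositions) state = State.BEFORE_POSITIONS; else endRecord();
    }

    void writeDocumentPositions() { expect(State.BEFORE_POSITIONS); endRecord(); }

    // After the last of `frequency` records, the next call must start a new inverted list.
    private void endRecord() {
        writtenDocuments++;
        state = writtenDocuments == frequency ? State.BEFORE_INVERTED_LIST : State.BEFORE_DOCUMENT_RECORD;
    }
}
```

A subclass that needs extra steps would add its own states, which is the purpose of FIRST_UNUSED_STATE in the real class.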



b
protected int b
The parameter b for Golomb coding of pointers.



bitsForEntryBitLengths
public long bitsForEntryBitLengths
The number of bits written for entry lengths.



bitsForPositionsOffsets
public long bitsForPositionsOffsets
The number of bits written for offsets in the file of positions.



bitsForPositionsQuantumBitLengths
public long bitsForPositionsQuantumBitLengths
The number of bits written for quantum lengths in the positions stream.



bitsForQuantumBitLengths
public long bitsForQuantumBitLengths
The number of bits written for quantum lengths.



currentDocument
protected int currentDocument
The current document pointer.



frequency
protected int frequency
The number of document records that the current inverted list will contain.



lastDocument
protected int lastDocument
The last document pointer in the current list.
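Document pointers in an inverted list are strictly increasing. Keeping the last pointer around therefore lets a writer code small gaps instead of absolute pointers. The sketch below assumes the common convention gap = currentDocument - lastDocument - 1, with lastDocument starting at -1, so that every gap is nonnegative. The class PointerGaps is hypothetical, and the exact convention used by MG4J may differ.

```java
/** Hypothetical gap transform for strictly increasing document pointers. */
final class PointerGaps {
    /** Turns strictly increasing pointers into nonnegative gaps. */
    static int[] toGaps(int[] documents) {
        int[] gaps = new int[documents.length];
        int lastDocument = -1;                        // no document written yet
        for (int i = 0; i < documents.length; i++) {
            int currentDocument = documents[i];
            if (currentDocument <= lastDocument)
                throw new IllegalArgumentException("pointers must be strictly increasing");
            gaps[i] = currentDocument - lastDocument - 1;
            lastDocument = currentDocument;
        }
        return gaps;
    }

    /** Inverse transform: rebuilds the pointers from the gaps. */
    static int[] fromGaps(int[] gaps) {
        int[] documents = new int[gaps.length];
        int lastDocument = -1;
        for (int i = 0; i < gaps.length; i++) {
            lastDocument += gaps[i] + 1;
            documents[i] = lastDocument;
        }
        return documents;
    }
}
```

For the pointers 0, 3, 4, 10 the gaps are 0, 2, 0, 5. Small gaps get short codewords under codes such as γ or Golomb. A code that only handles positive integers, such as γ, would be applied to gap + 1.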



log2b
protected int log2b
The parameter log2b for Golomb coding of pointers; it is the position of the most significant bit of BitStreamHPIndexWriter.b.
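The parameters b and log2b are what a Golomb coder needs. The quotient x / b is written in unary, and the remainder x mod b is written with a minimal binary code of log2b or log2b + 1 bits. The sketch below is a textbook Golomb encoder that produces a '0'/'1' string. It computes log2b as the position of the most significant bit of b. The class Golomb is hypothetical, and the unary convention (zeros terminated by a one) is an assumption. MG4J's own bit-stream classes provide their own coding methods.

```java
/** Hypothetical textbook Golomb encoder over '0'/'1' strings (illustration only). */
final class Golomb {
    /** log2b = position of the most significant bit of b, i.e. floor(log2 b). */
    static int log2b(int b) {
        if (b <= 0) throw new IllegalArgumentException("b must be positive");
        return 31 - Integer.numberOfLeadingZeros(b);
    }

    /** Golomb codeword of x >= 0 with modulus b. */
    static String encode(int x, int b) {
        if (x < 0) throw new IllegalArgumentException("x must be nonnegative");
        if (b > (1 << 30)) throw new IllegalArgumentException("b too large for this sketch");
        int k = log2b(b);
        StringBuilder sb = new StringBuilder();
        for (int q = x / b; q > 0; q--) sb.append('0'); // quotient in unary: zeros...
        sb.append('1');                                  // ...terminated by a one
        int r = x % b;
        int u = (1 << (k + 1)) - b;                      // remainders below u get k bits
        if (r < u) appendBits(sb, r, k);
        else appendBits(sb, r + u, k + 1);
        return sb.toString();
    }

    private static void appendBits(StringBuilder sb, int value, int width) {
        for (int i = width - 1; i >= 0; i--) sb.append(((value >>> i) & 1) == 0 ? '0' : '1');
    }
}
```

With b = 3 (so log2b = 1), the remainders 0, 1, 2 are coded as "0", "10", "11". This keeps the code prefix-free even though b is not a power of two. When b is a power of two, every remainder takes exactly log2b bits.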



maxCount
public int maxCount
The maximum number of positions in a document record so far.
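Positions within a document record are increasing as well, so they lend themselves to the same gap transformation. A writer can update maxCount as each record goes by. The helper below is hypothetical, not MG4J code. It shows both steps for a position array laid out like the occ/offset/len arguments of writeDocumentPositions, assuming that occ[offset .. offset + len) holds strictly increasing positions.

```java
/** Hypothetical helper: gap-encodes one record's positions and tracks maxCount. */
final class PositionRecords {
    int maxCount;  // largest number of positions seen in a single record so far

    /** Returns the gaps of occ[offset .. offset + len) and updates maxCount. */
    int[] encodeRecord(int[] occ, int offset, int len) {
        int[] gaps = new int[len];
        int previous = -1;                            // positions start at 0
        for (int i = 0; i < len; i++) {
            int position = occ[offset + i];
            if (position <= previous)
                throw new IllegalArgumentException("positions must be strictly increasing");
            gaps[i] = position - previous - 1;
            previous = position;
        }
        maxCount = Math.max(maxCount, len);
        return gaps;
    }
}
```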



numberOfBlocks
public long numberOfBlocks
The number of written blocks.



obs
protected OutputBitStream obs
The underlying index OutputBitStream.



positions
protected OutputBitStream positions
The underlying positions OutputBitStream.



prevEntryBitLength
public int prevEntryBitLength
An estimate of the number of bits occupied per tower entry in the last written cache, or -1 if no cache has been written for the current inverted list.



prevPositionsQuantumBitLength
public int prevPositionsQuantumBitLength
An estimate of the number of bits occupied per quantum in the positions stream in the last written cache, or -1 if no cache has been written for the current inverted list.



prevQuantumBitLength
public int prevQuantumBitLength
An estimate of the number of bits occupied per quantum in the last written cache, or -1 if no cache has been written for the current inverted list.



state
protected int state
The current state of the writer.



towerData
final public TowerData towerData
The sum of all tower data computed so far.



writtenDocuments
protected int writtenDocuments
The number of document records already written for the current inverted list.




Constructor Detail
BitStreamHPIndexWriter
public BitStreamHPIndexWriter(CharSequence basename, int numberOfDocuments, boolean writeOffsets, int tempBufferSize, Map<Component, Coding> flags, int q, int h) throws IOException
Creates a new index writer with the specified basename. The index will be written to a file named after the basename with the extension .index. If writeOffsets is true, an offset file (extension .offsets) will also be produced.
Parameters:
  basename - the basename.
  numberOfDocuments - the number of documents in the collection to be indexed.
  writeOffsets - if true, the offset file will also be produced.
  flags - a flag map setting the coding techniques to be used (see CompressionFlags).
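The file naming described for this constructor can be spelled out as follows. The helper IndexFiles is hypothetical. It assumes the extensions are appended verbatim to the basename, and it deliberately leaves out the positions file, whose name is not given in this description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: file names implied by a basename and the writeOffsets flag. */
final class IndexFiles {
    static List<String> names(CharSequence basename, boolean writeOffsets) {
        List<String> files = new ArrayList<>();
        files.add(basename + ".index");                      // always written
        if (writeOffsets) files.add(basename + ".offsets");  // only on request
        return files;
    }
}
```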



BitStreamHPIndexWriter
public BitStreamHPIndexWriter(OutputBitStream obs, OutputBitStream positions, OutputBitStream offset, int numberOfDocuments, int tempBufferSize, Map<Component, Coding> flags, int q, int h) throws IOException
Creates a new index writer using the specified underlying OutputBitStreams.
Parameters:
  obs - the underlying output bit stream.
  offset - the offset bit stream, or null if offsets should not be written.
  numberOfDocuments - the number of documents in the collection to be indexed.
  flags - a flag map setting the coding techniques to be used (see CompressionFlags).
throws:
  IOException




Method Detail
close
public void close() throws IOException



newDocumentRecord
public OutputBitStream newDocumentRecord() throws IOException



newInvertedList
public long newInvertedList() throws IOException



printStats
public void printStats(PrintStream stats)



properties
public Properties properties()



writeDocumentPointer
public int writeDocumentPointer(OutputBitStream unused, int pointer) throws IOException



writeDocumentPositions
public int writeDocumentPositions(OutputBitStream unused, int[] occ, int offset, int len, int docSize) throws IOException



writeFrequency
public int writeFrequency(int frequency) throws IOException



writePayload
public int writePayload(OutputBitStream out, Payload payload) throws IOException



writePositionCount
public int writePositionCount(OutputBitStream out, int count) throws IOException



writtenBits
public long writtenBits()



Fields inherited from it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter
public long bitsForCounts
public long bitsForFrequencies
public long bitsForPayloads
public long bitsForPointers
public long bitsForPositions
protected Coding countCoding
protected int currentTerm
public Map<Component, Coding> flags
protected Coding frequencyCoding
final protected boolean hasCounts
final protected boolean hasPayloads
final protected boolean hasPositions
final protected int numberOfDocuments
protected long numberOfOccurrences
protected long numberOfPostings
protected Coding pointerCoding
protected Coding positionCoding

Methods inherited from it.unimi.dsi.mg4j.index.AbstractBitStreamIndexWriter
public void printStats(PrintStream stats)

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException

www.java2java.com
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.