Java Doc for Combine.java (it.unimi.dsi.mg4j.tool)



java.lang.Object
   it.unimi.dsi.mg4j.tool.Combine

All known subclasses: it.unimi.dsi.mg4j.tool.Concatenate, it.unimi.dsi.mg4j.tool.Paste, it.unimi.dsi.mg4j.tool.Merge
Combine
abstract public class Combine
Combines several indices.

Indices may be combined in several different ways. This abstract class contains code that is common to classes such as it.unimi.dsi.mg4j.tool.Merge or it.unimi.dsi.mg4j.tool.Concatenate: essentially, command line parsing, index opening, and term list fusion are taken care of. Then, the template method Combine.combine(int) must write into Combine.indexWriter the combined inverted list, returning the resulting frequency.

Note that by combining a single index into a new one you can recompress an index with different compression parameters (which includes the possibility of eliminating positions or counts).

The subclasses of this class must implement Combine.combine(int) so that indices with different sets of features are combined keeping the largest set of features requested by the user. For instance, combining an index with positions and an index with counts, but no positions, should generate an index with counts but no positions.
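
The feature rule above can be pictured as a set intersection. The following is a minimal sketch under one reading of that rule: the output keeps every requested feature that all input indices can supply. The names FeatureSets, Feature and combined are hypothetical and are not part of MG4J.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout: nothing here is part of the MG4J API.
class FeatureSets {
    enum Feature { COUNTS, POSITIONS, PAYLOADS }

    /** Returns the requested features that every input index can supply. */
    static Set<Feature> combined(Set<Feature> requested, List<Set<Feature>> inputs) {
        EnumSet<Feature> result = EnumSet.noneOf(Feature.class);
        result.addAll(requested);
        // Drop any feature that is missing from at least one input index.
        for (Set<Feature> available : inputs) result.retainAll(available);
        return result;
    }

    public static void main(String[] args) {
        Set<Feature> requested = EnumSet.of(Feature.COUNTS, Feature.POSITIONS);
        List<Set<Feature>> inputs = Arrays.<Set<Feature>>asList(
            EnumSet.of(Feature.COUNTS, Feature.POSITIONS),  // index with positions
            EnumSet.of(Feature.COUNTS));                    // index with counts, no positions
        System.out.println(combined(requested, inputs));    // prints [COUNTS]
    }
}
```

This reproduces the example in the text: an index with positions combined with an index that has counts but no positions yields counts without positions.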

Warning: a combination requires opening three files per input index, plus a few more files for the output index. If the combination process is interrupted by an exception claiming that there are too many open files, check how to increase the number of files you can open (usually, for instance on UN*X, there is a global and a per-process limit, so be sure to set both).

Read-once indices, readers, and distributed index combination

If the indices and index readers involved in the combination are read-once (i.e., opening an index and reading its contents sequentially once causes each file composing the index to be read exactly once), then it.unimi.dsi.mg4j.tool.Combine implementations should also be read-once (it.unimi.dsi.mg4j.tool.Concatenate, it.unimi.dsi.mg4j.tool.Merge and it.unimi.dsi.mg4j.tool.Paste are).

This means, in particular, that index combination can be performed from pipes, which in turn can be filled, for instance, with data coming from the network. In other words, although this class is nominally based on a number of indices existing on a local disk, those indices can be replaced with suitable pipes filled with remote data without affecting the combination process. For instance, the following bash code creates three sets of pipes:

 for i in 0 1 2; do
   for e in frequencies globcounts index offsets properties sizes terms; do
     mkfifo pipe$i.$e
   done
 done
 

Each pipe should be then filled with suitable data, for instance obtained from the net (assuming you have indices index0, index1 and index2 on example.com):

 for i in 0 1 2; do
   for e in frequencies globcounts index offsets properties sizes terms; do
     (ssh -x example.com cat index$i.$e >pipe$i.$e &)
   done
 done
 

Now all pipes will be filled with data from the corresponding remote files, and combining the indices pipe0, pipe1 and pipe2 will give the same result as combining index0, index1 and index2 on the remote system.
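
Before starting a combination from pipes, it may help to verify that every expected pipe exists. The following is a minimal sketch of such a check; PipeCheck and missing are hypothetical helpers, not part of MG4J, and the extension list is simply copied from the bash loops above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MG4J.
class PipeCheck {
    // Extensions copied from the bash loops in the text.
    static final String[] EXTENSIONS = {
        "frequencies", "globcounts", "index", "offsets", "properties", "sizes", "terms"
    };

    /** Lists the files basename0.ext ... basename(n-1).ext that do not exist in dir. */
    static List<Path> missing(Path dir, String basename, int n) {
        List<Path> result = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (String e : EXTENSIONS) {
                Path p = dir.resolve(basename + i + "." + e);
                // Files.exists only inspects the file system entry; it does not open the pipe.
                if (!Files.exists(p)) result.add(p);
            }
        return result;
    }

    public static void main(String[] args) {
        // Reports which of the pipes for pipe0, pipe1 and pipe2 are absent from the current directory.
        for (Path p : missing(Paths.get("."), "pipe", 3))
            System.out.println("missing: " + p);
    }
}
```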
author:
   Sebastiano Vigna
since:
   1.0


Inner Class: final protected static class GammaCodedIntIterator extends AbstractIntIterator implements Closeable

Field Summary
final public static int DEFAULT_BUFFER_SIZE
     The default buffer size.
final protected int[] frequency
     For each index, the frequency of the current term (given that it is present).
final protected boolean hasCounts
     Whether Combine.indexWriter has counts.
final protected boolean hasPayloads
     Whether Combine.indexWriter has payloads.
final protected boolean hasPositions
     Whether Combine.indexWriter has positions.
final protected BitStreamIndex[] index
     The array of indices to be merged.
final protected IndexIterator[] indexIterator
     An array of index iterators parallel to Combine.index (filled by concrete implementations).
final protected IndexReader[] indexReader
     An array of index readers parallel to Combine.index.
protected IndexWriter indexWriter
     The index writer for the merged index.
final protected String[] inputBasename
     The array of input basenames.
protected int maxCount
     The maximum count in the merged index.
final protected int numIndices
     The number of indices to be merged.
final protected int numberOfDocuments
     The overall number of documents.
protected long numberOfOccurrences
     The overall number of occurrences.
protected int[] position
     A cache for positions.
protected int[] size
     The size of each document.
protected ObjectHeapSemiIndirectPriorityQueue<MutableString> termQueue
     The queue containing terms.
protected int[] usedIndex
     An array partially filled with the indices (as offsets in Combine.index) participating in the merge process for the current term.

Constructor Summary
public  Combine(String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<Component, Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval)
    

Method Summary
abstract protected int combine(int numUsedIndices)
     Combines several indices.

When this method is called, exactly numUsedIndices entries of Combine.usedIndex contain, in increasing order, the indices containing inverted lists for the current term.

abstract protected int combineNumberOfDocuments()
     Combines the number of documents.
abstract protected int combineSizes()
     Combines size lists.
protected BitStreamIndex getIndex(CharSequence basename)
     Returns an index with the given basename, loaded with options suitable to perform the combination.

This basic implementation calls it.unimi.dsi.mg4j.index.Index.getInstance(CharSequence, boolean, boolean) with all Boolean parameters set to false.

public static void main(String[] arg)
    
public static void main(String[] arg, Class<? extends Combine> combineClass)
    
public void run()
    
protected IntIterator sizes(int numIndex)
     Returns an iterator on sizes.

The purpose of this method is to provide Combine.combineSizes() implementations with a way to access the size list from a disk file or from BitStreamIndex.sizes transparently. This mechanism is essential to ensure that size files are read exactly once.

The caller should check whether the returned object implements Closeable and, if so, invoke Closeable.close after use.
Parameters:
  numIndex - the number of an index.


Field Detail
DEFAULT_BUFFER_SIZE
final public static int DEFAULT_BUFFER_SIZE
The default buffer size.

frequency
final protected int[] frequency
For each index, the frequency of the current term (given that it is present).

hasCounts
final protected boolean hasCounts
Whether Combine.indexWriter has counts.

hasPayloads
final protected boolean hasPayloads
Whether Combine.indexWriter has payloads.

hasPositions
final protected boolean hasPositions
Whether Combine.indexWriter has positions.

index
final protected BitStreamIndex[] index
The array of indices to be merged.

indexIterator
final protected IndexIterator[] indexIterator
An array of index iterators parallel to Combine.index (filled by concrete implementations).

indexReader
final protected IndexReader[] indexReader
An array of index readers parallel to Combine.index.

indexWriter
protected IndexWriter indexWriter
The index writer for the merged index.

inputBasename
final protected String[] inputBasename
The array of input basenames.

maxCount
protected int maxCount
The maximum count in the merged index.

numIndices
final protected int numIndices
The number of indices to be merged.

numberOfDocuments
final protected int numberOfDocuments
The overall number of documents.

numberOfOccurrences
protected long numberOfOccurrences
The overall number of occurrences.

position
protected int[] position
A cache for positions.

size
protected int[] size
The size of each document.

termQueue
protected ObjectHeapSemiIndirectPriorityQueue<MutableString> termQueue
The queue containing terms.

usedIndex
protected int[] usedIndex
An array partially filled with the indices (as offsets in Combine.index) participating in the merge process for the current term.




Constructor Detail
Combine
public Combine(String outputBasename, String[] inputBasename, boolean metadataOnly, int bufferSize, Map<Component, Coding> writerFlags, boolean interleaved, boolean skips, int quantum, int height, int skipBufferSize, long logInterval) throws IOException, ConfigurationException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException




Method Detail
combine
abstract protected int combine(int numUsedIndices) throws IOException
Combines several indices.

When this method is called, exactly numUsedIndices entries of Combine.usedIndex contain, in increasing order, the indices containing inverted lists for the current term. Implementations of this method must combine the inverted lists, save the total global count for the current term, and return the resulting frequency.
Parameters:
  numUsedIndices - the number of valid entries in Combine.usedIndex.
Returns:
  the frequency of the combined lists.
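
To illustrate the contract on Combine.usedIndex, here is a minimal sketch of term list fusion over a simplified model in which each index is just a sorted array of distinct terms. TermFusion and fuse are hypothetical names, and java.util.PriorityQueue stands in for the term queue; this is not the MG4J implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified model: each "index" is a sorted array of distinct terms.
class TermFusion {
    /** For each distinct term, in order, records the term and the indices that contain it. */
    static List<String> fuse(String[][] terms) {
        int n = terms.length;
        int[] next = new int[n];       // cursor into each term list
        int[] usedIndex = new int[n];  // partially filled, in the spirit of Combine.usedIndex
        // Queue of index numbers ordered by their current term; ties are broken by index number.
        PriorityQueue<Integer> queue = new PriorityQueue<>((a, b) -> {
            int c = terms[a][next[a]].compareTo(terms[b][next[b]]);
            return c != 0 ? c : Integer.compare(a, b);
        });
        for (int i = 0; i < n; i++) if (terms[i].length > 0) queue.add(i);

        List<String> log = new ArrayList<>();
        while (!queue.isEmpty()) {
            String term = terms[queue.peek()][next[queue.peek()]];
            int numUsedIndices = 0;
            // Remove every index whose current term equals the smallest term.
            while (!queue.isEmpty() && terms[queue.peek()][next[queue.peek()]].equals(term))
                usedIndex[numUsedIndices++] = queue.poll();
            // A concrete subclass would run its combine(numUsedIndices) logic at this point.
            log.add(term + " " + Arrays.toString(Arrays.copyOf(usedIndex, numUsedIndices)));
            // Advance the indices just used and re-enqueue those with terms left.
            for (int k = 0; k < numUsedIndices; k++) {
                int i = usedIndex[k];
                if (++next[i] < terms[i].length) queue.add(i);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        String[][] terms = { { "a", "c" }, { "b", "c" }, { "a" } };
        // Prints: a [0, 2], then b [1], then c [0, 1]
        for (String line : fuse(terms)) System.out.println(line);
    }
}
```

Cursors are advanced only for indices that are currently outside the queue, so the ordering of the queued elements never changes underneath it.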




combineNumberOfDocuments
abstract protected int combineNumberOfDocuments()
Combines the number of documents.
Returns:
  the number of documents of the combined index.



combineSizes
abstract protected int combineSizes() throws IOException
Combines size lists.
Returns:
  the maximum size of a document in the combined index.
throws:
  IOException



getIndex
protected BitStreamIndex getIndex(CharSequence basename) throws ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException
Returns an index with the given basename, loaded with options suitable to perform the combination.

This basic implementation calls it.unimi.dsi.mg4j.index.Index.getInstance(CharSequence, boolean, boolean) with all Boolean parameters set to false. Subclasses can override this method to load more data.
Parameters:
  basename - an index basename.
Returns:
  an index loaded with the correct options for the combining strategy.




main
public static void main(String[] arg) throws JSAPException, ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException



main
public static void main(String[] arg, Class<? extends Combine> combineClass) throws JSAPException, ConfigurationException, IOException, URISyntaxException, ClassNotFoundException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException



run
public void run() throws ConfigurationException, IOException



sizes
protected IntIterator sizes(int numIndex) throws FileNotFoundException
Returns an iterator on sizes.

The purpose of this method is to provide Combine.combineSizes() implementations with a way to access the size list from a disk file or from BitStreamIndex.sizes transparently. This mechanism is essential to ensure that size files are read exactly once.

The caller should check whether the returned object implements Closeable and, if so, invoke Closeable.close after use.
Parameters:
  numIndex - the number of an index.
Returns:
  an iterator on the sizes of the index.
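
The Closeable check described above might look as follows. This is a minimal sketch with hypothetical stand-ins: java.util.Iterator<Integer> models the returned IntIterator, and FileBackedSizes models an iterator that reads sizes from a file. None of these names belong to MG4J.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-ins: the real method returns an IntIterator, modeled here by Iterator<Integer>.
class SizesUsage {
    /** An iterator that also holds a resource, like one reading sizes from a file. */
    static class FileBackedSizes implements Iterator<Integer>, Closeable {
        private final Iterator<Integer> delegate;
        boolean closed;  // records whether close() has been called

        FileBackedSizes(Integer... sizes) { delegate = Arrays.asList(sizes).iterator(); }
        public boolean hasNext() { return delegate.hasNext(); }
        public Integer next() { return delegate.next(); }
        public void close() { closed = true; }
    }

    /** Returns the maximum size, closing the iterator afterwards if it is Closeable. */
    static int maxSize(Iterator<Integer> sizes) throws IOException {
        int max = 0;
        try {
            while (sizes.hasNext()) max = Math.max(max, sizes.next());
        } finally {
            // Only some iterators hold a resource, so the check is made at run time.
            if (sizes instanceof Closeable) ((Closeable) sizes).close();
        }
        return max;
    }

    public static void main(String[] args) throws IOException {
        FileBackedSizes sizes = new FileBackedSizes(3, 7, 5);
        System.out.println(maxSize(sizes));  // prints 7
        System.out.println(sizes.closed);    // prints true
    }
}
```

Closing in a finally block releases the underlying resource even if iteration fails partway through.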




Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
