Java Doc for DocumentsWriter.java in » Net » lucene-connector » org » apache » lucene » index



java.lang.Object
   org.apache.lucene.index.DocumentsWriter

DocumentsWriter
final class DocumentsWriter
This class accepts multiple added documents and directly writes a single segment file. It does this more efficiently than creating a single segment per document (with DocumentWriter) and doing standard merges on those segments.
When a document is added, its stored fields (if any) and term vectors (if any) are immediately written to the Directory (i.e. they do not consume RAM). The freq/prox postings are accumulated into a Postings hash table keyed by term. Each entry in this hash table holds a separate byte stream for freq and for prox (allocated as incrementally growing slices into large shared byte[] arrays) that contains the postings data for multiple documents. If term vectors are enabled, each unique term in each document also allocates a PostingVector instance to similarly track the offsets and positions byte stream.
Once the Postings hash is full (i.e. it is consuming the allowed RAM) or the number of added docs is large enough (when we flush by doc count instead of RAM usage), we create a real segment, flush it to disk, and reset the Postings hash.
When adding a document, we first organize all of its fields by field name. We then process the document field by field, recording its postings in the Posting hash per field. After each field we flush its term vectors. When it is time to flush the full segment, we first sort the fields by name, and then go field by field and sort each field's postings.
Threads: Multiple threads are allowed into addDocument at once. There is an initial synchronized call to getThreadState, which allocates a ThreadState for this thread. The same thread will get the same ThreadState over time (thread affinity), so that if there are consistent patterns (for example, each thread indexes a different content source) we make better use of RAM. Then processDocument is called on that ThreadState without synchronization (most of the "heavy lifting" happens in this call). Finally, the synchronized finishDocument is called to flush changes to the directory.
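
The thread-affinity idea can be sketched as a small pool that binds each thread to one state the first time it asks and hands back that same state on every later call. This is a minimal sketch, not Lucene's getThreadState. The names IndexingState, AffinityPool and maxStates are hypothetical. The cap on states and the fallback to the least-shared state are assumptions. The sketch also omits the real method's wait-until-idle step for shared states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-thread indexing state (not Lucene's ThreadState).
class IndexingState {
    int numThreads;   // threads bound to this state
}

// Minimal sketch of a thread-affine pool, assuming a fixed cap on the number of states.
class AffinityPool {
    private final int maxStates;
    private final List<IndexingState> states = new ArrayList<>();
    private final Map<Thread, IndexingState> byThread = new HashMap<>();

    AffinityPool(int maxStates) {
        this.maxStates = maxStates;
    }

    // Synchronized, like the initial getThreadState call described above.
    synchronized IndexingState acquire() {
        Thread t = Thread.currentThread();
        IndexingState s = byThread.get(t);
        if (s == null) {
            if (states.size() < maxStates) {
                s = new IndexingState();            // room left: give this thread its own state
                states.add(s);
            } else {
                s = states.get(0);                  // cap reached: share the least-shared state
                for (IndexingState candidate : states) {
                    if (candidate.numThreads < s.numThreads) {
                        s = candidate;
                    }
                }
            }
            s.numThreads++;
            byThread.put(t, s);                     // same thread, same state from now on
        }
        return s;
    }

    synchronized int size() {
        return states.size();
    }
}
```

Because the binding is keyed by thread, a thread that keeps indexing one kind of content keeps refilling the same state.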
Each ThreadState instance has its own Posting hash. Once we are using too much RAM, we flush all Posting hashes to a segment by merging the docIDs in the posting lists for the same term across multiple thread states (see writeSegment and appendPostings).
When IndexWriter calls flush, or when we flush internally with autoCommit=false, we forcefully idle all threads and flush only once they are all idle. This means you can call flush from a given thread even while other threads are actively adding or deleting documents.
Exceptions: Because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk-full error while flushing stored fields leaves that file in a corrupt state, and an OOM exception while appending to the in-memory posting lists can corrupt that posting list. We call such exceptions "aborting exceptions". In these cases we must call abort() to discard all docs added since the last flush.
All other exceptions ("non-aborting exceptions") may still have partially updated the index structures. These updates are consistent, but they represent only the part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted, so that the document is always added to the index atomically ("all or none").
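
The split between aborting and non-aborting exceptions can be sketched as follows. This is a minimal illustration, not the actual DocumentsWriter code. It assumes a hypothetical AbortingException type that marks the unrecoverable cases, a hypothetical DocProcessor callback for the per-document work, and a plain set of deleted docIDs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical marker for exceptions that may have corrupted in-memory state
// (for example disk full while writing stored fields, or OOM while appending postings).
class AbortingException extends RuntimeException {
    AbortingException(String msg) {
        super(msg);
    }
}

// Hypothetical per-document indexing step that may fail partway through.
interface DocProcessor {
    void process(int docID);
}

class BufferedDocs {
    int numDocsInRAM;                                   // docIDs assigned since the last flush
    final Set<Integer> deletedDocIDs = new HashSet<>();

    // Indexes one document and returns its docID; any exception is rethrown.
    int addDocument(DocProcessor processor) {
        int docID = numDocsInRAM++;
        try {
            processor.process(docID);
        } catch (AbortingException e) {
            abort();                                    // state may be corrupt: drop all buffered docs
            throw e;
        } catch (RuntimeException e) {
            // Non-aborting: structures stay consistent but hold only part of this
            // document, so delete it to keep the add "all or none".
            deletedDocIDs.add(docID);
            throw e;
        }
        return docID;
    }

    // Discards everything buffered since the last flush.
    void abort() {
        numDocsInRAM = 0;
        deletedDocIDs.clear();
    }
}
```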

Inner Class: final static class FieldMergeState
Inner Class: static class Num

Field Summary
final static int BYTE_BLOCK_MASK
final static int BYTE_BLOCK_NOT_MASK
final static int BYTE_BLOCK_SHIFT
final static int BYTE_BLOCK_SIZE
final static int CHAR_BLOCK_MASK
final static int CHAR_BLOCK_SHIFT
final static int CHAR_BLOCK_SIZE
final static int MAX_TERM_LENGTH
final static int POSTING_NUM_BYTE
byte[] copyByteBuffer
final static int[] levelSizeArray
List newFiles
final static int[] nextLevelArray
NumberFormat nf
long numBytesAlloc
long numBytesUsed
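
Several of the fields above back the "incrementally growing slices into large shared byte[] arrays" described in the class overview: BYTE_BLOCK_SHIFT, BYTE_BLOCK_SIZE, BYTE_BLOCK_MASK and BYTE_BLOCK_NOT_MASK on one side, and levelSizeArray and nextLevelArray on the other. This page does not give their values. The sketch below assumes a 32 KB block (shift 15), size = 1 << shift, mask = size - 1, and an illustrative slice-size schedule. It shows two things: how a global byte offset splits into a block index and an in-block offset, and how successive slices of one stream grow.

```java
// Sketch of block addressing over fixed-size byte[] blocks.
// ASSUMED values: this page does not document the real constants.
class BlockAddressing {
    static final int BYTE_BLOCK_SHIFT = 15;
    static final int BYTE_BLOCK_SIZE = 1 << BYTE_BLOCK_SHIFT;   // 32768
    static final int BYTE_BLOCK_MASK = BYTE_BLOCK_SIZE - 1;     // low bits: offset within a block
    static final int BYTE_BLOCK_NOT_MASK = ~BYTE_BLOCK_MASK;    // high bits: start of the block

    // Assumed slice schedule: each new slice of a stream moves up one level,
    // staying at the last level once it is reached.
    static final int[] levelSizeArray = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};
    static final int[] nextLevelArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};

    static int blockIndex(int globalOffset) {
        return globalOffset >> BYTE_BLOCK_SHIFT;
    }

    static int offsetInBlock(int globalOffset) {
        return globalOffset & BYTE_BLOCK_MASK;
    }

    static int blockStart(int globalOffset) {
        return globalOffset & BYTE_BLOCK_NOT_MASK;
    }

    // Total bytes reserved after allocating numSlices slices for one stream.
    static int bytesAfterSlices(int numSlices) {
        int level = 0;
        int total = 0;
        for (int i = 0; i < numSlices; i++) {
            total += levelSizeArray[level];
            level = nextLevelArray[level];
        }
        return total;
    }
}
```

With power-of-two blocks, splitting an offset takes only a shift and a mask, not a division. Small first slices keep rarely seen terms cheap, while frequent terms quickly move on to larger slices.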
    

Constructor Summary
 DocumentsWriter(Directory directory, IndexWriter writer)
    

Method Summary
synchronized void abort(AbortException ae)
    Called if we hit an exception when adding docs, flushing, etc.
List abortedFiles()
boolean addDocument(Document doc, Analyzer analyzer)
    Returns true if the caller (IndexWriter) should now flush.
void appendPostings(ThreadState.FieldData[] fields, TermInfosWriter termsOut, IndexOutput freqOut, IndexOutput proxOut)
synchronized boolean bufferDeleteTerm(Term term)
synchronized boolean bufferDeleteTerms(Term[] terms)
synchronized void clearBufferedDeletes()
synchronized void clearFlushPending()
synchronized void close()
String closeDocStore()
    Closes the currently open doc stores and returns the doc store segment name.
int compareText(char[] text1, int pos1, char[] text2, int pos2)
void copyBytes(IndexInput srcIn, IndexOutput destIn, long numBytes)
void createCompoundFile(String segment)
synchronized List files()
static void fillBytes(IndexOutput out, byte b, int numBytes)
synchronized int flush(boolean closeDocStore)
synchronized List getBufferedDeleteDocIDs()
synchronized HashMap getBufferedDeleteTerms()
synchronized byte[] getByteBlock()
synchronized char[] getCharBlock()
int getDocStoreOffset()
    Returns the doc offset into the shared doc store for the current buffered docs.
String getDocStoreSegment()
    Returns the current doc store segment we are writing to.
int getMaxBufferedDeleteTerms()
int getMaxBufferedDocs()
synchronized int getNumBufferedDeleteTerms()
int getNumDocsInRAM()
    Returns how many docs are currently buffered in RAM.
synchronized void getPostings(Posting[] postings)
double getRAMBufferSizeMB()
long getRAMUsed()
String getSegment()
    Gets the name of the current segment we are writing.
synchronized ThreadState getThreadState(Document doc, Term delTerm)
    Returns a free (idle) ThreadState that may be used for indexing this one document.
synchronized boolean hasDeletes()
synchronized boolean pauseAllThreads()
synchronized void recycleByteBlocks(byte[][] blocks, int start, int end)
synchronized void recycleCharBlocks(char[][] blocks, int numBlocks)
synchronized void recyclePostings(Posting[] postings, int numPostings)
synchronized void resumeAllThreads()
synchronized void setAborting()
synchronized boolean setFlushPending()
    Sets flushPending if it is not already set, and returns whether it was set.
void setInfoStream(PrintStream infoStream)
    If non-null, various details of indexing are printed here.
void setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
void setMaxBufferedDocs(int count)
    Set max buffered docs, which means we will flush by doc count instead of by RAM usage.
void setRAMBufferSizeMB(double mb)
    Set how much RAM we can use before flushing.
String toMB(long v)
boolean updateDocument(Term t, Document doc, Analyzer analyzer)
boolean updateDocument(Document doc, Analyzer analyzer, Term delTerm)
void writeNorms(String segmentName, int totalNumDoc)
    Write norms in the "true" segment format.

Field Detail
BYTE_BLOCK_MASK
final static int BYTE_BLOCK_MASK

BYTE_BLOCK_NOT_MASK
final static int BYTE_BLOCK_NOT_MASK

BYTE_BLOCK_SHIFT
final static int BYTE_BLOCK_SHIFT

BYTE_BLOCK_SIZE
final static int BYTE_BLOCK_SIZE

CHAR_BLOCK_MASK
final static int CHAR_BLOCK_MASK

CHAR_BLOCK_SHIFT
final static int CHAR_BLOCK_SHIFT

CHAR_BLOCK_SIZE
final static int CHAR_BLOCK_SIZE

MAX_TERM_LENGTH
final static int MAX_TERM_LENGTH

POSTING_NUM_BYTE
final static int POSTING_NUM_BYTE

copyByteBuffer
byte[] copyByteBuffer

levelSizeArray
final static int[] levelSizeArray

newFiles
List newFiles

nextLevelArray
final static int[] nextLevelArray

nf
NumberFormat nf

numBytesAlloc
long numBytesAlloc

numBytesUsed
long numBytesUsed




Constructor Detail
DocumentsWriter
DocumentsWriter(Directory directory, IndexWriter writer) throws IOException




Method Detail
abort
synchronized void abort(AbortException ae) throws IOException
Called if we hit an exception when adding docs, flushing, etc. This resets our state, discarding any docs added since the last flush. If ae is non-null, it contains the root cause exception (which we re-throw after we are done aborting).

abortedFiles
List abortedFiles()

addDocument
boolean addDocument(Document doc, Analyzer analyzer) throws CorruptIndexException, IOException
Returns true if the caller (IndexWriter) should now flush.

appendPostings
void appendPostings(ThreadState.FieldData[] fields, TermInfosWriter termsOut, IndexOutput freqOut, IndexOutput proxOut) throws CorruptIndexException, IOException
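
The class overview says a flush merges the docIDs in the posting lists for the same term across multiple thread states. A minimal sketch of that merge step uses a priority queue keyed on each list's current docID. It works on plain ascending int[] lists with hypothetical names (PostingsMerge, mergeDocIDs). It does not model Lucene's byte-stream encoding, term iteration, or the real appendPostings signature, and it assumes docIDs are unique across thread states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class PostingsMerge {
    // Cursor over one thread state's ascending docID list for a single term.
    private static final class Cursor {
        final int[] docIDs;
        int pos;

        Cursor(int[] docIDs) {
            this.docIDs = docIDs;
        }

        int current() {
            return docIDs[pos];
        }
    }

    // k-way merge of several ascending docID lists into one ascending list.
    static List<Integer> mergeDocIDs(List<int[]> perThreadDocIDs) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
        for (int[] list : perThreadDocIDs) {
            if (list.length > 0) {
                queue.add(new Cursor(list));
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor top = queue.poll();          // list holding the smallest current docID
            merged.add(top.current());
            top.pos++;
            if (top.pos < top.docIDs.length) {
                queue.add(top);                 // re-insert keyed on its next docID
            }
        }
        return merged;
    }
}
```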



bufferDeleteTerm
synchronized boolean bufferDeleteTerm(Term term) throws IOException

bufferDeleteTerms
synchronized boolean bufferDeleteTerms(Term[] terms) throws IOException

clearBufferedDeletes
synchronized void clearBufferedDeletes() throws IOException

clearFlushPending
synchronized void clearFlushPending()

close
synchronized void close()

closeDocStore
String closeDocStore() throws IOException
Closes the currently open doc stores and returns the doc store segment name. This returns null if there are no buffered documents.

compareText
int compareText(char[] text1, int pos1, char[] text2, int pos2)
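
compareText has no description on this page. Its signature suggests it compares two terms held in char[] buffers, starting at pos1 and pos2. The sketch below assumes each term ends with the sentinel char 0xffff, which this page does not state. Under that assumption, a term that ends first sorts before a longer term sharing its prefix. TermCompare and END are hypothetical names.

```java
class TermCompare {
    // ASSUMED end-of-term marker; not documented on this page.
    static final char END = 0xffff;

    // Returns <0, 0 or >0 as the term at text1[pos1..] sorts before, equal to,
    // or after the term at text2[pos2..].
    static int compareText(char[] text1, int pos1, char[] text2, int pos2) {
        while (true) {
            final char c1 = text1[pos1++];
            final char c2 = text2[pos2++];
            if (c1 != c2) {
                if (c2 == END) {
                    return 1;            // text2 ended first: text1 is the longer term
                } else if (c1 == END) {
                    return -1;           // text1 ended first: text1 is the shorter term
                } else {
                    return c1 - c2;      // first differing char decides
                }
            } else if (c1 == END) {
                return 0;                // both terms ended at the same point
            }
        }
    }
}
```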



copyBytes
void copyBytes(IndexInput srcIn, IndexOutput destIn, long numBytes) throws IOException
Copies numBytes bytes from srcIn to destIn.
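
Here is a minimal sketch of a bounded, buffered copy that reuses one scratch array, in the spirit of the copyByteBuffer field. It uses java.io streams as stand-ins for Lucene's IndexInput and IndexOutput. The 4096-byte buffer size and the ByteCopier class name are assumptions.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteCopier {
    // Reused across calls, like the copyByteBuffer field (size assumed).
    private final byte[] copyByteBuffer = new byte[4096];

    // Copies exactly numBytes from srcIn to destOut, failing if srcIn runs short.
    void copyBytes(InputStream srcIn, OutputStream destOut, long numBytes) throws IOException {
        while (numBytes > 0) {
            int chunk = (int) Math.min(copyByteBuffer.length, numBytes);
            int read = srcIn.read(copyByteBuffer, 0, chunk);
            if (read < 0) {
                throw new EOFException("source ended with " + numBytes + " bytes left to copy");
            }
            destOut.write(copyByteBuffer, 0, read);
            numBytes -= read;
        }
    }
}
```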



createCompoundFile
void createCompoundFile(String segment) throws IOException
Builds the compound file for the segment we just flushed.

files
synchronized List files()

fillBytes
static void fillBytes(IndexOutput out, byte b, int numBytes) throws IOException

flush
synchronized int flush(boolean closeDocStore) throws IOException
Flushes all pending docs to a new segment.

getBufferedDeleteDocIDs
synchronized List getBufferedDeleteDocIDs()

getBufferedDeleteTerms
synchronized HashMap getBufferedDeleteTerms()

getByteBlock
synchronized byte[] getByteBlock()

getCharBlock
synchronized char[] getCharBlock()

getDocStoreOffset
int getDocStoreOffset()
Returns the doc offset into the shared doc store for the current buffered docs.

getDocStoreSegment
String getDocStoreSegment()
Returns the current doc store segment we are writing to. This will be the same as the current segment when autoCommit is true.

getMaxBufferedDeleteTerms
int getMaxBufferedDeleteTerms()

getMaxBufferedDocs
int getMaxBufferedDocs()

getNumBufferedDeleteTerms
synchronized int getNumBufferedDeleteTerms()

getNumDocsInRAM
int getNumDocsInRAM()
Returns how many docs are currently buffered in RAM.

getPostings
synchronized void getPostings(Posting[] postings)

getRAMBufferSizeMB
double getRAMBufferSizeMB()

getRAMUsed
long getRAMUsed()

getSegment
String getSegment()
Gets the name of the current segment we are writing.

getThreadState
synchronized ThreadState getThreadState(Document doc, Term delTerm) throws IOException
Returns a free (idle) ThreadState that may be used for indexing this one document. This call also pauses if a flush is pending. If delTerm is non-null, we buffer this delete term after the thread state has been acquired.
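
getThreadState "also pauses if a flush is pending", and the class overview says flushing forcefully idles all threads. A minimal monitor-based sketch of that handshake with wait/notifyAll follows. The class FlushGate and its method names are hypothetical, and the real class tracks considerably more state.

```java
// Hypothetical gate: indexing threads wait while a flush is pending, and the
// flushing thread waits until every active indexing thread has finished its document.
class FlushGate {
    private boolean flushPending;
    private int activeThreads;

    // Start of indexing one document (compare getThreadState).
    synchronized void beginDocument() throws InterruptedException {
        while (flushPending) {
            wait();                    // pause new work until the flush completes
        }
        activeThreads++;
    }

    // End of indexing one document (compare finishDocument).
    synchronized void endDocument() {
        activeThreads--;
        notifyAll();                   // a waiting flusher may now proceed
    }

    // Marks a flush as pending and blocks until all indexing threads are idle.
    synchronized void pauseAll() throws InterruptedException {
        flushPending = true;
        while (activeThreads > 0) {
            wait();
        }
    }

    // Ends the flush and releases paused indexing threads.
    synchronized void resumeAll() {
        flushPending = false;
        notifyAll();
    }

    synchronized int activeThreads() {
        return activeThreads;
    }
}
```

Waiting inside a while loop, not an if, guards against spurious wakeups and against notifyAll waking threads whose condition still does not hold.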



hasDeletes
synchronized boolean hasDeletes()

pauseAllThreads
synchronized boolean pauseAllThreads()

recycleByteBlocks
synchronized void recycleByteBlocks(byte[][] blocks, int start, int end)
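
getByteBlock and recycleByteBlocks, together with the numBytesAlloc and numBytesUsed fields, point to a pooled allocator that reuses freed blocks instead of allocating new arrays. The sketch below shows one way to do that with a free list. The class name ByteBlockPool, the 32 KB block size, zeroing blocks on recycle, and the exact accounting rules are all assumptions, not taken from this page.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class ByteBlockPool {
    static final int BLOCK_SIZE = 1 << 15;         // assumed block size

    private final ArrayDeque<byte[]> freeBlocks = new ArrayDeque<>();
    long numBytesAlloc;   // bytes allocated as blocks so far (free or in use)
    long numBytesUsed;    // bytes in blocks currently handed out

    synchronized byte[] getByteBlock() {
        byte[] b = freeBlocks.poll();
        if (b == null) {
            b = new byte[BLOCK_SIZE];               // free list empty: allocate a new block
            numBytesAlloc += BLOCK_SIZE;
        }
        numBytesUsed += BLOCK_SIZE;
        return b;
    }

    // Returns blocks[start..end) to the free list for reuse.
    synchronized void recycleByteBlocks(byte[][] blocks, int start, int end) {
        for (int i = start; i < end; i++) {
            Arrays.fill(blocks[i], (byte) 0);       // assumption: callers expect zeroed blocks
            freeBlocks.push(blocks[i]);
            blocks[i] = null;
        }
        numBytesUsed -= (long) (end - start) * BLOCK_SIZE;
    }
}
```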



recycleCharBlocks
synchronized void recycleCharBlocks(char[][] blocks, int numBlocks)

recyclePostings
synchronized void recyclePostings(Posting[] postings, int numPostings)

resumeAllThreads
synchronized void resumeAllThreads()

setAborting
synchronized void setAborting()

setFlushPending
synchronized boolean setFlushPending()
Sets flushPending if it is not already set, and returns whether it was set. IndexWriter uses this to trigger a single flush even when multiple threads are trying to flush.
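
The "set if not already set, and report whether I set it" contract lets exactly one of several competing threads win the right to flush. A minimal sketch with a synchronized flag, matching the method's synchronized modifier, follows. The class name FlushFlag is hypothetical.

```java
class FlushFlag {
    private boolean flushPending;

    // Sets flushPending if it is not already set; true only for the caller that set it.
    synchronized boolean setFlushPending() {
        if (flushPending) {
            return false;              // someone else already owns this flush
        }
        flushPending = true;
        return true;
    }

    synchronized void clearFlushPending() {
        flushPending = false;
    }
}
```

A caller would then write `if (flag.setFlushPending()) { /* flush */ flag.clearFlushPending(); }`. `java.util.concurrent.atomic.AtomicBoolean.compareAndSet(false, true)` gives the same guarantee without a lock.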



setInfoStream
void setInfoStream(PrintStream infoStream)
If non-null, various details of indexing are printed here.

setMaxBufferedDeleteTerms
void setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)

setMaxBufferedDocs
void setMaxBufferedDocs(int count)
Set max buffered docs, which means we will flush by doc count instead of by RAM usage.

setRAMBufferSizeMB
void setRAMBufferSizeMB(double mb)
Set how much RAM we can use before flushing.
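
Three methods feed one decision. setMaxBufferedDocs switches to flushing by doc count instead of RAM usage, setRAMBufferSizeMB sets the RAM budget, and addDocument returns true when the caller should flush. A minimal sketch of that decision follows. It assumes three things this page does not state: the two modes are exclusive, a hypothetical DISABLED marker (-1) means "not flushing by doc count", and the budget starts at 16 MB. FlushPolicy and shouldFlush are hypothetical names.

```java
// Hypothetical flush policy mirroring the two modes described on this page.
class FlushPolicy {
    static final int DISABLED = -1;                     // assumed "not flushing by doc count" marker

    private int maxBufferedDocs = DISABLED;
    private long ramBufferBytes = 16L * 1024 * 1024;    // assumed initial RAM budget

    // Flush by doc count instead of by RAM usage.
    void setMaxBufferedDocs(int count) {
        maxBufferedDocs = count;
    }

    // Flush by RAM usage with the given budget (assumed to replace doc-count mode).
    void setRAMBufferSizeMB(double mb) {
        ramBufferBytes = (long) (mb * 1024 * 1024);
        maxBufferedDocs = DISABLED;
    }

    // Whether the caller should flush now (compare addDocument's return value).
    boolean shouldFlush(int numDocsInRAM, long ramUsedBytes) {
        if (maxBufferedDocs != DISABLED) {
            return numDocsInRAM >= maxBufferedDocs;
        }
        return ramUsedBytes >= ramBufferBytes;
    }
}
```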



toMB
String toMB(long v)

updateDocument
boolean updateDocument(Term t, Document doc, Analyzer analyzer) throws CorruptIndexException, IOException

updateDocument
boolean updateDocument(Document doc, Analyzer analyzer, Term delTerm) throws CorruptIndexException, IOException

writeNorms
void writeNorms(String segmentName, int totalNumDoc) throws IOException
Write norms in the "true" segment format. This is called only during commit, to create the .nrm file.

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException

www.java2java.com
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.