java.lang.Object
  org.apache.lucene.index.DocumentsWriter

final class DocumentsWriter

This class accepts multiple added documents and directly
writes a single segment file. It does this more
efficiently than creating a single segment per document
(with DocumentWriter) and doing standard merges on those
segments.
When a document is added, its stored fields (if any) and
term vectors (if any) are immediately written to the
Directory (i.e., these do not consume RAM). The freq/prox
postings are accumulated into a Postings hash table keyed
by term. Each entry in this hash table holds separate
byte streams (allocated as incrementally growing slices
into large shared byte[] arrays) for freq and prox, which
contain the postings data for multiple documents. If
vectors are enabled, each unique term in each document
also allocates a PostingVector instance to similarly
track the offsets and positions byte streams.
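The slice scheme is easiest to see in miniature. Below is
a minimal sketch, assuming each slice ends in a sentinel
byte that encodes its level and that full slices chain to
larger ones; the two level tables mirror the
levelSizeArray / nextLevelArray fields listed further
below, but the concrete sizes and sentinel encoding here
are illustrative assumptions, not authoritative values.

    // Sketch only: incrementally growing slices in one shared byte[].
    class ByteSliceSketch {
        // Slice size per level; a term's byte stream starts tiny.
        static final int[] levelSizeArray = {5, 14, 20, 30, 40, 40, 80, 80, 120, 200};
        // Level that follows each level (the top level repeats).
        static final int[] nextLevelArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9};

        byte[] block = new byte[65536]; // one large shared block
        int upto;                       // next free offset in the block

        // Allocates a fresh level-0 slice for a new term's stream.
        int newSlice() {
            int start = upto;
            upto += levelSizeArray[0];
            block[upto - 1] = 16;       // end sentinel, level 0
            return start;
        }

        // Called when a write hits the sentinel: allocate the next,
        // larger slice; the old slice's tail then becomes a pointer
        // to the new one (the pointer rewrite is elided here).
        int growSlice(int endOffset) {
            int level = nextLevelArray[block[endOffset] & 15];
            int start = upto;
            upto += levelSizeArray[level];
            block[upto - 1] = (byte) (16 | level); // new end sentinel
            return start;
        }
    }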
Once the Postings hash is full (i.e., it is consuming the
allowed RAM) or the number of added docs is large enough
(when we are flushing by doc count instead of by RAM
usage), we create a real segment, flush it to disk, and
reset the Postings hash.
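In outline, the flush decision reduces to the following
check (a sketch; the parameter names are illustrative
stand-ins for the class's internal counters):

    // Sketch: should the buffered state be flushed to a segment?
    boolean triggerFlush(long numBytesUsed, long ramBufferSizeBytes,
                         int numDocsInRAM, int maxBufferedDocs,
                         boolean flushByRAMUsage) {
        if (flushByRAMUsage)
            return numBytesUsed >= ramBufferSizeBytes; // hash is "full"
        return numDocsInRAM >= maxBufferedDocs;        // doc-count flush
    }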
When adding a document we first organize all of its fields
by field name. We then process field by field, recording
the Posting hash per field. After each field we flush its
term vectors. When it's time to flush the full segment we
first sort the fields by name, and then go field by field
and sort each field's postings.
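In outline (a hypothetical sketch with stand-in types and
helpers, not the class's real API):

    // Sketch of the per-field ordering described above.
    void processFieldsSketch(Map<String, List<String>> fieldsByName) {
        // Fields are already organized by name; process field by field.
        for (Map.Entry<String, List<String>> field : fieldsByName.entrySet()) {
            for (String value : field.getValue())
                addToPostingHash(field.getKey(), value); // hypothetical
            flushTermVectors(field.getKey());            // hypothetical
        }
        // At full-segment flush: visit fields in sorted-name order and
        // sort each field's postings before writing them.
    }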
Threads:
Multiple threads are allowed into addDocument at once.
There is an initial synchronized call to getThreadState
which allocates a ThreadState for this thread. The same
thread will get the same ThreadState over time (thread
affinity) so that if there are consistent patterns (for
example each thread is indexing a different content
source) then we make better use of RAM. Then
processDocument is called on that ThreadState without
synchronization (most of the "heavy lifting" is in this
call). Finally the synchronized "finishDocument" is
called to flush changes to the directory.
Each ThreadState instance has its own Posting hash. Once
we're using too much RAM, we flush all Posting hashes to
a segment by merging the docIDs in the posting lists for
the same term across multiple thread states (see
writeSegment and appendPostings).
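Put together, the per-document flow looks roughly like
this (an illustrative sketch, not the actual
implementation; the two calls on ThreadState are
hypothetical stand-ins):

    // Sketch of the three-phase addDocument flow described above.
    boolean addDocumentSketch(Document doc, Analyzer analyzer)
            throws IOException {
        // Phase 1 (synchronized): acquire this thread's ThreadState;
        // thread affinity means a thread tends to get the same one.
        ThreadState state = getThreadState(doc, null);
        // Phase 2 (unsynchronized): the heavy lifting; analyze the doc
        // and fill this ThreadState's private Posting hash.
        state.processDocumentSketch(doc, analyzer);     // hypothetical
        // Phase 3 (synchronized): write stored fields / term vectors
        // and report whether the caller should now flush.
        return state.finishDocumentSketch();            // hypothetical
    }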
When flush is called by IndexWriter, or when we flush
internally (with autoCommit=false), we forcefully idle all
threads and flush only once they are all idle. This
means you can call flush from a given thread even while
other threads are actively adding/deleting documents.
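A minimal sketch of that idling step, assuming
pauseAllThreads and resumeAllThreads simply bracket the
actual flush work (doFlushSketch is a hypothetical
helper):

    // Sketch: idle all indexing threads around the flush.
    synchronized int flushSketch(boolean closeDocStore) throws IOException {
        pauseAllThreads();       // wait until every ThreadState is idle
        try {
            return doFlushSketch(closeDocStore); // hypothetical helper
        } finally {
            resumeAllThreads();  // let other threads resume indexing
        }
    }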
Exceptions:
Because this class directly updates in-memory posting
lists, and flushes stored fields and term vectors
directly to files in the directory, there are certain
limited times when an exception can corrupt this state.
For example, a disk-full condition while flushing stored
fields leaves that file in a corrupt state. Or, an OOM
exception while appending to the in-memory posting lists
can corrupt that posting list. We call such exceptions
"aborting exceptions": in these cases we must call
abort() to discard all docs added since the last flush.
All other exceptions ("non-aborting exceptions") may
still partially update the index structures. These
updates are consistent, but they represent only part of
the document seen up to the point the exception was hit.
When this happens, we immediately mark the document as
deleted so that the document is always atomically ("all
or none") added to the index.
Inner Class: final static class FieldMergeState
Inner Class: static class Num
Method Summary

synchronized void abort(AbortException ae)
    Called if we hit an exception when adding docs, flushing, etc.
List abortedFiles()
boolean addDocument(Document doc, Analyzer analyzer)
    Returns true if the caller (IndexWriter) should now flush
    (see the usage sketch after this summary).
void appendPostings(ThreadState.FieldData[] fields, TermInfosWriter termsOut, IndexOutput freqOut, IndexOutput proxOut)
synchronized boolean bufferDeleteTerm(Term term)
synchronized boolean bufferDeleteTerms(Term[] terms)
synchronized void clearBufferedDeletes()
synchronized void clearFlushPending()
synchronized void close()
String closeDocStore()
    Closes the currently open doc stores and returns the doc
    store segment name.
int compareText(char[] text1, int pos1, char[] text2, int pos2)
void copyBytes(IndexInput srcIn, IndexOutput destIn, long numBytes)
void createCompoundFile(String segment)
synchronized List files()
static void fillBytes(IndexOutput out, byte b, int numBytes)
synchronized int flush(boolean closeDocStore)
synchronized List getBufferedDeleteDocIDs()
synchronized HashMap getBufferedDeleteTerms()
synchronized byte[] getByteBlock()
synchronized char[] getCharBlock()
int getDocStoreOffset()
    Returns the doc offset into the shared doc store for the
    current buffered docs.
String getDocStoreSegment()
    Returns the current doc store segment we are writing to.
int getMaxBufferedDeleteTerms()
int getMaxBufferedDocs()
synchronized int getNumBufferedDeleteTerms()
int getNumDocsInRAM()
    Returns how many docs are currently buffered in RAM.
synchronized void getPostings(Posting[] postings)
double getRAMBufferSizeMB()
long getRAMUsed()
String getSegment()
    Gets the name of the current segment we are writing.
synchronized ThreadState getThreadState(Document doc, Term delTerm)
    Returns a free (idle) ThreadState that may be used for
    indexing this one document.
synchronized boolean hasDeletes()
synchronized boolean pauseAllThreads()
synchronized void recycleByteBlocks(byte[][] blocks, int start, int end)
synchronized void recycleCharBlocks(char[][] blocks, int numBlocks)
synchronized void recyclePostings(Posting[] postings, int numPostings)
synchronized void resumeAllThreads()
synchronized void setAborting()
synchronized boolean setFlushPending()
    Sets flushPending if it is not already set and returns
    whether it was set.
void setInfoStream(PrintStream infoStream)
    If non-null, various details of indexing are printed here.
void setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
void setMaxBufferedDocs(int count)
    Sets max buffered docs, which means we will flush by doc
    count instead of by RAM usage.
void setRAMBufferSizeMB(double mb)
    Sets how much RAM we can use before flushing.
String toMB(long v)
boolean updateDocument(Term t, Document doc, Analyzer analyzer)
boolean updateDocument(Document doc, Analyzer analyzer, Term delTerm)
void writeNorms(String segmentName, int totalNumDoc)
    Writes norms in the "true" segment format.
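For orientation, this is roughly how a caller such as
IndexWriter consumes the addDocument contract above (a
hedged sketch built only from the signatures in this
summary; the meaning of flush's int return value is not
documented here):

    // Sketch: a true return from addDocument means "flush now".
    if (docWriter.addDocument(doc, analyzer)) {
        int flushResult = docWriter.flush(true); // closeDocStore=true
    }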
Field Summary

final static int BYTE_BLOCK_MASK
final static int BYTE_BLOCK_NOT_MASK
final static int BYTE_BLOCK_SHIFT
final static int BYTE_BLOCK_SIZE
final static int CHAR_BLOCK_MASK
final static int CHAR_BLOCK_SHIFT
final static int CHAR_BLOCK_SIZE
final static int MAX_TERM_LENGTH
final static int POSTING_NUM_BYTE
byte[] copyByteBuffer
final static int[] levelSizeArray
final static int[] nextLevelArray
long numBytesAlloc
long numBytesUsed
Method Detail

synchronized void abort(AbortException ae) throws IOException
    Called if we hit an exception when adding docs, flushing,
    etc. This resets our state, discarding any docs added
    since the last flush. If ae is non-null, it contains the
    root-cause exception (which we re-throw after we are done
    aborting).
synchronized void clearBufferedDeletes() throws IOException
synchronized void clearFlushPending()
synchronized void close()
String closeDocStore() throws IOException
    Closes the currently open doc stores and returns the doc
    store segment name. Returns null if there are no buffered
    documents.
int compareText(char[] text1, int pos1, char[] text2, int pos2)
void createCompoundFile(String segment) throws IOException
    Builds the compound file for the segment we just flushed.
synchronized int flush(boolean closeDocStore) throws IOException
    Flushes all pending docs to a new segment.
synchronized List getBufferedDeleteDocIDs()
synchronized HashMap getBufferedDeleteTerms()
synchronized byte[] getByteBlock()
synchronized char[] getCharBlock()
int getDocStoreOffset()
    Returns the doc offset into the shared doc store for the
    current buffered docs.
String getDocStoreSegment()
    Returns the current doc store segment we are writing to.
    This will be the same as the segment name when autoCommit
    is true.
int getMaxBufferedDeleteTerms()
int getMaxBufferedDocs()
synchronized int getNumBufferedDeleteTerms()
int getNumDocsInRAM()
    Returns how many docs are currently buffered in RAM.
synchronized void getPostings(Posting[] postings)
double getRAMBufferSizeMB()
long getRAMUsed()
String getSegment()
    Gets the name of the current segment we are writing.
synchronized ThreadState getThreadState(Document doc, Term delTerm) throws IOException
    Returns a free (idle) ThreadState that may be used for
    indexing this one document. This call also pauses if a
    flush is pending. If delTerm is non-null then we buffer
    this deleted term after the thread state has been
    acquired.
synchronized boolean hasDeletes()
synchronized boolean pauseAllThreads()
synchronized void recycleByteBlocks(byte[][] blocks, int start, int end)
synchronized void recycleCharBlocks(char[][] blocks, int numBlocks)
synchronized void recyclePostings(Posting[] postings, int numPostings)
synchronized void resumeAllThreads()
synchronized void setAborting()
synchronized boolean setFlushPending()
    Sets flushPending if it is not already set and returns
    whether it was set. This is used by IndexWriter to trigger
    a single flush even when multiple threads are trying to do
    so (see the sketch below).
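A hedged sketch of that idiom, assuming the thread that
wins setFlushPending() performs the flush and then clears
the flag:

    // Sketch: only the winner of setFlushPending() flushes.
    if (docWriter.setFlushPending()) {
        try {
            docWriter.flush(true);         // perform the single flush
        } finally {
            docWriter.clearFlushPending(); // allow future flushes
        }
    }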
void setInfoStream(PrintStream infoStream)
    If non-null, various details of indexing are printed here.
void setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
void setMaxBufferedDocs(int count)
    Sets max buffered docs, which means we will flush by doc
    count instead of by RAM usage.
void setRAMBufferSizeMB(double mb)
    Sets how much RAM we can use before flushing.
void writeNorms(String segmentName, int totalNumDoc) throws IOException
    Writes norms in the "true" segment format. This is called
    only during commit, to create the .nrm file.