| java.lang.Object org.apache.lucene.index.IndexReader
All known Subclasses: org.apache.lucene.index.MultiReader, org.apache.lucene.index.DirectoryIndexReader, org.apache.lucene.index.FilterIndexReader, org.apache.lucene.index.ParallelReader
IndexReader | abstract public class IndexReader (Code) | | IndexReader is an abstract class, providing an interface for accessing an
index. Search of an index is done entirely through this abstract interface,
so that any subclass which implements it is searchable.
Concrete subclasses of IndexReader are usually constructed with a call to
one of the static open() methods, e.g.
IndexReader.open(String) .
For efficiency, in this API documents are often referred to via
document numbers, non-negative integers which each name a unique
document in the index. These document numbers are ephemeral--they may change
as documents are added to and deleted from an index. Clients should thus not
rely on a given document having the same number between sessions.
An IndexReader can be opened on a directory for which an IndexWriter is
already open, but in that case it cannot be used to delete documents from the index.
NOTE: for backwards API compatibility, several methods are not listed
as abstract, but have no useful implementations in this base class and
instead always throw UnsupportedOperationException. Subclasses are
strongly encouraged to override these methods, but in many cases may not
need to.
version: $Id: IndexReader.java 598462 2007-11-26 23:31:39Z dnaber $ |
Inner Class :final public static class FieldOption | |
Constructor Summary | |
protected | IndexReader(Directory directory) Legacy Constructor for backwards compatibility.
This Constructor should not be used; it exists for backwards
compatibility only to support legacy subclasses that did not "own"
a specific directory, but needed to specify something to be returned
by the directory() method. | protected | IndexReader() |
Method Summary | |
protected synchronized void | acquireWriteLock() Does nothing by default. | final public synchronized void | close() Closes files associated with this index. | final protected synchronized void | commit() Commit changes resulting from delete, undeleteAll, or
setNorm operations.
If an exception is hit, then either no changes or all
changes will have been committed to the index
(transactional semantics). | protected synchronized void | decRef() Decreases the refCount of this IndexReader instance. | final public synchronized void | deleteDocument(int docNum) Deletes the document numbered docNum . | final public int | deleteDocuments(Term term) Deletes all documents that have a given term indexed.
This is useful if one uses a document field to hold a unique ID string for
the document. | public Directory | directory() Returns the directory associated with this index. | abstract protected void | doClose() Implements close. | abstract protected void | doCommit() Implements commit. | abstract protected void | doDelete(int docNum) Implements deletion of the document numbered docNum . | abstract protected void | doSetNorm(int doc, String field, byte value) Implements setNorm in subclass. | abstract protected void | doUndeleteAll() Implements actual undeleteAll() in subclass. | abstract public int | docFreq(Term t) Returns the number of documents containing the term t . | public Document | document(int n) Returns the stored fields of the n th
Document in this index. | abstract public Document | document(int n, FieldSelector fieldSelector) Get the
org.apache.lucene.document.Document at the n th position. | final protected void | ensureOpen() | final public synchronized void | flush() | public static long | getCurrentVersion(String directory) Reads version number from segments files. | public static long | getCurrentVersion(File directory) Reads version number from segments files. | public static long | getCurrentVersion(Directory directory) Reads version number from segments files. | abstract public Collection | getFieldNames(FieldOption fldOption) Get a list of unique field names that exist in this index and have the specified
field option information. | synchronized int | getRefCount() | abstract public TermFreqVector | getTermFreqVector(int docNumber, String field) Return a term frequency vector for the specified document and field. | abstract public void | getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) Load the Term Vector into a user-defined data structure instead of relying on the parallel arrays of
the
TermFreqVector .
Parameters: docNumber - The number of the document to load the vector for Parameters: field - The name of the field to load Parameters: mapper - The TermVectorMapper to process the vector. | abstract public void | getTermFreqVector(int docNumber, TermVectorMapper mapper) Map all the term vectors for all fields in a Document
Parameters: docNumber - The number of the document to load the vector for Parameters: mapper - The TermVectorMapper to process the vector. | abstract public TermFreqVector[] | getTermFreqVectors(int docNumber) Return an array of term frequency vectors for the specified document.
The array contains a vector for each vectorized field in the document.
Each vector contains terms and frequencies for all terms in a given vectorized field.
If no such fields existed, the method returns null. | public int | getTermInfosIndexDivisor() For IndexReader implementations that use
TermInfosReader to read terms, this returns the
current indexDivisor. | public long | getVersion() Version number when this IndexReader was opened. | abstract public boolean | hasDeletions() | public boolean | hasNorms(String field) Returns true if there are norms stored for this field. | protected synchronized void | incRef() Increments the refCount of this IndexReader instance. | public static boolean | indexExists(String directory) Returns true if an index exists at the specified directory. | public static boolean | indexExists(File directory) Returns true if an index exists at the specified directory. | public static boolean | indexExists(Directory directory) Returns true if an index exists at the specified directory. | public boolean | isCurrent() Check whether this IndexReader is still using the
current (i.e., most recently committed) version of the
index. | abstract public boolean | isDeleted(int n) | public static boolean | isLocked(Directory directory) Returns true iff the index in the named directory is
currently locked. | public static boolean | isLocked(String directory) Returns true iff the index in the named directory is
currently locked. | public boolean | isOptimized() Checks if the index is optimized (if it has a single segment and
no deletions). | public static long | lastModified(String directory) Returns the time the index in the named directory was last modified.
Do not use this to check whether the reader is still up-to-date, use
IndexReader.isCurrent() instead. | public static long | lastModified(File fileDirectory) Returns the time the index in the named directory was last modified. | public static long | lastModified(Directory directory2) Returns the time the index in the named directory was last modified. | public static void | main(String[] args) Prints the filename and size of each file within a given compound file. | abstract public int | maxDoc() Returns one greater than the largest possible document number. | abstract public byte[] | norms(String field) Returns the byte-encoded normalization factor for the named field of
every document. | abstract public void | norms(String field, byte[] bytes, int offset) Reads the byte-encoded normalization factor for the named field of every
document. | abstract public int | numDocs() Returns the number of documents in this index. | public static IndexReader | open(String path) Returns an IndexReader reading the index in an FSDirectory in the named
path. | public static IndexReader | open(File path) Returns an IndexReader reading the index in an FSDirectory in the named
path. | public static IndexReader | open(Directory directory) Returns an IndexReader reading the index in the given Directory. | public static IndexReader | open(Directory directory, IndexDeletionPolicy deletionPolicy) Expert: returns an IndexReader reading the index in the given
Directory, with a custom
IndexDeletionPolicy . | public synchronized IndexReader | reopen() Refreshes an IndexReader if the index has changed since this instance
was (re)opened. | final public synchronized void | setNorm(int doc, String field, byte value) Expert: Resets the normalization factor for the named field of the named
document. | public void | setNorm(int doc, String field, float value) Expert: Resets the normalization factor for the named field of the named
document. | public void | setTermInfosIndexDivisor(int indexDivisor) For IndexReader implementations that use
TermInfosReader to read terms, this sets the
indexDivisor to subsample the number of indexed terms
loaded into memory. | public TermDocs | termDocs(Term term) Returns an enumeration of all the documents which contain
term . | abstract public TermDocs | termDocs() Returns an unpositioned
TermDocs enumerator. | public TermPositions | termPositions(Term term) Returns an enumeration of all the documents which contain
term . | abstract public TermPositions | termPositions() Returns an unpositioned
TermPositions enumerator. | abstract public TermEnum | terms() Returns an enumeration of all the terms in the index. | abstract public TermEnum | terms(Term t) Returns an enumeration of all terms starting at a given term. | final public synchronized void | undeleteAll() Undeletes all documents currently marked as deleted in this index. | public static void | unlock(Directory directory) Forcibly unlocks the index in the named directory. |
hasChanges | protected boolean hasChanges(Code) | | |
IndexReader | protected IndexReader(Directory directory)(Code) | | Legacy Constructor for backwards compatibility.
This Constructor should not be used; it exists for backwards
compatibility only to support legacy subclasses that did not "own"
a specific directory, but needed to specify something to be returned
by the directory() method. Future subclasses should delegate to the
no arg constructor and implement the directory() method as appropriate.
Parameters: directory - Directory to be returned by the directory() method See Also: IndexReader.directory() |
IndexReader | protected IndexReader()(Code) | | |
acquireWriteLock | protected synchronized void acquireWriteLock() throws IOException(Code) | | Does nothing by default. Subclasses that require a write lock for
index modifications must implement this method.
|
close | final public synchronized void close() throws IOException(Code) | | Closes files associated with this index.
Also saves any new deletions to disk.
No other methods should be called after this has been called.
throws: IOException - if there is a low-level IO error |
commit | final protected synchronized void commit() throws IOException(Code) | | Commit changes resulting from delete, undeleteAll, or
setNorm operations.
If an exception is hit, then either no changes or all
changes will have been committed to the index
(transactional semantics).
throws: IOException - if there is a low-level IO error |
decRef | protected synchronized void decRef() throws IOException(Code) | | Decreases the refCount of this IndexReader instance. If the refCount drops
to 0, then pending changes are committed to the index and this reader is closed.
throws: IOException - in case an IOException occurs in commit() or doClose() |
directory | public Directory directory()(Code) | | Returns the directory associated with this index. The default
implementation returns the directory specified by subclasses when
delegating to the IndexReader(Directory) constructor, or throws an
UnsupportedOperationException if one was not specified.
throws: UnsupportedOperationException - if no directory was specified |
doCommit | abstract protected void doCommit() throws IOException(Code) | | Implements commit.
|
docFreq | abstract public int docFreq(Term t) throws IOException(Code) | | Returns the number of documents containing the term t .
throws: IOException - if there is a low-level IO error |
getCurrentVersion | public static long getCurrentVersion(String directory) throws CorruptIndexException, IOException(Code) | | Reads version number from segments files. The version number is
initialized with a timestamp and then increased by one for each change of
the index.
Parameters: directory - where the index resides. Returns: version number. throws: CorruptIndexException - if the index is corrupt throws: IOException - if there is a low-level IO error |
getCurrentVersion | public static long getCurrentVersion(File directory) throws CorruptIndexException, IOException(Code) | | Reads version number from segments files. The version number is
initialized with a timestamp and then increased by one for each change of
the index.
Parameters: directory - where the index resides. Returns: version number. throws: CorruptIndexException - if the index is corrupt throws: IOException - if there is a low-level IO error |
getCurrentVersion | public static long getCurrentVersion(Directory directory) throws CorruptIndexException, IOException(Code) | | Reads version number from segments files. The version number is
initialized with a timestamp and then increased by one for each change of
the index.
Parameters: directory - where the index resides. Returns: version number. throws: CorruptIndexException - if the index is corrupt throws: IOException - if there is a low-level IO error |
getFieldNames | abstract public Collection getFieldNames(FieldOption fldOption)(Code) | | Get a list of unique field names that exist in this index and have the specified
field option information.
Parameters: fldOption - specifies which field option should be available for the returned fields Returns: Collection of Strings indicating the names of the fields. See Also: IndexReader.FieldOption |
getRefCount | synchronized int getRefCount()(Code) | | |
getTermFreqVector | abstract public TermFreqVector getTermFreqVector(int docNumber, String field) throws IOException(Code) | | Return a term frequency vector for the specified document and field. The
returned vector contains terms and frequencies for the terms in
the specified field of this document, if the field had the storeTermVector
flag set. If termvectors had been stored with positions or offsets, a
TermPositionsVector is returned.
Parameters: docNumber - document for which the term frequency vector is returned Parameters: field - field for which the term frequency vector is returned. Returns: term frequency vector. May be null if field does not exist in the specified document or term vector was not stored. throws: IOException - if index cannot be accessed See Also: org.apache.lucene.document.Field.TermVector |
getTermFreqVector | abstract public void getTermFreqVector(int docNumber, String field, TermVectorMapper mapper) throws IOException(Code) | | Load the Term Vector into a user-defined data structure instead of relying on the parallel arrays of
the
TermFreqVector .
Parameters: docNumber - The number of the document to load the vector for Parameters: field - The name of the field to load Parameters: mapper - The TermVectorMapper to process the vector. Must not be null. throws: IOException - if term vectors cannot be accessed or if they do not exist for the specified field and document. |
getTermFreqVector | abstract public void getTermFreqVector(int docNumber, TermVectorMapper mapper) throws IOException(Code) | | Map all the term vectors for all fields in a Document
Parameters: docNumber - The number of the document to load the vector for Parameters: mapper - The TermVectorMapper to process the vector. Must not be null. throws: IOException - if term vectors cannot be accessed or if they do not exist for the specified document. |
getTermFreqVectors | abstract public TermFreqVector[] getTermFreqVectors(int docNumber) throws IOException(Code) | | Return an array of term frequency vectors for the specified document.
The array contains a vector for each vectorized field in the document.
Each vector contains terms and frequencies for all terms in a given vectorized field.
If no such fields existed, the method returns null. The term vectors that are
returned may either be of type TermFreqVector or of type TermPositionsVector if
positions or offsets have been stored.
Parameters: docNumber - document for which term frequency vectors are returned Returns: array of term frequency vectors. May be null if no term vectors have been stored for the specified document. throws: IOException - if index cannot be accessed See Also: org.apache.lucene.document.Field.TermVector |
getTermInfosIndexDivisor | public int getTermInfosIndexDivisor()(Code) | | For IndexReader implementations that use
TermInfosReader to read terms, this returns the
current indexDivisor.
See Also: IndexReader.setTermInfosIndexDivisor |
getVersion | public long getVersion()(Code) | | Version number when this IndexReader was opened. Not implemented in the IndexReader base class.
throws: UnsupportedOperationException - unless overridden in subclass |
hasDeletions | abstract public boolean hasDeletions()(Code) | | Returns true if any documents have been deleted
|
hasNorms | public boolean hasNorms(String field) throws IOException(Code) | | Returns true if there are norms stored for this field.
|
incRef | protected synchronized void incRef()(Code) | | Increments the refCount of this IndexReader instance. RefCounts are used to determine
when a reader can be closed safely, i.e., as soon as no other IndexReader is referencing
it anymore.
|
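The reference-counting behavior described for incRef() and decRef() can be illustrated with a small stand-alone model. This is only a sketch: RefCountSketch is a hypothetical class, not Lucene code, and it assumes that the count starts at 1 for the creator and that the resource closes when the count reaches 0.

```java
// Toy model only: a hypothetical class, not part of Lucene.
class RefCountSketch {
    private int refCount = 1;      // assumption: the creator holds the first reference
    private boolean closed = false;

    // Another holder starts sharing this resource.
    synchronized void incRef() {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        refCount++;
    }

    // A holder is done; the last one to leave closes the resource.
    synchronized void decRef() {
        if (refCount <= 0) {
            throw new IllegalStateException("decRef called too many times");
        }
        refCount--;
        if (refCount == 0) {
            // A real reader would commit pending changes here before closing.
            closed = true;
        }
    }

    synchronized int getRefCount() {
        return refCount;
    }

    synchronized boolean isClosed() {
        return closed;
    }

    public static void main(String[] args) {
        RefCountSketch reader = new RefCountSketch();
        reader.incRef();   // a second holder shares the reader
        reader.decRef();   // first holder is done; the reader stays open
        reader.decRef();   // last holder is done; the reader is closed
        System.out.println("closed = " + reader.isClosed());
    }
}
```

The point of the design is that no holder needs to know who else is using the reader; each one only balances its own incRef() with a decRef().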
indexExists | public static boolean indexExists(String directory)(Code) | | Returns true if an index exists at the specified directory.
If the directory does not exist or if there is no index in it,
false is returned.
Parameters: directory - the directory to check for an index Returns: true if an index exists; false otherwise |
indexExists | public static boolean indexExists(File directory)(Code) | | Returns true if an index exists at the specified directory.
If the directory does not exist or if there is no index in it, false is returned.
Parameters: directory - the directory to check for an index Returns: true if an index exists; false otherwise |
indexExists | public static boolean indexExists(Directory directory) throws IOException(Code) | | Returns true if an index exists at the specified directory.
If the directory does not exist or if there is no index in it, false is returned.
Parameters: directory - the directory to check for an index Returns: true if an index exists; false otherwise throws: IOException - if there is a problem with accessing the index |
isCurrent | public boolean isCurrent() throws CorruptIndexException, IOException(Code) | | Check whether this IndexReader is still using the
current (i.e., most recently committed) version of the
index. If a writer has committed any changes to the
index since this reader was opened, this will return
false , in which case you must open a new
IndexReader in order to see the changes. See the
description of the autoCommit
flag which controls when the
IndexWriter actually commits changes to the index.
Not implemented in the IndexReader base class.
throws: CorruptIndexException - if the index is corrupt throws: IOException - if there is a low-level IO error throws: UnsupportedOperationException - unless overridden in subclass |
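The relationship between getCurrentVersion(), getVersion() and isCurrent() can be sketched with a toy model. ToyIndex and ToyReader are hypothetical names used only for illustration; the sketch assumes, as the getCurrentVersion description states, that the version starts from a timestamp and grows by one per committed change.

```java
// Toy model only: hypothetical classes, not Lucene's implementation.
class VersionSketch {
    static final class ToyIndex {
        // Initialized with a timestamp, then increased by one per change.
        private long version = System.currentTimeMillis();

        synchronized long getCurrentVersion() {
            return version;
        }

        synchronized void commitChange() {
            version++;
        }
    }

    static final class ToyReader {
        private final ToyIndex index;
        private final long versionAtOpen;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.versionAtOpen = index.getCurrentVersion();
        }

        // Version number when this reader was opened.
        long getVersion() {
            return versionAtOpen;
        }

        // True while no change has been committed since the reader was opened.
        boolean isCurrent() {
            return versionAtOpen == index.getCurrentVersion();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        System.out.println("current before commit: " + reader.isCurrent());
        index.commitChange();
        System.out.println("current after commit: " + reader.isCurrent());
    }
}
```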
isDeleted | abstract public boolean isDeleted(int n)(Code) | | Returns true if document n has been deleted
|
isLocked | public static boolean isLocked(Directory directory) throws IOException(Code) | | Returns true iff the index in the named directory is
currently locked.
Parameters: directory - the directory to check for a lock throws: IOException - if there is a low-level IO error |
isLocked | public static boolean isLocked(String directory) throws IOException(Code) | | Returns true iff the index in the named directory is
currently locked.
Parameters: directory - the directory to check for a lock throws: IOException - if there is a low-level IO error |
isOptimized | public boolean isOptimized()(Code) | | Checks if the index is optimized (if it has a single segment and
no deletions). Not implemented in the IndexReader base class.
Returns: true if the index is optimized; false otherwise throws: UnsupportedOperationException - unless overridden in subclass |
main | public static void main(String[] args)(Code) | | Prints the filename and size of each file within a given compound file.
Add the -extract flag to extract files to the current working directory.
In order to make the extracted version of the index work, you have to copy
the segments file from the compound index into the directory where the extracted files are stored.
Parameters: args - Usage: org.apache.lucene.index.IndexReader [-extract] <cfsfile> |
maxDoc | abstract public int maxDoc()(Code) | | Returns one greater than the largest possible document number.
This may be used to, e.g., determine how big to allocate an array which
will have an element for every document number in an index.
|
numDocs | abstract public int numDocs()(Code) | | Returns the number of documents in this index.
|
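How maxDoc(), numDocs(), isDeleted() and hasDeletions() relate can be shown with a small model. This is a sketch under stated assumptions, not Lucene's implementation: DeletionSketch is a hypothetical class, and it assumes that deleted documents keep their document numbers (so maxDoc() is unchanged) while numDocs() counts only documents not marked deleted.

```java
import java.util.BitSet;

// Toy model only: a hypothetical class, not part of Lucene.
class DeletionSketch {
    private final int maxDoc;                     // one greater than the largest document number
    private final BitSet deleted = new BitSet();  // bit n set means document n is marked deleted

    DeletionSketch(int documentCount) {
        this.maxDoc = documentCount;
    }

    int maxDoc() {
        return maxDoc;
    }

    // Assumption: documents marked deleted are not counted.
    int numDocs() {
        return maxDoc - deleted.cardinality();
    }

    boolean isDeleted(int n) {
        return deleted.get(n);
    }

    boolean hasDeletions() {
        return !deleted.isEmpty();
    }

    void deleteDocument(int docNum) {
        if (docNum < 0 || docNum >= maxDoc) {
            throw new IllegalArgumentException("no such document: " + docNum);
        }
        deleted.set(docNum);
    }

    public static void main(String[] args) {
        DeletionSketch index = new DeletionSketch(5);
        index.deleteDocument(2);

        // An array sized by maxDoc() has a slot for every document number.
        boolean[] live = new boolean[index.maxDoc()];
        for (int n = 0; n < index.maxDoc(); n++) {
            live[n] = !index.isDeleted(n);
        }
        System.out.println("maxDoc=" + index.maxDoc() + " numDocs=" + index.numDocs());
    }
}
```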
reopen | public synchronized IndexReader reopen() throws CorruptIndexException, IOException(Code) | | Refreshes an IndexReader if the index has changed since this instance
was (re)opened.
Opening an IndexReader is an expensive operation. This method can be used
to refresh an existing IndexReader to reduce these costs. This method
tries to only load segments that have changed or were created after the
IndexReader was (re)opened.
If the index has not changed since this instance was (re)opened, then this
call is a NOOP and returns this instance. Otherwise, a new instance is
returned. The old instance is not closed and remains usable.
Note: The re-opened reader instance and the old instance might share
the same resources. For this reason no index modification operations
(e.g.
IndexReader.deleteDocument(int) ,
IndexReader.setNorm(int,String,byte) )
should be performed using one of the readers until the old reader instance
is closed. Otherwise, the behavior of the readers is undefined.
You can determine whether a reader was actually reopened by comparing the
old instance with the instance returned by this method:
IndexReader reader = ...
...
IndexReader newReader = reader.reopen();
if (newReader != reader) {
  ... // reader was reopened
  reader.close();
}
reader = newReader;
...
throws: CorruptIndexException - if the index is corrupt throws: IOException - if there is a low-level IO error |
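The contract of reopen() (same instance when nothing changed, a new instance otherwise, old instance left open) can be exercised with a self-contained toy. ToyIndex, ToyReader and refresh are hypothetical names introduced for this sketch; they only imitate the behavior described above and do not use Lucene.

```java
// Toy model only: hypothetical classes that imitate the reopen() contract.
class ReopenSketch {
    static final class ToyIndex {
        long version = 1;   // bumped whenever the index changes
    }

    static final class ToyReader {
        private final ToyIndex index;
        private final long openedVersion;
        private boolean closed = false;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.openedVersion = index.version;
        }

        // No-op returning this instance if the index is unchanged;
        // otherwise a new reader. The old reader is not closed here.
        ToyReader reopen() {
            if (index.version == openedVersion) {
                return this;
            }
            return new ToyReader(index);
        }

        void close() {
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    // Refresh idiom: close the old reader only if a different one came back.
    static ToyReader refresh(ToyReader reader) {
        ToyReader newReader = reader.reopen();
        if (newReader != reader) {
            reader.close();
        }
        return newReader;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        ToyReader reader = new ToyReader(index);
        reader = refresh(reader);   // unchanged index: same instance
        index.version++;            // simulate a committed change
        reader = refresh(reader);   // changed index: new instance, old one closed
        System.out.println("reader closed = " + reader.isClosed());
    }
}
```

Comparing by identity matters here: closing the reader unconditionally after reopen() would close the instance that is still in use whenever the index had not changed.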
setTermInfosIndexDivisor | public void setTermInfosIndexDivisor(int indexDivisor) throws IllegalStateException(Code) | | For IndexReader implementations that use
TermInfosReader to read terms, this sets the
indexDivisor to subsample the number of indexed terms
loaded into memory. This has the same effect as
IndexWriter.setTermIndexInterval except that setting
must be done at indexing time while this setting can be
set per reader. When set to N, then one in every
N*termIndexInterval terms in the index is loaded into
memory. By setting this to a value > 1 you can reduce
memory usage, at the expense of higher latency when
loading a TermInfo. The default value is 1.
NOTE: you must call this before the term
index is loaded. If the index is already loaded,
an IllegalStateException is thrown.
throws: IllegalStateException - if the term index has already been loaded into memory |
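The memory effect of the divisor can be estimated with simple arithmetic. The sketch below is illustrative only: loadedTerms is a hypothetical helper, the interval of 128 used in main is an assumed termIndexInterval, and the count assumes sampling starts at the first term.

```java
// Toy arithmetic only: a hypothetical helper, not a Lucene API.
class IndexDivisorSketch {
    // Number of terms kept in memory when one in every
    // (indexDivisor * termIndexInterval) terms is loaded, starting with the first term.
    static long loadedTerms(long totalTerms, int termIndexInterval, int indexDivisor) {
        if (totalTerms < 0 || termIndexInterval < 1 || indexDivisor < 1) {
            throw new IllegalArgumentException("arguments out of range");
        }
        long step = (long) termIndexInterval * indexDivisor;
        return (totalTerms + step - 1) / step;   // ceiling division
    }

    public static void main(String[] args) {
        long totalTerms = 1_000_000L;
        int interval = 128;   // assumed termIndexInterval for this example
        System.out.println("divisor 1: " + loadedTerms(totalTerms, interval, 1));
        System.out.println("divisor 4: " + loadedTerms(totalTerms, interval, 4));
    }
}
```

A larger divisor shrinks the in-memory term index roughly in proportion, at the cost of scanning more terms on disk per lookup.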
termDocs | public TermDocs termDocs(Term term) throws IOException(Code) | | Returns an enumeration of all the documents which contain
term . For each document, the document number and the frequency of
the term in that document are provided, for use in search scoring.
Thus, this method implements the mapping:
Term => <docNum, freq>*
The enumeration is ordered by document number. Each document number
is greater than all that precede it in the enumeration.
throws: IOException - if there is a low-level IO error |
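The term-to-documents mapping described for termDocs(Term) can be modeled with ordinary collections. This is a minimal sketch, not Lucene's data structure: PostingsSketch is a hypothetical class, terms are plain whitespace-separated lowercase strings, and each entry is returned as a two-element array {docNum, freq}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index only: a hypothetical class, not part of Lucene.
class PostingsSketch {
    // term -> (docNum -> freq); the TreeMap keeps document numbers ascending.
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    void addDocument(int docNum, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(token, k -> new TreeMap<>())
                    .merge(docNum, 1, Integer::sum);
        }
    }

    // Returns {docNum, freq} pairs ordered by document number.
    List<int[]> termDocs(String term) {
        List<int[]> result = new ArrayList<>();
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs != null) {
            for (Map.Entry<Integer, Integer> e : docs.entrySet()) {
                result.add(new int[] {e.getKey(), e.getValue()});
            }
        }
        return result;
    }

    // Number of documents containing the term.
    int docFreq(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        index.addDocument(0, "to be or not to be");
        index.addDocument(1, "let it be");
        for (int[] entry : index.termDocs("be")) {
            System.out.println("doc " + entry[0] + " freq " + entry[1]);
        }
    }
}
```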
termPositions | public TermPositions termPositions(Term term) throws IOException(Code) | | Returns an enumeration of all the documents which contain
term . For each document, in addition to the document number
and frequency of the term in that document, a list of all of the ordinal
positions of the term in the document is available. Thus, this method
implements the mapping:
Term => <docNum, freq, <pos_1, pos_2, ... pos_freq-1>>*
This positional information facilitates phrase and proximity searching.
The enumeration is ordered by document number. Each document number is
greater than all that precede it in the enumeration.
throws: IOException - if there is a low-level IO error |
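To see why positions help with phrase searching, consider a toy positional index. This is only a sketch: PositionsSketch is a hypothetical class, tokens are whitespace-separated lowercase strings, positions count tokens from 0, and the phrase check handles just two adjacent terms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy positional index only: a hypothetical class, not part of Lucene.
class PositionsSketch {
    // term -> (docNum -> ascending token positions)
    private final Map<String, TreeMap<Integer, List<Integer>>> index = new HashMap<>();

    void addDocument(int docNum, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].isEmpty()) {
                continue;
            }
            index.computeIfAbsent(tokens[pos], k -> new TreeMap<>())
                 .computeIfAbsent(docNum, k -> new ArrayList<>())
                 .add(pos);
        }
    }

    // Positions of the term in the given document (empty if absent).
    List<Integer> positions(String term, int docNum) {
        TreeMap<Integer, List<Integer>> docs = index.get(term);
        if (docs == null) {
            return Collections.emptyList();
        }
        return docs.getOrDefault(docNum, Collections.emptyList());
    }

    // True if 'second' occurs immediately after 'first' somewhere in the document.
    boolean containsPhrase(int docNum, String first, String second) {
        List<Integer> secondPositions = positions(second, docNum);
        for (int p : positions(first, docNum)) {
            if (secondPositions.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        PositionsSketch idx = new PositionsSketch();
        idx.addDocument(0, "the quick brown fox");
        idx.addDocument(1, "brown the quick");
        System.out.println(idx.containsPhrase(0, "quick", "brown"));
        System.out.println(idx.containsPhrase(1, "quick", "brown"));
    }
}
```

Both documents contain "quick" and "brown", so document numbers and frequencies alone cannot tell them apart; only the positions show that the two words are adjacent in document 0.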
terms | abstract public TermEnum terms() throws IOException(Code) | | Returns an enumeration of all the terms in the index. The
enumeration is ordered by Term.compareTo(). Each term is greater
than all that precede it in the enumeration. Note that after
calling terms(),
TermEnum.next must be called
on the resulting enumeration before calling other methods such as
TermEnum.term .
throws: IOException - if there is a low-level IO error |
terms | abstract public TermEnum terms(Term t) throws IOException(Code) | | Returns an enumeration of all terms starting at a given term. If
the given term does not exist, the enumeration is positioned at the
first term greater than the supplied term. The enumeration is
ordered by Term.compareTo(). Each term is greater than all that
precede it in the enumeration.
throws: IOException - if there is a low-level IO error |
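The positioning rule of terms(Term) (start at the given term, or at the first greater term if it is absent) matches what a sorted set offers. The sketch below uses plain strings in a java.util.TreeSet as a stand-in for the term dictionary; termsFrom is a hypothetical helper, and natural string order stands in for Term.compareTo().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model only: a sorted set of strings standing in for a term dictionary.
class TermSeekSketch {
    // Iterates over all terms >= start, in ascending order.
    static Iterator<String> termsFrom(NavigableSet<String> sortedTerms, String start) {
        return sortedTerms.tailSet(start, true).iterator();
    }

    public static void main(String[] args) {
        NavigableSet<String> terms =
                new TreeSet<>(Arrays.asList("apple", "banana", "cherry"));

        // "blueberry" is not in the set, so iteration starts at the next greater term.
        Iterator<String> it = termsFrom(terms, "blueberry");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```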
unlock | public static void unlock(Directory directory) throws IOException(Code) | | Forcibly unlocks the index in the named directory.
Caution: this should only be used by failure recovery code,
when it is known that no other process nor thread is in fact
currently accessing this index.
|