it.unimi.dsi.mg4j.index

Java Source Code / Java Documentation

1.	6.0 JDK Core
2.	6.0 JDK Modules
3.	6.0 JDK Modules com.sun
4.	6.0 JDK Modules com.sun.java
5.	6.0 JDK Modules sun
6.	6.0 JDK Platform
7.	Ajax
8.	Apache Harmony Java SE
9.	Aspect oriented
10.	Authentication Authorization
11.	Blogger System
12.	Build
13.	Byte Code
14.	Cache
15.	Chart
16.	Chat
17.	Code Analyzer
18.	Collaboration
19.	Content Management System
20.	Database Client
21.	Database DBMS
22.	Database JDBC Connection Pool
23.	Database ORM
24.	Development
25.	EJB Server geronimo
26.	EJB Server GlassFish
27.	EJB Server JBoss 4.2.1
28.	EJB Server resin 3.1.5
29.	ERP CRM Financial
30.	ESB
31.	Forum
32.	GIS
33.	Graphic Library
34.	Groupware
35.	HTML Parser
36.	IDE
37.	IDE Eclipse
38.	IDE Netbeans
39.	Installer
40.	Internationalization Localization
41.	Inversion of Control
42.	Issue Tracking
43.	J2EE
44.	JBoss
45.	JMS
46.	JMX
47.	Library
48.	Mail Clients
49.	Net
50.	Parser
51.	PDF
52.	Portal
53.	Profiler
54.	Project Management
55.	Report
56.	RSS RDF
57.	Rule Engine
58.	Science
59.	Scripting
60.	Search Engine
61.	Security
62.	Sevlet Container
63.	Source Control
64.	Swing Library
65.	Template Engine
66.	Test Coverage
67.	Testing
68.	UML
69.	Web Crawler
70.	Web Framework
71.	Web Mail
72.	Web Server
73.	Web Services
74.	Web Services apache cxf 2.0.1
75.	Web Services AXIS2
76.	Wiki Engine
77.	Workflow Engines
78.	XML
79.	XML UI

Java

Java Tutorial

Illustrator Tutorials

GIMP Tutorials

C# / C Sharp

C# / CSharp Tutorial

C# / CSharp Open Source

SQL Server / T-SQL Tutorial

Oracle PL / SQL

Oracle PL/SQL Tutorial

Flash / Flex / ActionScript

VBA / Excel / Access / Word

XML

XML Tutorial

Microsoft Office PowerPoint 2007 Tutorial

Microsoft Office Excel 2007 Tutorial

Microsoft Office Word 2007 Tutorial

Java Source Code / Java Documentation » Search Engine » mg4j » it.unimi.dsi.mg4j.index

it.unimi.dsi.mg4j.index
MG4J: Managing Gigabytes for Java Index generation and access. This package contains the classes that handle index generation and access. The interval iterators defined in {@link it.unimi.dsi.mg4j.search} build upon the classes of this package to provide answer to queries using interval semantics, but it is also possible to access an index directly. You can easily build indices using the tools in {@link it.unimi.dsi.mg4j.tool}. Once an index has been built, it can be opened using an {@link it.unimi.dsi.mg4j.index.Index} object, which gathers metadata that is necessary to access the index. You do not create an {@link it.unimi.dsi.mg4j.index.Index} with a constructor: rather, you use the static factory {@link it.unimi.dsi.mg4j.index.Index#getInstance(CharSequence)} (or one of its variants) to create an instance. This is necessary so that different kind of indices can be treated transparently: for example, the factory may return a {@link it.unimi.dsi.mg4j.index.cluster.IndexCluster} if the index is actually a cluster, but you do not need to know that. From an {@link it.unimi.dsi.mg4j.index.Index}, you can easily obtain either an {@link it.unimi.dsi.mg4j.index.IndexReader}, which allows to scan sequentially or randomly the index. In turn from an {@link it.unimi.dsi.mg4j.index.IndexReader} you can obtain a {@link it.unimi.dsi.mg4j.index.IndexIterator} returning the documents containing a certain term and the position of the term within the document. But there is more: an {@link it.unimi.dsi.mg4j.index.IndexIterator} is a kind of {@link it.unimi.dsi.mg4j.search.DocumentIterator}, and {@link it.unimi.dsi.mg4j.search.DocumentIterator}s can be combined in several ways using the classes of the package {@link it.unimi.dsi.mg4j.search}: for instance, you can combine document iterators using AND/OR. Note that you can combine document iterators on different indices, but of course the operation is meaningful only if the two indices contain different information about the same document collection (e.g., title and main text). More importantly, if the index is full text (the default) for each document containing the term you can get interval iterators that return intervals representing extents of text satisfying the query: for instance, in case of an AND of two terms, the intervals will contain both terms. Structure of an inverted index An inverted index is made by a sequence of inverted lists (one inverted list for each term). Inverted lists are made by document records: each document record contains information about the occurrences of the term within a certain document. More precisely, each inverted list starts with a suitably encoded integer, called the frequency, which is the number of document records that will follow (i.e., the number of documents in which the term appears). After that, there are exactly as many document records as the frequency. Each document record is made by two parts: a suitably encoded integer, called the (document) pointer, which identifies the document within the collection; a (possibly empty) sequence of bits, called the data; the data have no special structure per se: the only assumption is that they are a self-delimiting bit sequence (i.e., one knows when the sequence is over). As a basic and fundamental implementation, the classes of this package provide methods that write and read document data in a default form. In this default structure, each document data is a suitable coding of a (strictly increasing) sequence of integers, that correspond to the positions where the term occurs within the document. The length of the sequence (i.e., the number of positions in at which the term appears) is called the count (it is also common to call it “within-document frequency”, but we find this usage confusing).
Java Source File Name	Type	Comment
AbstractBitStreamIndexWriter.java	Class	An abstract bitstream-based index writer, providing common variables and a basic AbstractBitStreamIndexWriter.printStats(PrintStream) implementation.
AbstractIndexIterator.java	Class	A very basic abstract implementation of an index interator, providing an obvious implementation of AbstractIndexIterator.term() , AbstractIndexIterator.id() , and of the .
AbstractIndexReader.java	Class	An abstract, implementation of an index reader.
AbstractPrefixMap.java	Class	An abstract implementation of a map from prefixes to term intervals.
AbstractTermMap.java	Class	An abstract implementation of a map from term to term indices.
BitStreamHPIndex.java	Class	A index. Implementing subclasses must provide access to the index bitstream (as it happens for a BitStreamIndex ) but also to the positions stream, both at and level. Wired implementations The standard readers associated to an instance of this class are of type BitStreamHPIndexReader . Nonetheless, it is possible to generate automatically sources for wired classes that work only for a particular set of codings and flags.
BitStreamHPIndexReader.java	Class	A bitstream-based for .
BitStreamHPIndexWriter.java	Class	Writes a bitstream-based high-performance index.
BitStreamIndex.java	Class	A index.
BitStreamIndexReader.java	Class	A bitstream-based .
BitStreamIndexWriter.java	Class	Writes a bitstream-based interleaved index. Offsets bit stream An inverted index may have an associated OutputBitStream of offsets: this file contains `T+1` integers, where `T` is the number of inverted lists (i.e., the number of terms), and the `i`-th entry is a suitable coding of the position in bits where the `i`-th inverted list starts (the last entry is actually the length, in bytes, of the inverted index file itself).
CachingOutputBitStream.java	Class	A special output bit stream with an additional method CachingOutputBitStream.buffer() that returns the internal buffer if the internal buffer contains all that has been written since the last call to OutputBitStream.position(long) position(0) . This bit stream is used every time that it is necessary to cache quickly a bit stream. By sizing the buffer size appropriately, most of the times data written to the stream will be readable directly from the buffer.
CompressionFlags.java	Class	A container for constants and enums related to index compression.
DiskBasedIndex.java	Class	A static container providing facilities to load an index based on data stored on disk. This class contains several useful static methods such as DiskBasedIndex.readOffsets(InputBitStream,int) and DiskBasedIndex.readSizes(InputBitStream,int) , and static factor methods such as DiskBasedIndex.getInstance(CharSequence,boolean,boolean,boolean,EnumMap) that take care of reading the properties associated to the index, identify the correct it.unimi.dsi.mg4j.index.Index implementation that should be used to load the index, and load the necessary data into memory.
DowncaseTermProcessor.java	Class	A term processor downcasing all characters.
FileHPIndex.java	Class	A file-based high-performance index.
FileIndex.java	Class	A file-based index.
Index.java	Class	An abstract representation of an index. Concrete subclasses of this class represent abstract index access information: for instance, the basename or IP address/port, flags, etc.
IndexIterator.java	Interface	An iterator over an inverted list. An index iterator scans the inverted list of an indexed term.
IndexIterators.java	Class	A class providing static methods and objects that do useful things with index iterators.
IndexReader.java	Interface	Provides access to an inverted index. An it.unimi.dsi.mg4j.index.Index contains global read-only metadata.
IndexWriter.java	Interface	An interface for classes that generate indices. Implementations of this interface are used to write inverted lists in sequential order, as follows: to create a new inverted list, you must call IndexWriter.newInvertedList() ; then, you must specified the frequency using IndexWriter.writeFrequency(int) ; the document records follow; before writing a new document record, you must call IndexWriter.newDocumentRecord() ; note that, all in all, the number of calls to IndexWriter.newDocumentRecord() must be equal to the frequency; for each document record, you must supply the information needed for the index you are building ( , , , and , in this order). IndexWriter.newDocumentRecord() returns an OutputBitStream that must be used to write the document-record data.
InMemoryHPIndex.java	Class	A BitStreamHPIndex index loaded in memory.
InMemoryIndex.java	Class	A local bitstream index loaded in memory.
MemoryMappedHPIndex.java	Class	A memory-mapped BitStreamHPIndex .
MemoryMappedIndex.java	Class	A local memory-mapped bistream index. Memory-mapped indices are created by mapping the index file into memory using a MappedByteBuffer .
MultiTermIndexIterator.java	Class	A virtual that merges several component index iterators. This class adds to it.unimi.dsi.mg4j.search.AbstractUnionDocumentIterator an interval interator generating the OR of the intervals returned for each of the documents involved. The main difference with an OrDocumentIterator built on the same array of component iterators is that this class implements IndexIterator and hence provides a MultiTermIndexIterator.count() (the sum of counts of those component iterators positioned on the current document) and a MultiTermIndexIterator.frequency() .
NullTermProcessor.java	Class	A term processor that accepts all terms and does not do any processing.
PrefixMap.java	Interface	A map from prefixes to term intervals (and possibly viceversa). Given a list of terms in lexicographic order numbered from 0, we can ask, given a prefix, which interval of terms starts with the given prefix.
SkipBitStreamIndexWriter.java	Class	Provides facilities to write skip inverted indices, that is, inverted indices with an additional skip structure.
TermMap.java	Interface	A map from terms to numbers (and possibly viceversa).
TermMaps.java	Class	A class providing static methods and objects that do useful things with and .
TermProcessor.java	Interface	A term processor, implementing term/prefix transformation and possibly term/prefix filtering. Index contruction requires sometimes modifications of the given terms: downcasing, stemming, and so on.
TooManyTermsException.java	Class	Thrown to indicate that a prefix query generated too many terms.

www.java2java.com | Contact Us

All other trademarks are property of their respective owners.

it.unimi.dsi.mg4j.index

Structure of an inverted index

Wired implementations

Offsets bit stream