| java.lang.Object it.unimi.dsi.mg4j.document.ZipDocumentCollectionBuilder
ZipDocumentCollectionBuilder | public class ZipDocumentCollectionBuilder (Code) | | A builder to create
ZipDocumentCollection instances.
After creating an instance of this class, you can add new documents
incrementally. Each document must be started with
ZipDocumentCollectionBuilder.startDocument(CharSequence,CharSequence) and ended with
ZipDocumentCollectionBuilder.endDocument() ; inside each document, each non-text field must be written by passing
an object to
ZipDocumentCollectionBuilder.nonTextField(Object) , whereas each text field must be
started with
ZipDocumentCollectionBuilder.startTextField() and ended with
ZipDocumentCollectionBuilder.endTextField() : in between, a call
to
ZipDocumentCollectionBuilder.add(MutableString,MutableString) must be made for each word/nonword pair retrieved
from the original collection. At the end,
ZipDocumentCollectionBuilder.close() returns a
it.unimi.dsi.mg4j.document.ZipDocumentCollection that must be serialised.
Alternatively, you can just call
ZipDocumentCollectionBuilder.build(DocumentSequence) and all the above will
be handled for you.
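The call order described above can be sketched with a small stand-in class. This is a minimal illustration, not MG4J code: BuilderProtocolSketch is a hypothetical class that only records calls, its method names mirror the ones documented here, and plain CharSequence arguments replace MutableString so that the example compiles without the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in that only records calls; it is NOT the MG4J builder. */
public class BuilderProtocolSketch {
    private final List<String> log = new ArrayList<>();
    private boolean inTextField = false;

    public void startDocument(CharSequence title, CharSequence uri) {
        log.add("startDocument(" + title + ", " + uri + ")");
    }

    public void startTextField() {
        inTextField = true;
        log.add("startTextField");
    }

    /** Mirrors the documented guard: the call is ignored unless a text field is open. */
    public void add(CharSequence word, CharSequence nonWord) {
        if (!inTextField) return;
        log.add("add(" + word + "," + nonWord + ")");
    }

    public void endTextField() {
        inTextField = false;
        log.add("endTextField");
    }

    public void nonTextField(Object o) {
        log.add("nonTextField(" + o + ")");
    }

    public void endDocument() {
        log.add("endDocument");
    }

    /** Runs the documented sequence for one document with one text field and one non-text field. */
    public static List<String> demo() {
        BuilderProtocolSketch b = new BuilderProtocolSketch();
        b.startDocument("A title", "http://example.com/a");
        b.add("ignored", " ");   // no text field is open yet, so nothing is recorded
        b.startTextField();
        b.add("hello", " ");     // one call per word/nonword pair
        b.add("world", ".");
        b.endTextField();
        b.nonTextField(42);      // non-text fields are passed as plain objects
        b.endDocument();
        return b.log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

With the real class, the same sequence would end with a call to close(), whose return value is the collection to be serialised.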
Each Zip entry corresponds to a document: the title is recorded in the comment field, whereas the
URI is written with
MutableString.writeSelfDelimUTF8(java.io.OutputStream) directly to the zipped output stream. When building an exact
collection, subsequent word/nonword pairs are written in the same way and
delimited by two empty strings. If the collection is not exact, only words are written,
delimited by an empty string. Non-text fields are written directly to the zipped output stream.
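The layout for a non-exact collection can be approximated with java.util.zip alone. The sketch below is an assumption-laden illustration, not the MG4J implementation: ZipLayoutSketch is a hypothetical class, the entry name (the document index) is a guess, and DataOutputStream.writeUTF stands in for MutableString.writeSelfDelimUTF8, whose byte encoding is different.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

/** Hypothetical illustration of "one Zip entry per document"; NOT the MG4J on-disk format. */
public class ZipLayoutSketch {

    /** Writes one document: title in the entry comment, then URI, words, and an empty-string delimiter. */
    public static void writeDocument(ZipOutputStream zip, int index, String title,
                                     String uri, List<String> words) throws IOException {
        ZipEntry entry = new ZipEntry(Integer.toString(index)); // entry name is an assumption
        entry.setComment(title);                                // title goes in the comment field
        zip.putNextEntry(entry);
        DataOutputStream out = new DataOutputStream(zip);
        out.writeUTF(uri);                                      // stand-in for writeSelfDelimUTF8
        for (String w : words) out.writeUTF(w);
        out.writeUTF("");                                       // empty string ends the text field
        out.flush();
        zip.closeEntry();
    }

    /** Writes a single document to a temporary zip file and reads it back as [title, uri, words...]. */
    public static List<String> roundTrip() {
        try {
            File f = File.createTempFile("zip-layout", ".zip");
            f.deleteOnExit();
            try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(f))) {
                writeDocument(zip, 0, "A title", "http://example.com/a",
                        Arrays.asList("hello", "world"));
            }
            List<String> result = new ArrayList<>();
            try (ZipFile zf = new ZipFile(f)) {
                ZipEntry e = zf.getEntry("0");
                result.add(e.getComment());                     // title from the comment field
                try (DataInputStream in = new DataInputStream(zf.getInputStream(e))) {
                    result.add(in.readUTF());                   // URI comes first
                    for (String w; !(w = in.readUTF()).isEmpty(); ) result.add(w);
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

An exact collection would write each nonword right after its word and close the field with two empty strings instead of one.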
|
ZipDocumentCollectionBuilder | public ZipDocumentCollectionBuilder(String zipFilename, DocumentFactory factory, boolean exact, ProgressLogger progressLogger) throws FileNotFoundException(Code) | | Creates a new zipped collection builder.
Parameters: zipFilename - the filename of the zip file. Parameters: factory - the factory of the base document sequence. Parameters: exact - true iff nonwords should also be preserved. Parameters: progressLogger - a progress logger. |
add | public void add(MutableString word, MutableString nonWord) throws IOException(Code) | | Adds a word and a nonword to the current text field, provided that a text field has
been started but not yet ended;
otherwise, does nothing.
Usually, word and nonWord are just the result of a call
to
WordReader.next(MutableString,MutableString) .
Parameters: word - a word. Parameters: nonWord - a nonword. |
endTextField | public void endTextField() throws IOException(Code) | | Ends the current text field.
|
nonTextField | public void nonTextField(Object o) throws IOException(Code) | | Adds a non-text field.
Parameters: o - the content of the non-text field. |
startTextField | public void startTextField()(Code) | | Starts a new text field.
|
virtualField | public void virtualField(ObjectList<VirtualDocumentFragment> fragments) throws IOException(Code) | | Adds a virtual field.
Parameters: fragments - the virtual fragments to be added. |
|
|