| java.lang.Object com.lowagie.text.pdf.hyphenation.TernaryTree com.lowagie.text.pdf.hyphenation.HyphenationTree
HyphenationTree | public class HyphenationTree extends TernaryTree implements PatternConsumer | | This tree structure stores the hyphenation patterns in an efficient
way for fast lookup. It provides the method to
hyphenate a word.
author: Carlos Villegas |
Method Summary | |
public void | addClass(String chargroup) Add a character class to the tree. |
public void | addException(String word, ArrayList hyphenatedword) Add an exception to the tree. |
public void | addPattern(String pattern, String ivalue) Add a pattern to the tree. |
public String | findPattern(String pat) |
protected byte[] | getValues(int k) |
protected int | hstrcmp(char[] s, int si, char[] t, int ti) |
public Hyphenation | hyphenate(String word, int remainCharCount, int pushCharCount) Hyphenate word and return a Hyphenation object. Parameters: word - the word to be hyphenated Parameters: remainCharCount - Minimum number of characters allowed before the hyphenation point. Parameters: pushCharCount - Minimum number of characters allowed after the hyphenation point. |
public Hyphenation | hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount) Hyphenate word and return a Hyphenation object. Parameters: w - char array that contains the word Parameters: offset - Offset to first character in word Parameters: len - Length of word Parameters: remainCharCount - Minimum number of characters allowed before the hyphenation point. Parameters: pushCharCount - Minimum number of characters allowed after the hyphenation point. |
public void | loadSimplePatterns(InputStream stream) |
protected int | packValues(String values) Packs the values by storing them in 4 bits, two values per byte. The values range from 0 to 9. |
public void | printStats() |
protected void | searchPatterns(char[] word, int index, byte[] il) Search for all possible partial matches of word starting at index and update the interletter values. |
protected String | unpackValues(int k) |
stoplist | protected HashMap stoplist | | This map stores hyphenation exceptions.
|
vspace | protected ByteVector vspace | | Value space: stores the interletter values.
|
HyphenationTree | public HyphenationTree() | | |
addClass | public void addClass(String chargroup) | | Add a character class to the tree. It is used by
SimplePatternParser as a callback to
add character classes. Character classes define the
valid word characters for hyphenation. If a word contains
a character not defined in any of the classes, it is not hyphenated.
A class also defines how to normalize characters in order
to compare them with the stored patterns. Pattern
files usually use only lower-case characters; in that case a class
for the letter 'a', for example, should be defined as "aA", the first
character being the normalization char.
|
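The normalization rule described for addClass can be illustrated with a small standalone sketch. It does not use or reproduce the HyphenationTree internals; the class name CharClassSketch and the normalize helper are hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of character classes; not the library's code.
public class CharClassSketch {
    // Maps every accepted character to its normalization character.
    private final Map<Character, Character> classMap = new HashMap<>();

    // Register a class such as "aA": the first character is the normal form.
    public void addClass(String chargroup) {
        if (chargroup.isEmpty()) {
            return;
        }
        char normal = chargroup.charAt(0);
        for (int i = 0; i < chargroup.length(); i++) {
            classMap.put(chargroup.charAt(i), normal);
        }
    }

    // Returns the normalized word, or null if a character belongs to no class
    // (such a word would not be hyphenated).
    public String normalize(String word) {
        StringBuilder sb = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            Character normal = classMap.get(word.charAt(i));
            if (normal == null) {
                return null;
            }
            sb.append(normal.charValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CharClassSketch classes = new CharClassSketch();
        classes.addClass("aA");
        classes.addClass("bB");
        System.out.println(classes.normalize("AbBa")); // prints "abba"
        System.out.println(classes.normalize("ab1"));  // prints "null"
    }
}
```

With the classes "aA" and "bB" registered, a mixed-case word maps onto the lower-case form used by the patterns, while a word containing an unregistered character is rejected.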
addException | public void addException(String word, ArrayList hyphenatedword) | | Add an exception to the tree. It is used by the
SimplePatternParser class as a callback to
store the hyphenation exceptions.
Parameters: word - normalized word Parameters: hyphenatedword - a vector of alternating strings and Hyphen objects. |
addPattern | public void addPattern(String pattern, String ivalue) | | Add a pattern to the tree. It is mainly used by the
SimplePatternParser class as a callback to
add a pattern to the tree.
Parameters: pattern - the hyphenation pattern Parameters: ivalue - interletter weight values indicating the desirability and priority of hyphenating at a given point within the pattern. It should contain only digit characters (i.e. '0' to '9'). |
getValues | protected byte[] getValues(int k) | | |
hstrcmp | protected int hstrcmp(char[] s, int si, char[] t, int ti) | | String compare: returns 0 if the strings are equal or if
t is a substring of s.
|
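A comparison with this contract might look like the sketch below. It assumes null-terminated char arrays (as in the searchPatterns parameter description) and reads "substring" as "t matches s from position si and ends first", i.e. a prefix match. Both points are assumptions; the code is not copied from the library, and the class name HstrcmpSketch and the helper z are hypothetical.

```java
// Hypothetical sketch of a prefix-tolerant string compare; not the library's code.
public class HstrcmpSketch {
    // Compares null-terminated strings s (from si) and t (from ti).
    // Returns 0 when they are equal or when t ends first (t is a prefix of s).
    public static int hstrcmp(char[] s, int si, char[] t, int ti) {
        for (; s[si] == t[ti]; si++, ti++) {
            if (s[si] == 0) {
                return 0; // both strings ended together: equal
            }
        }
        if (t[ti] == 0) {
            return 0; // t is exhausted first: t is a prefix of s
        }
        return s[si] - t[ti];
    }

    // Helper: builds a null-terminated char array from a String.
    public static char[] z(String text) {
        return (text + '\0').toCharArray();
    }

    public static void main(String[] args) {
        System.out.println(hstrcmp(z("hyphen"), 0, z("hy"), 0));     // prints 0
        System.out.println(hstrcmp(z("hyphen"), 0, z("hyphen"), 0)); // prints 0
        System.out.println(hstrcmp(z("abc"), 0, z("abd"), 0));       // prints -1
    }
}
```

Note that the relation is not symmetric: comparing "hy" against "hyphen" yields a non-zero result, because it is s that runs out first.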
hyphenate | public Hyphenation hyphenate(String word, int remainCharCount, int pushCharCount) | | Hyphenate word and return a Hyphenation object.
Parameters: word - the word to be hyphenated Parameters: remainCharCount - Minimum number of characters allowed before the hyphenation point. Parameters: pushCharCount - Minimum number of characters allowed after the hyphenation point. Returns: a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated. |
hyphenate | public Hyphenation hyphenate(char[] w, int offset, int len, int remainCharCount, int pushCharCount) | | Hyphenate word and return a Hyphenation object.
Parameters: w - char array that contains the word Parameters: offset - Offset to first character in word Parameters: len - Length of word Parameters: remainCharCount - Minimum number of characters allowed before the hyphenation point. Parameters: pushCharCount - Minimum number of characters allowed after the hyphenation point. Returns: a Hyphenation object representing the hyphenated word, or null if the word is not hyphenated. |
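The effect of remainCharCount and pushCharCount can be illustrated independently of the tree. The sketch below filters a list of candidate break positions; the class name BreakFilterSketch, the filter method, and the candidate positions are all hypothetical, and a break position is taken to mean the number of characters before the break, which is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two minimum-length parameters.
public class BreakFilterSketch {
    // Keeps break positions that leave at least remainCharCount characters
    // before the break and at least pushCharCount characters after it.
    public static List<Integer> filter(int wordLength, int[] candidates,
                                       int remainCharCount, int pushCharCount) {
        List<Integer> kept = new ArrayList<>();
        for (int pos : candidates) {
            if (pos >= remainCharCount && wordLength - pos >= pushCharCount) {
                kept.add(pos);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "hyphenation" has 11 letters; assumed candidates: hy-phen-a-tion.
        int[] candidates = {2, 6, 7};
        System.out.println(filter(11, candidates, 3, 3)); // prints [6, 7]
        System.out.println(filter(11, candidates, 2, 5)); // prints [2, 6]
    }
}
```

Raising remainCharCount removes breaks too close to the start of the word, and raising pushCharCount removes breaks too close to its end.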
packValues | protected int packValues(String values) | | Packs the values by storing them in 4 bits, two values per byte.
The values range from 0 to 9. Zero is used as the terminator,
so 1 is added to each value before it is stored.
Parameters: values - a string of digits from '0' to '9' representing the interletter values. Returns: the index into the vspace array where the packed values are stored. |
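The packing scheme described above (two 4-bit values per byte, each stored as digit + 1, zero as terminator) can be sketched as follows. This is a standalone illustration that returns a fresh byte array instead of writing into vspace; the class name PackValuesSketch, the pack and unpack helpers, and the choice of putting the first value in the high nibble are assumptions, not the library's documented layout.

```java
// Hypothetical sketch of 4-bit value packing; not the library's code.
public class PackValuesSketch {
    // Packs digits '0'..'9' two per byte (high nibble first, by assumption),
    // storing digit + 1 so that a zero nibble can mark the end.
    public static byte[] pack(String values) {
        int n = values.length();
        // n / 2 + 1 bytes always leaves room for at least one zero nibble.
        byte[] packed = new byte[n / 2 + 1];
        for (int i = 0; i < n; i++) {
            int v = values.charAt(i) - '0' + 1; // 1..10, fits in 4 bits
            if ((i & 1) == 0) {
                packed[i / 2] = (byte) (v << 4);
            } else {
                packed[i / 2] |= (byte) v;
            }
        }
        return packed;
    }

    // Reverses pack(): reads nibbles until the zero terminator.
    public static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder();
        for (byte b : packed) {
            int hi = (b >>> 4) & 0x0f;
            if (hi == 0) {
                break;
            }
            sb.append((char) (hi - 1 + '0'));
            int lo = b & 0x0f;
            if (lo == 0) {
                break;
            }
            sb.append((char) (lo - 1 + '0'));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("012");
        System.out.println(packed.length);  // prints 2
        System.out.println(unpack(packed)); // prints "012"
    }
}
```

Adding 1 before storing is what makes a stored 0 unambiguous: a genuine interletter value of 0 is stored as 1, so a zero nibble can only mean "end of values".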
printStats | public void printStats() | | |
searchPatterns | protected void searchPatterns(char[] word, int index, byte[] il) | | Search for all possible partial matches of word starting
at index and update the interletter values. In other words, it
does something like:
for(i=0; i
But it is done in an efficient way since the patterns are
stored in a ternary tree. In fact, this is the whole purpose
of having the tree: doing this search without having to test
every single pattern. The number of patterns for languages
such as English ranges from 4000 to 10000. Thus, doing thousands
of string comparisons for each word to hyphenate would be
really slow without the tree. The tradeoff is memory, but
using a ternary tree instead of a trie almost halves the
memory used by Lout or TeX. It is also faster than using
a hash table.
Parameters: word - null-terminated word to match Parameters: index - start index from word Parameters: il - interletter values array to update |
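For comparison, the brute-force search that the ternary tree avoids can be sketched as a plain loop over every pattern. This is an illustration only: the class name NaiveSearchSketch, the Map-based pattern store, and the sample patterns are hypothetical, and keeping the larger value at each position is an assumption (it is the usual rule in Liang-style hyphenation, but this page does not state it).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical brute-force counterpart of searchPatterns; not the library's code.
public class NaiveSearchSketch {
    // For every pattern that matches word at position index, merge its
    // interletter values into il, keeping the larger value at each slot.
    public static void searchPatterns(String word, int index,
                                      Map<String, String> patterns, int[] il) {
        for (Map.Entry<String, String> e : patterns.entrySet()) {
            if (!word.startsWith(e.getKey(), index)) {
                continue; // one string comparison per pattern: the slow part
            }
            String values = e.getValue();
            for (int k = 0; k < values.length() && index + k < il.length; k++) {
                int v = values.charAt(k) - '0';
                if (v > il[index + k]) {
                    il[index + k] = v;
                }
            }
        }
    }

    public static void main(String[] args) {
        // Made-up patterns: key = letters, value = one digit per interletter slot.
        Map<String, String> patterns = new LinkedHashMap<>();
        patterns.put("ph", "100");
        patterns.put("phe", "0020");
        patterns.put("xy", "090");
        int[] il = new int[7]; // "hyphen" has 6 letters, so 7 slots
        searchPatterns("hyphen", 2, patterns, il);
        System.out.println(java.util.Arrays.toString(il)); // prints [0, 0, 1, 0, 2, 0, 0]
    }
}
```

Every call walks the whole pattern set, so the cost grows with the number of patterns; a tree lookup instead follows only the characters of the word itself.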
Methods inherited from com.lowagie.text.pdf.hyphenation.TernaryTree |
public void balance()
public Object clone()
public int find(String key)
public int find(char[] key, int start)
protected void init()
public void insert(String key, char val)
public void insert(char[] key, int start, char val)
protected void insertBalanced(String[] k, char[] v, int offset, int n)
public Enumeration keys()
public boolean knows(String key)
public void printStats()
public int size()
public static int strcmp(char[] a, int startA, char[] b, int startB)
public static int strcmp(String str, char[] a, int start)
public static void strcpy(char[] dst, int di, char[] src, int si)
public static int strlen(char[] a, int start)
public static int strlen(char[] a)
public void trimToSize()
|
|
|