Java Doc for DefaultWordFinder.java in hippo-cms (package nl.hippo.cms.spellchecking)



java.lang.Object
   nl.hippo.cms.spellchecking.DefaultWordFinder

All known Subclasses:   nl.hippo.cms.spellchecking.XMLWordFinder
DefaultWordFinder
public class DefaultWordFinder
A word finder for plain text documents that searches text for sequences of words and text blocks. This class also defines common methods and behaviour for the various word-finding subclasses.
See Also:   java.util.StringTokenizer
See Also:   java.text.BreakIterator
See Also:   TeXWordFinder
See Also:   XMLWordFinder
author:
   Bruno Martins
author:
   Jeroen Reijn
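The class description cites java.text.BreakIterator. As a rough illustration only (not the hippo-cms implementation), the JDK's word BreakIterator can split plain text into words. The helper class BreakIteratorWords below is hypothetical, and the rule "keep tokens that begin with a letter or digit" is an assumption made for the sketch.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of hippo-cms. */
final class BreakIteratorWords {
    private BreakIteratorWords() {}

    /** Returns the words of text in order, using the JDK's word BreakIterator. */
    static List<String> words(String text) {
        BreakIterator boundaries = BreakIterator.getWordInstance(Locale.ENGLISH);
        boundaries.setText(text);
        List<String> words = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String token = text.substring(start, end);
            // The word instance also yields runs of spaces and punctuation between
            // boundaries; keep only tokens that begin with a letter or digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                words.add(token);
            }
        }
        return words;
    }
}
```

Unlike a StringTokenizer splitting on spaces, this keeps punctuation such as "," and "." out of the returned words, which is the behaviour a spell checker generally needs.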


Field Summary
protected int currentSegmentPos
     The index of the current segment in the input text.
protected String currentWord
     The current word of the word finder.
protected int currentWordPos
     The index of the current word in the input text.
protected int nextSegmentPos
     The index of the next segment in the input text.
protected String nextWord
     The word following the current one.
protected int nextWordPos
     The index of the next word in the input text.
protected BreakIterator sentenceIterator
     An iterator over the input text.
protected boolean solveHardCases
     Flag indicating whether the tokenization hard cases should be solved.
protected boolean startsSentence
     Flag indicating whether the current word marks the beginning of a sentence.
protected String text
     The input text.

Constructor Summary
public DefaultWordFinder(String inText)
     Constructor for DefaultWordFinder.
public DefaultWordFinder()
     Constructor for DefaultWordFinder.

Method Summary
public String current()
     Returns the current word in the text.
public String currentNGram(int n)
     Returns the current character N-gram from the input.
public String currentSegment()
     Returns the current text segment from the input.
public String currentWordGram(int n)
     Returns the current word N-gram from the input.
public String getText()
     Returns the text associated with this DefaultWordFinder.
public boolean hasNext()
     Tests whether there are more words available in the text.
protected int ignore(int index, char startIgnore)
     Ignores all characters in the text after the first occurrence of a given character.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The character that marks the beginning of the sequence to be ignored.
protected int ignore(int index, char startIgnore, char endIgnore)
     Ignores all characters in the text between the first occurrence of a given character and the next occurrence of another given character.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The character that marks the beginning of the sequence to be ignored.
Parameters:
  endIgnore - The character that marks the end of the sequence to be ignored.
protected int ignore(int index, Character startIgnore, Character endIgnore)
     Ignores all characters in the text between the first occurrence of a given character and the next occurrence of another given character.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The character that marks the beginning of the sequence to be ignored.
Parameters:
  endIgnore - The character that marks the end of the sequence to be ignored, or null if all remaining characters in the text are to be ignored.
protected int ignore(int index, String startIgnore, String endIgnore)
     Ignores all characters in the text between the first occurrence of a given String and the next occurrence of another given String.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The String that marks the beginning of the sequence to be ignored.
Parameters:
  endIgnore - The String that marks the end of the sequence to be ignored.
protected static boolean isWordChar(String text, int posn)
     Checks if the character at a given position in a String is part of a word. Special characters such as '.' or '-' are considered alphanumeric or not depending on the surrounding characters.
protected static boolean isWordChar(char c)
     Checks if a given character is alphanumeric.
Parameters:
  c - The char to check.
public String lookAhead()
     Returns the next word without advancing the tokenizer, checking whether the character separating the two words is a space.
public String next()
     Scans the text from the end of the last word and returns a String corresponding to the next word.
public String nextSegment()
     Returns the next text segment from the input.
public void replace(String newWord)
     Replaces the current word in the text.
public void replaceBigram(String newBigram)
     Replaces the current bigram (the current word and the next word as returned by lookAhead()) in the text.
public void replaceSegment(String newSegment)
     Replaces the current text segment.
public void setText(String newText)
     Changes the text associated with this DefaultWordFinder.
public static String[] splitNGrams(String text, int n)
     Splits a given String into an array of its constituent character n-grams.
Parameters:
  text - The String to split.
Parameters:
  n - Number of consecutive characters in each n-gram.
public static String[] splitSegments(String text)
     Splits a given String into an array of its constituent text segments.
Parameters:
  text - The String to split.
public static String[] splitWordGrams(String text, int n)
     Splits a given String into an array of its constituent word n-grams.
Parameters:
  text - The String to split.
Parameters:
  n - Number of consecutive words in each n-gram.
public static String[] splitWords(String text)
     Splits a given String into an array of its constituent words.
Parameters:
  text - The String to split.
public boolean startsSentence()
     Checks if the current word marks the beginning of a sentence.
public String toString()
     Produces a string representation of this word finder by returning the associated text.

Field Detail
currentSegmentPos
protected int currentSegmentPos
The index of the current segment in the input text.



currentWord
protected String currentWord
The current word of the word finder.



currentWordPos
protected int currentWordPos
The index of the current word in the input text.



nextSegmentPos
protected int nextSegmentPos
The index of the next segment in the input text.



nextWord
protected String nextWord
The word following the current one.



nextWordPos
protected int nextWordPos
The index of the next word in the input text.



sentenceIterator
protected BreakIterator sentenceIterator
An iterator over the input text.
See Also:   java.text.BreakIterator
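The field name suggests that sentenceIterator is a sentence-boundary BreakIterator used to maintain the startsSentence flag; this page does not say so, so treat it as an assumption. Under that assumption, the hypothetical helper below decides whether a word starts a sentence by asking a JDK sentence BreakIterator whether the word's first index is a boundary.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of hippo-cms. */
final class SentenceStartSketch {
    private SentenceStartSketch() {}

    /**
     * Returns true if the word beginning at wordPos starts a sentence, i.e. if
     * wordPos falls on a sentence boundary reported by the JDK BreakIterator.
     */
    static boolean startsSentence(String text, int wordPos) {
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        sentences.setText(text);
        // isBoundary throws IllegalArgumentException if wordPos lies outside the text.
        return sentences.isBoundary(wordPos);
    }
}
```

For "The cat sat. It purred." the word "It" (index 13) starts a sentence, while "cat" (index 4) does not.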



solveHardCases
protected boolean solveHardCases
Flag indicating whether the tokenization hard cases should be solved.



startsSentence
protected boolean startsSentence
Flag indicating whether the current word marks the beginning of a sentence.



text
protected String text
The input text.




Constructor Detail
DefaultWordFinder
public DefaultWordFinder(String inText)
Constructor for DefaultWordFinder.
Parameters:
  inText - A String with the input text to tokenize.



DefaultWordFinder
public DefaultWordFinder()
Constructor for DefaultWordFinder.




Method Detail
current
public String current()
Returns the current word in the text.
Returns:
  A String with the current word in the text.



currentNGram
public String currentNGram(int n)
Returns the current character N-gram from the input. An N-gram is defined as the character sequence between the current position and the next n characters.
Parameters:
  n - Number of consecutive characters in the N-gram.
Returns:
  A String with the current character N-gram.



currentSegment
public String currentSegment()
Returns the current text segment from the input. A segment is defined as the character sequence between the current position and the next non-alphanumeric character, also taking white space into account.
Returns:
  A String with the current text segment.



currentWordGram
public String currentWordGram(int n)
Returns the current word N-gram from the input. An N-gram is defined as the word sequence between the current position and the next n words.
Parameters:
  n - Number of consecutive words in the N-gram.
Returns:
  A String with the current word N-gram.



getText
public String getText()
Returns the text associated with this DefaultWordFinder.
Returns:
  A String with the text associated with this DefaultWordFinder.



hasNext
public boolean hasNext()
Tests whether there are more words available in the text.
Returns:
  true if and only if there is at least one word in the string after the current position, and false otherwise.



ignore
protected int ignore(int index, char startIgnore)
Ignores all characters in the text after the first occurrence of a given character.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The character that marks the beginning of the sequence to be ignored.
Returns:
  The index in the text marking the beginning of the ignored sequence, or -1 if no sequence was ignored (the supplied character does not occur in the text).



ignore
protected int ignore(int index, char startIgnore, char endIgnore)
Ignores all characters in the text between the first occurrence of a given character and the next occurrence of another given character.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The character that marks the beginning of the sequence to be ignored.
Parameters:
  endIgnore - The character that marks the end of the sequence to be ignored.
Returns:
  The index in the text marking the beginning of the ignored sequence, or -1 if no sequence was ignored (the supplied starting character does not occur in the text).



ignore
protected int ignore(int index, Character startIgnore, Character endIgnore)
Ignores all characters in the text between the first occurrence of a given character and the next occurrence of another given character.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The character that marks the beginning of the sequence to be ignored.
Parameters:
  endIgnore - The character that marks the end of the sequence to be ignored, or null if all remaining characters in the text are to be ignored.
Returns:
  The index in the text marking the beginning of the ignored sequence, or -1 if no sequence was ignored (the supplied starting character does not occur in the text).
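The documented ignore methods specify only their return value, not how scanning continues afterwards. The hypothetical helper below reports both the start of the ignored region (the documented return value) and the index where scanning would resume. Two behaviours are assumptions for this sketch: a null end character ignores the rest of the text, as documented, and an unterminated region is treated the same way.

```java
/** Hypothetical helper for illustration; not part of hippo-cms. */
final class IgnoreSketch {
    private IgnoreSketch() {}

    /**
     * Returns {start, resume} for the first startIgnore at or after index, where
     * start is the index of startIgnore and resume is the index just past the
     * ignored region. Returns null if startIgnore does not occur (the sketch's
     * counterpart of the documented -1).
     */
    static int[] ignoredRegion(String text, int index, char startIgnore, Character endIgnore) {
        int start = text.indexOf(startIgnore, index);
        if (start < 0) {
            return null;
        }
        if (endIgnore == null) {
            // Documented behaviour: null means ignore everything that follows.
            return new int[] {start, text.length()};
        }
        int end = text.indexOf(endIgnore.charValue(), start + 1);
        // Assumption: an unterminated region also swallows the rest of the text.
        int resume = end < 0 ? text.length() : end + 1;
        return new int[] {start, resume};
    }
}
```

For example, in "a [b] c" with '[' and ']' the region starts at index 2 and scanning resumes at index 5, just after the closing bracket.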



ignore
protected int ignore(int index, String startIgnore, String endIgnore)
Ignores all characters in the text between the first occurrence of a given String and the next occurrence of another given String.
Parameters:
  index - A starting index in the text from which characters should be ignored.
Parameters:
  startIgnore - The String that marks the beginning of the sequence to be ignored.
Parameters:
  endIgnore - The String that marks the end of the sequence to be ignored.
Returns:
  The index in the text marking the beginning of the ignored sequence, or -1 if no sequence was ignored (the supplied starting String does not occur in the text).
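The String variant suits multi-character delimiters such as XML comments. As a sketch of the idea rather than the hippo-cms code, the hypothetical helper below repeatedly applies the same search (indexOf for the start String, then for the end String) and drops every ignored region, delimiters included. Treating an unterminated region as running to the end of the text is an assumption.

```java
/** Hypothetical helper for illustration; not part of hippo-cms. */
final class StringIgnoreSketch {
    private StringIgnoreSketch() {}

    /** Returns text with every startIgnore...endIgnore region (delimiters included) removed. */
    static String stripIgnored(String text, String startIgnore, String endIgnore) {
        if (startIgnore.isEmpty() || endIgnore.isEmpty()) {
            throw new IllegalArgumentException("delimiters must be non-empty");
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < text.length()) {
            int start = text.indexOf(startIgnore, pos);
            if (start < 0) {
                out.append(text, pos, text.length()); // no more ignored regions
                break;
            }
            out.append(text, pos, start); // keep the text before the region
            int end = text.indexOf(endIgnore, start + startIgnore.length());
            if (end < 0) {
                break; // assumption: an unterminated region runs to the end of the text
            }
            pos = end + endIgnore.length();
        }
        return out.toString();
    }
}
```

For example, stripping "<!--" ... "-->" from "a <!-- x --> b" leaves "a  b", with the two spaces that surrounded the comment.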



isWordChar
protected static boolean isWordChar(String text, int posn)
Checks if the character at a given position in a String is part of a word. Special characters such as '.' or '-' are considered alphanumeric or not depending on the surrounding characters.
Parameters:
  text - The text String.
Parameters:
  posn - The position of the character in the String.
Returns:
  true if the character at the given position is alphanumeric and false otherwise.
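The description says that '.' and '-' count as word characters "depending on the surrounding characters" but does not state the rule. The sketch below assumes one plausible rule: such a character belongs to a word only when it is flanked on both sides by letters or digits, as in "well-known" or "3.14". The class WordCharSketch and this rule are hypothetical.

```java
/** Hypothetical helper for illustration; not part of hippo-cms. */
final class WordCharSketch {
    private WordCharSketch() {}

    /** A plain character is a word character if it is a letter or digit. */
    static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Context-sensitive check: '.' and '-' count only between two word characters. */
    static boolean isWordChar(String text, int posn) {
        char c = text.charAt(posn);
        if (isWordChar(c)) {
            return true;
        }
        if (c == '.' || c == '-') {
            boolean before = posn > 0 && isWordChar(text.charAt(posn - 1));
            boolean after = posn + 1 < text.length() && isWordChar(text.charAt(posn + 1));
            return before && after;
        }
        return false;
    }
}
```

Under this rule the period in "end. Next" is not part of a word, because a space follows it.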



isWordChar
protected static boolean isWordChar(char c)
Checks if a given character is alphanumeric.
Parameters:
  c - The char to check.
Returns:
  true if the given character is alphanumeric and false otherwise.



lookAhead
public String lookAhead()
Returns the next word without advancing the tokenizer, checking whether the character separating the two words is a space. This is useful for getting bigrams from the text.
Returns:
  The next word in the text, or null.
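A minimal sketch of the look-ahead idea, assuming that "checking whether the separating character is a space" means the next word is returned only when exactly one space separates it from the current word, and null otherwise. The helper and its currentEnd parameter (the index just past the current word) are hypothetical; the real method works on the finder's internal state.

```java
/** Hypothetical helper for illustration; not part of hippo-cms. */
final class LookAheadSketch {
    private LookAheadSketch() {}

    /**
     * Returns the word that follows a single space at currentEnd without
     * consuming it, or null if the separator is not exactly one space.
     */
    static String lookAhead(String text, int currentEnd) {
        int start = currentEnd + 1;
        if (currentEnd >= text.length() || text.charAt(currentEnd) != ' '
                || start >= text.length() || !Character.isLetterOrDigit(text.charAt(start))) {
            return null;
        }
        int end = start;
        while (end < text.length() && Character.isLetterOrDigit(text.charAt(end))) {
            end++;
        }
        return text.substring(start, end);
    }
}
```

With "New York city" and the current word "New" ending at index 3, the look-ahead yields "York", so the pair forms the bigram "New York". A comma after "New" yields null, so no bigram is formed across punctuation.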



next
public String next()
Scans the text from the end of the last word and returns a String corresponding to the next word. If there are no more words to return, it returns null.
Returns:
  The next word.
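The hasNext()/next() pair follows the usual iterator pattern. The class below is a deliberately simplified, hypothetical stand-in (not DefaultWordFinder itself) that honours the documented contract: next() resumes from the end of the last word and returns null once the text is exhausted. Treating a word as a maximal run of letters or digits is an assumption for the sketch.

```java
/**
 * Hypothetical, heavily simplified word finder illustrating the documented
 * hasNext()/next() contract; not the hippo-cms implementation.
 * A word is assumed to be a maximal run of letters or digits.
 */
final class MiniWordFinder {
    private final String text;
    private int pos; // index just past the last word returned by next()

    MiniWordFinder(String text) {
        this.text = text;
    }

    /** True if at least one word remains after the current position. */
    boolean hasNext() {
        return skipToWord(pos) < text.length();
    }

    /** Returns the next word, or null when no words remain. */
    String next() {
        int start = skipToWord(pos);
        if (start >= text.length()) {
            return null;
        }
        int end = start;
        while (end < text.length() && Character.isLetterOrDigit(text.charAt(end))) {
            end++;
        }
        pos = end;
        return text.substring(start, end);
    }

    /** Advances from the given index to the first letter or digit (or to the end). */
    private int skipToWord(int from) {
        while (from < text.length() && !Character.isLetterOrDigit(text.charAt(from))) {
            from++;
        }
        return from;
    }
}
```

A typical loop is `while (finder.hasNext()) { String word = finder.next(); ... }`.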



nextSegment
public String nextSegment()
Returns the next text segment from the input. A segment is defined as the character sequence between the current position and the next non-alphanumeric character, also taking white space into account. If there are no more segments to return, it returns null.
Returns:
  A String with the next text segment.



replace
public void replace(String newWord)
Replaces the current word in the text. After a call to this method, a call to current() returns the new word and a call to getText() returns the text supplied to this WordFinder with the current word replaced.
Parameters:
  newWord - A string with the replacement word.
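Replacing a word can change the length of the text, so any stored index after the word (such as the next word's position) shifts by the difference between the new and old word lengths. The hypothetical helper below shows only the text splice with StringBuilder.replace. It assumes the caller knows the current word and its start index; that bookkeeping is an assumption, not taken from the class.

```java
/** Hypothetical helper for illustration; not part of hippo-cms. */
final class ReplaceSketch {
    private ReplaceSketch() {}

    /** Replaces currentWord, which must start at wordPos, with newWord. */
    static String replaceWord(String text, int wordPos, String currentWord, String newWord) {
        if (!text.startsWith(currentWord, wordPos)) {
            throw new IllegalArgumentException("currentWord does not occur at wordPos");
        }
        // Later indices shift by newWord.length() - currentWord.length().
        return new StringBuilder(text)
                .replace(wordPos, wordPos + currentWord.length(), newWord)
                .toString();
    }
}
```

For example, correcting "recieve" at index 2 of "I recieve mail" gives "I receive mail"; both words have the same length, so no later index moves.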



replaceBigram
public void replaceBigram(String newBigram)
Replaces the current bigram (the current word and the next word as returned by lookAhead()) in the text. After a call to this method, a call to current() returns the new bigram and a call to getText() returns the text supplied to this WordFinder with the current bigram replaced.
Parameters:
  newBigram - A string with the replacement bigram.



replaceSegment
public void replaceSegment(String newSegment)
Replaces the current text segment. After a call to this method, a call to currentSegment() returns the new text segment and a call to getText() returns the text supplied to this WordFinder with the current segment replaced.
Parameters:
  newSegment - A String with the new text segment.



setText
public void setText(String newText)
Changes the text associated with this DefaultWordFinder.
Parameters:
  newText - The new String with the input text to tokenize.



splitNGrams
public static String[] splitNGrams(String text, int n)
Splits a given String into an array of its constituent character n-grams.
Parameters:
  text - The String to split.
Parameters:
  n - Number of consecutive characters in each n-gram.
Returns:
  An array with the character n-grams extracted from the String.
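Character n-grams are commonly produced with a sliding window of width n. The sketch below assumes overlapping windows with no padding, and an empty result when the text is shorter than n; the page does not say how DefaultWordFinder handles those edge cases, and the class CharNGrams is hypothetical.

```java
/** Hypothetical helper for illustration; not part of hippo-cms. */
final class CharNGrams {
    private CharNGrams() {}

    /** Returns every substring of length n, sliding one character at a time. */
    static String[] splitNGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        if (text.length() < n) {
            return new String[0]; // assumption: too-short text yields no n-grams
        }
        String[] grams = new String[text.length() - n + 1];
        for (int i = 0; i < grams.length; i++) {
            grams[i] = text.substring(i, i + n);
        }
        return grams;
    }
}
```

For "word" and n = 2 this yields "wo", "or" and "rd".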



splitSegments
public static String[] splitSegments(String text)
Splits a given String into an array of its constituent text segments.
Parameters:
  text - The String to split.
Returns:
  An array with the text segments extracted from the String.



splitWordGrams
public static String[] splitWordGrams(String text, int n)
Splits a given String into an array of its constituent word n-grams.
Parameters:
  text - The String to split.
Parameters:
  n - Number of consecutive words in each n-gram.
Returns:
  An array with the word n-grams extracted from the String.
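Word n-grams apply the same sliding window to words instead of characters. The hypothetical sketch below assumes that words are separated by white space and that each n-gram is rejoined with single spaces; DefaultWordFinder's own word splitting (see isWordChar) is likely more refined.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of hippo-cms. */
final class WordNGrams {
    private WordNGrams() {}

    /** Returns every run of n consecutive whitespace-separated words, joined by single spaces. */
    static String[] splitWordGrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        String trimmed = text.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        if (words.length < n) {
            return new String[0]; // assumption: fewer than n words yields no n-grams
        }
        String[] grams = new String[words.length - n + 1];
        for (int i = 0; i < grams.length; i++) {
            grams[i] = String.join(" ", Arrays.copyOfRange(words, i, i + n));
        }
        return grams;
    }
}
```

For "the quick brown fox" and n = 3 this yields "the quick brown" and "quick brown fox".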



splitWords
public static String[] splitWords(String text)
Splits a given String into an array of its constituent words.
Parameters:
  text - The String to split.
Returns:
  An array with the words extracted from the String.



startsSentence
public boolean startsSentence()
Checks if the current word marks the beginning of a sentence.
Returns:
  true if the current word marks the beginning of a sentence and false otherwise.



toString
public String toString()
Produces a string representation of this word finder by returning the associated text.



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
