Java Doc for BOCU.java (icu4j, com.ibm.icu.impl)



java.lang.Object
   com.ibm.icu.impl.BOCU

BOCU
public class BOCU

Binary Ordered Compression for Unicode

Users are strongly encouraged to read the ICU paper on BOCU before attempting to use this class.

BOCU compresses Unicode text into a stream of unsigned bytes. For many kinds of text the compression compares favorably to UTF-8, and for some kinds of text (such as CJK) it does better. The resulting bytes compare in the same order as the original code points. The byte stream never contains the values 0, 1, or 2.
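Because Java's byte type is signed, comparing BOCU output with ordinary byte comparison would not reproduce the unsigned ordering described above. The following is a minimal sketch of an unsigned lexicographic comparison; the class and method names are hypothetical and are not part of ICU4J.

```java
// Hypothetical helper, not part of ICU4J: compares two byte sequences
// lexicographically, treating every byte as an unsigned value 0..255.
final class UnsignedByteOrder {

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask to get the unsigned value
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // All shared bytes are equal: the shorter sequence sorts first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] low = {(byte) 0x7f};
        byte[] high = {(byte) 0x80};
        // Unsigned comparison: 0x7f sorts before 0x80.
        System.out.println(compareUnsigned(low, high)); // prints -1
        // Signed comparison gets it backwards, because (byte) 0x80 is -128.
        System.out.println(low[0] > high[0]); // prints true
    }
}
```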

One use of BOCU is in com.ibm.icu.text.Collator.getCollationKey(String) for a RuleBasedCollator object with collation strength IDENTICAL. The resulting CollationKey consists of the collation order of the source string followed by the BOCU result of the source string.

Unlike a UTF encoding, BOCU-compressed text is not suitable for random access.

Method: Slope Detection
Remember the previous code point (initial 0). For each code point in the string, encode the difference with the previous one. Similar to a UTF, the length of the byte sequence is encoded in the lead bytes. Unlike a UTF, the trail byte values may overlap with lead/single byte values. The signedness of the difference must be encoded as the most significant part.
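The first step, taking the difference from the previous code point, can be sketched on its own. This is only an illustration of the idea, assuming a plain running difference with no further adjustment; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration, not the ICU4J implementation: computes the
// signed difference between each code point and the one before it.
final class DeltaSketch {

    static int[] deltas(String s) {
        int[] codePoints = s.codePoints().toArray();
        int[] out = new int[codePoints.length];
        int prev = 0; // the previous code point starts at 0
        for (int i = 0; i < codePoints.length; i++) {
            out[i] = codePoints[i] - prev;
            prev = codePoints[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // 'a' is U+0061 (97); each following letter is one code point higher.
        System.out.println(Arrays.toString(deltas("abc"))); // prints [97, 1, 1]
    }
}
```

Text that stays within one alphabet produces small differences after the first code point, which is what makes short encodings possible.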

We encode differences with few bytes if their absolute values are small. For correct ordering, we must treat the entire value range -10ffff..+10ffff in ascending order, which forbids encoding the sign and the absolute value separately. Instead, we split the lead byte range in the middle and encode non-negative values going up and negative values going down.

For very small absolute values, the difference is added to a middle byte value for single-byte encoded differences. For somewhat larger absolute values, the difference is divided by the number of byte values available, the remainder (modulo) is used for one trail byte, and the quotient is added to a lead byte avoiding the single-byte range. For large absolute values, the difference is similarly encoded in three bytes.
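The one-byte and two-byte cases can be sketched as follows. All constants here (the middle byte value, the single-byte reach, the number of two-byte lead bytes) are illustrative assumptions chosen to fit the description above; the text does not state the values that the class actually uses, and the three-byte case is left out.

```java
// Hypothetical sketch, not the ICU4J implementation. All constants are
// assumptions made for illustration only.
final class SlopeSketch {
    static final int MIN = 3;                     // bytes 0, 1, 2 are never used
    static final int MAX = 0xff;
    static final int TAIL_COUNT = MAX - MIN + 1;  // 253 usable trail byte values
    static final int MIDDLE = 0x81;               // assumed middle byte value
    static final int SINGLE = 80;                 // assumed reach of one byte
    static final int LEAD_2 = 42;                 // assumed two-byte lead count per sign
    static final int REACH_POS_2 = LEAD_2 * TAIL_COUNT + (LEAD_2 - 1);
    static final int REACH_NEG_2 = -REACH_POS_2 - 1;

    static byte[] encodeDiff(int diff) {
        if (diff >= -SINGLE && diff <= SINGLE) {
            // Very small difference: one byte around the middle value.
            return new byte[] {(byte) (MIDDLE + diff)};
        }
        if (diff < REACH_NEG_2 || diff > REACH_POS_2) {
            throw new IllegalArgumentException("three-byte case not sketched: " + diff);
        }
        // Floor division keeps the remainder in 0..TAIL_COUNT-1 for negative
        // differences too, so (quotient, remainder) rises with diff.
        int quotient = Math.floorDiv(diff, TAIL_COUNT);
        int remainder = Math.floorMod(diff, TAIL_COUNT);
        // Positive leads go up from just above the single-byte range;
        // negative leads go down from just below it.
        int leadBase = diff > 0 ? MIDDLE + SINGLE + 1 : MIDDLE - SINGLE;
        return new byte[] {(byte) (leadBase + quotient), (byte) (MIN + remainder)};
    }

    /** Unsigned lexicographic comparison of two byte sequences. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] one = encodeDiff(1);
        byte[] two = encodeDiff(81);
        System.out.println(one.length + " " + (one[0] & 0xff)); // prints 1 130
        System.out.println(two.length + " " + (two[0] & 0xff) + " " + (two[1] & 0xff)); // prints 2 210 84
    }
}
```

Under these assumed constants, negative two-byte leads fall below the single-byte range and positive two-byte leads fall above it, so the encoded sequences sort in the same order as the differences.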

BOCU does not use byte values 0, 1, or 2, but uses all other byte values for lead and single bytes, so that the middle range of single bytes is as large as possible.

Note that the lead byte ranges overlap somewhat, but that the sequences as a whole are well ordered. That is, even if the lead byte is the same for sequences of different lengths, the trail bytes establish correct order. It would be possible to encode slightly larger ranges for each length (>1) by subtracting the lower bound of the range. However, that would also slow down the calculation.

For the actual string encoding, an optimization moves the previous code point value to the middle of its Unicode script block to minimize the differences in same-script text runs.
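A rough sketch of this adjustment follows. It assumes, purely for illustration, that every script block is an aligned run of 128 code points; real script blocks vary in size, and the rule that the class actually applies is not given here. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not the ICU4J implementation. Assumes every block is
// an aligned run of 128 code points, which is a simplification.
final class BlockMiddleSketch {

    /** Middle of the assumed 128-code-point block that contains c. */
    static int blockMiddle(int c) {
        return (c & ~0x7f) + 0x40;
    }

    /** Differences taken against the block middle of the previous code point. */
    static int[] adjustedDeltas(String s) {
        int[] codePoints = s.codePoints().toArray();
        int[] out = new int[codePoints.length];
        int prev = 0; // the previous value starts at 0
        for (int i = 0; i < codePoints.length; i++) {
            out[i] = codePoints[i] - prev;
            prev = blockMiddle(codePoints[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as escapes to keep the source ASCII.
        String text = "\u043f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(Arrays.toString(adjustedDeltas(text)));
        // prints [1087, 0, -8, -14, -11, 2]
    }
}
```

With this assumption, every difference after the first one in a same-block run stays within -64..63, whatever order the letters appear in.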


author:
   Syn Wee Quek
since:
   release 2.2, May 3rd 2002




Method Summary
public static int compress(String source, byte[] buffer, int offset)
     Encode the code points of a string as a sequence of bytes, preserving lexical order.
public static int getCompressionLength(String source)
     Return the number of bytes that compress() would write.



Method Detail
compress
public static int compress(String source, byte[] buffer, int offset)

Encode the code points of a string as a sequence of bytes, preserving lexical order.

The minimum size of buffer required for the compression can be preflighted by getCompressionLength(String).


Parameters:
  source - text source
  buffer - output buffer
  offset - offset in buffer at which to start writing
return:
  end offset where the writing stopped
See Also:   BOCU.getCompressionLength(String)
exception:
  ArrayIndexOutOfBoundsException - thrown if size of buffer is too small for the output.



getCompressionLength
public static int getCompressionLength(String source)
Return the number of bytes that compress() would write.
Parameters:
  source - text source string
return:
  the length of the BOCU result
See Also:   BOCU.compress(String,byte[],int)



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
