/*
 * Flags.java: commonly used constants.
 *
 * Copyright (C) 2004 Heiko Blau
 *
 * This file belongs to the JTopas Library.
 * JTopas is free software; you can redistribute it and/or modify it
 * under the terms of the GNU Lesser General Public License as published by the
 * Free Software Foundation; either version 2.1 of the License, or (at your
 * option) any later version.
 *
 * This software is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.
 * See the GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License along
 * with JTopas. If not, write to the
 *
 * Free Software Foundation, Inc.
 * 59 Temple Place, Suite 330,
 * Boston, MA 02111-1307
 * USA
 *
 * or check the Internet: http://www.fsf.org
 *
 * Contact:
 * email: heiko@susebox.de
 */

package de.susebox.jtopas;

//-----------------------------------------------------------------------------
// Interface Flags
//

/**
 * The interface defines flags that are used by various classes during tokenizing.
 * A flag can be set in three ways:
 *<ul><li>
 * Globally for a {@link TokenizerProperties} object: The setting affects all
 * {@link Tokenizer} instances that share this <code>TokenizerProperties</code>
 * object as well as the {@link TokenizerProperty} objects registered in this
 * <code>TokenizerProperties</code> that have not set the flag locally.
 *</li><li>
 * Separately for a {@link Tokenizer} (see {@link Tokenizer#changeParseFlags}):
 * A single <code>Tokenizer</code> can deviate from the setting in the
 * {@link TokenizerProperties} object it uses, but still follows the setting
 * of a single {@link TokenizerProperty} object. Only a limited number of
 * flags can be set for a <code>Tokenizer</code>, mainly the "dynamic" flags
 * that describe the tokenizing process rather than an attribute of a
 * {@link TokenizerProperty}, e.g. {@link #F_COUNT_LINES} and {@link #F_KEEP_DATA}.
 *</li><li>
 * Specifically for a single {@link TokenizerProperty}: This setting affects
 * only the handling of that property and overrules both the setting of the
 * {@link TokenizerProperties} that contains the property and the setting of a
 * {@link Tokenizer} using the <code>TokenizerProperties</code> object. Only
 * a limited number of flags can be set for a single property, including the
 * descriptive flags like {@link #F_NO_CASE}, {@link #F_ALLOW_NESTED_COMMENTS}
 * and {@link #F_SINGLE_LINE_STRING}.
 *</li></ul>
 *
 * @see TokenizerProperties
 * @author Heiko Blau
 */
public interface Flags {

  /**
   * When this flag is set globally for a {@link TokenizerProperties} instance
   * (see {@link TokenizerProperties#setParseFlags}), input data is generally
   * treated case-insensitively. Specific properties may still be treated
   * case-sensitively: for such a property, set this flag in the flag mask and
   * clear it in the corresponding flags.
   *<br>
   * Implementation note: The flag should be applicable for both {@link TokenizerProperties}
   * and {@link TokenizerProperty} instances. It should not be used
   * dynamically ({@link Tokenizer#changeParseFlags}).
   */
  public static final short F_NO_CASE = 0x0001;

  /**
   * General compare operations are case-sensitive, that means 'A' equals 'A'
   * but not 'a'. It is not necessary to set this flag, since case-sensitive
   * comparison is the default.
   *<br>
   * The flag was mainly used in conjunction with {@link #F_NO_CASE}. If
   * <code>F_NO_CASE</code> is set via {@link TokenizerProperties#setParseFlags},
   * <code>F_CASE</code> can be used for single properties where case-sensitivity
   * is necessary in spite of the global case-insensitivity.
   *<br>
   * If neither <code>F_CASE</code> nor <code>F_NO_CASE</code> is set, <code>F_CASE</code>
   * is assumed. If both flags are set, <code>F_CASE</code> takes precedence.
   *<br>
   * Implementation note: The flag should be applicable for both {@link TokenizerProperties}
   * and {@link TokenizerProperty} instances. It should not be used
   * dynamically ({@link Tokenizer#changeParseFlags}).
   *
   * @deprecated For properties whose case handling differs from the global
   *             settings of a {@link TokenizerProperties} instance, use the
   *             constructor {@link TokenizerProperty#TokenizerProperty(int, java.lang.String[], java.lang.Object, int, int)}.
   */
  public static final short F_CASE = 0x0002;

  /**
   * For performance and memory reasons, this flag is used to avoid a copy
   * operation for every token. The token image itself is not returned in a
   * {@link Token} instance, only its position and length in the input stream.
   *<br>
   * Implementation note: The flag should be applicable for {@link TokenizerProperties}
   * and {@link TokenizerProperty} instances. It should also be a dynamic flag
   * that can be switched on and off during runtime using {@link Tokenizer#changeParseFlags}.
   */
  public static final short F_TOKEN_POS_ONLY = 0x0010;

  /**
   * Set this flag to let a {@link Tokenizer} buffer all data. Otherwise, a
   * tokenizer usually applies a strategy that allocates only a reasonable
   * amount of memory.
   *<br>
   * Implementation note: The flag should be applicable for {@link TokenizerProperties}
   * and {@link Tokenizer} objects, but not for single {@link TokenizerProperty}
   * instances. It could also be a dynamic flag that can be switched on and off
   * during runtime of a tokenizer ({@link Tokenizer#changeParseFlags}), although
   * it is generally set before parsing starts.
   */
  public static final short F_KEEP_DATA = 0x0020;

  /**
   * Tells a {@link Tokenizer} to count lines and columns. The tokenizer may use
   * {@link java.lang.System#getProperty}<code>("line.separator")</code> to
   * obtain the end-of-line sequence, or accept different line separator
   * sequences for better portability: a single carriage return (Mac OS), a
   * single line feed (Unix), or a carriage return followed by a line feed
   * (Windows).
   *<br>
   * Usually, the end-of-line characters '\r' and '\n' are whitespaces. If they
   * are also part of one or more special sequences or patterns, it is
   * <strong>NOT</strong> guaranteed that the line counting mechanism of a
   * {@link Tokenizer} implementation finds these occurrences. This limitation
   * keeps performance good, since otherwise there would be a potentially huge
   * number of unsuccessful newline scans in these tokens. If you cannot live
   * with this limitation, consider defining special sequences for '\r', '\n'
   * and '\r\n' alone and removing these characters from the whitespace set.
   *<br>
   * Implementation note: The flag should be applicable for {@link TokenizerProperties}
   * and {@link Tokenizer} objects, but not for single {@link TokenizerProperty}
   * instances. It could also be a dynamic flag that can be switched on and off
   * during runtime of a tokenizer, although it is generally set before parsing
   * starts.
   */
  public static final short F_COUNT_LINES = 0x0040;

  /**
   * Nested block comments are normally not allowed. This flag changes the
   * default behaviour.
   *<br>
   * Implementation note: The flag should be applicable for both {@link TokenizerProperties}
   * and {@link TokenizerProperty} instances. It should not be used
   * dynamically (as was possible in versions of JTopas prior to 0.8).
   */
  public static final short F_ALLOW_NESTED_COMMENTS = 0x0080;

  /**
   * Treat patterns the same way as whitespaces, separators or special sequences.
   * Patterns of this type are recognized anywhere outside comments and strings
   * and terminate normal tokens. In fact, strings and comments could be
   * described as free patterns.
   *<br>
   * Without this flag, patterns are treated the same way as normal tokens:
   * they are preceded and followed by whitespaces, separators or special sequences.
   *<br>
   * Implementation note: The flag should be applicable for both {@link TokenizerProperties}
   * and {@link TokenizerProperty} instances. It should not be used
   * dynamically.
   */
  public static final short F_FREE_PATTERN = 0x0100;

  /**
   * Return simple whitespaces. These whitespaces are the ones set by
   * {@link TokenizerProperties#setWhitespaces}. The flag is part of the
   * composite mask {@link #F_RETURN_WHITESPACES}.
   *<br>
   * Implementation note: The flag should be applicable for {@link TokenizerProperties}
   * and {@link Tokenizer}, but not for single {@link TokenizerProperty}
   * instances. It is also a dynamic flag that can be switched on and off
   * during runtime of a tokenizer (<strong>Note:</strong> Flags for a single
   * {@link TokenizerProperty} take precedence over other settings).
   */
  public static final short F_RETURN_SIMPLE_WHITESPACES = 0x0200;

  /**
   * Return block comments. The flag is part of the composite mask
   * {@link #F_RETURN_WHITESPACES}.
   *<br>
   * Implementation note: The flag should be applicable for <code>TokenizerProperties</code>,
   * {@link Tokenizer} and for single {@link TokenizerProperty} instances. It is
   * also a dynamic flag that can be switched on and off during runtime of a
   * tokenizer (<strong>Note:</strong> Flags for a single {@link TokenizerProperty}
   * take precedence over other settings).
   */
  public static final short F_RETURN_BLOCK_COMMENTS = 0x0400;

  /**
   * Return line comments. The flag is part of the composite mask
   * {@link #F_RETURN_WHITESPACES}.
   *<br>
   * Implementation note: The flag should be applicable for <code>TokenizerProperties</code>,
   * {@link Tokenizer} and for single {@link TokenizerProperty} instances. It is
   * also a dynamic flag that can be switched on and off during runtime of a
   * tokenizer (<strong>Note:</strong> Flags for a single {@link TokenizerProperty}
   * take precedence over other settings).
   */
  public static final short F_RETURN_LINE_COMMENTS = 0x0800;

  /**
   * In many cases, parsers are not interested in whitespaces. If you are, use
   * this value to force the tokenizer to return whitespace sequences and comments
   * as tokens. By default, the flag is not set.
   *<br>
   * You can control the whitespace policy with finer granularity by using the
   * flags {@link #F_RETURN_SIMPLE_WHITESPACES}, {@link #F_RETURN_BLOCK_COMMENTS}
   * and {@link #F_RETURN_LINE_COMMENTS}, either by setting them generally for
   * a {@link TokenizerProperties} or a single {@link Tokenizer} object, or even
   * more specifically for a single {@link TokenizerProperty}.
   */
  public static final short F_RETURN_WHITESPACES = F_RETURN_SIMPLE_WHITESPACES
                                                 | F_RETURN_BLOCK_COMMENTS
                                                 | F_RETURN_LINE_COMMENTS;

  /**
   * By default, strings are all characters between and including a pair of
   * string start and end sequences, regardless of whether there are line
   * separators in between. This flag changes that behaviour for the
   * <code>TokenizerProperties</code> instance in general or for a single
   * string property.
   *<br>
   * Implementation note: The flag should be applicable for both <code>TokenizerProperties</code>
   * and {@link TokenizerProperty} instances. It should not be used
   * dynamically.
   */
  public static final short F_SINGLE_LINE_STRING = 0x1000;

  /**
   * By setting this flag for a {@link TokenizerProperties} instance, a
   * {@link Tokenizer} or for a single property, a tokenizer returns not only
   * the token images but also image parts (see {@link Token#getImageParts}).
   *<br>
   * Implementation note: The flag should be applicable for {@link TokenizerProperties},
   * {@link Tokenizer} and for single {@link TokenizerProperty} instances.
   */
  public static final short F_RETURN_IMAGE_PARTS = 0x4000;
}
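
/*
 * FlagsExample: a hypothetical sketch, not part of the JTopas API, showing
 * how the three levels described in the Flags documentation could be
 * combined with plain bit operations. Assumption: at the Tokenizer and
 * TokenizerProperty levels, a bit set in the mask marks a flag as locally
 * defined, and the same bit in the flags switches it on or off (as the
 * F_NO_CASE documentation describes for single properties). Actual JTopas
 * classes may resolve flags differently.
 */
final class FlagsExample {

  private FlagsExample() {}

  /**
   * Overlays locally defined flags onto inherited ones: bits set in
   * <code>mask</code> are taken from <code>flags</code>, all other bits
   * are kept from <code>inherited</code>.
   */
  static int overlay(int inherited, int flags, int mask) {
    return (inherited & ~mask) | (flags & mask);
  }

  /**
   * Resolves the effective flags of one property: global settings first,
   * then the tokenizer settings, then the property settings, so the most
   * specific level wins.
   */
  static int effectiveFlags(int globalFlags,
                            int tokenizerFlags, int tokenizerMask,
                            int propertyFlags, int propertyMask) {
    int flags = overlay(globalFlags, tokenizerFlags, tokenizerMask);
    return overlay(flags, propertyFlags, propertyMask);
  }

  /** Returns true if every bit of <code>flag</code> is set in <code>flags</code>. */
  static boolean isSet(int flags, int flag) {
    return (flags & flag) == flag;
  }

  public static void main(String[] args) {
    // Global setting: case-insensitive, lines are counted.
    int global = Flags.F_NO_CASE | Flags.F_COUNT_LINES;

    // A property that stays case-sensitive: F_NO_CASE is set in the mask
    // (defined locally) but cleared in the flags (switched off).
    int property = effectiveFlags(global, 0, 0, 0, Flags.F_NO_CASE);
    System.out.println(isSet(property, Flags.F_NO_CASE));     // false
    System.out.println(isSet(property, Flags.F_COUNT_LINES)); // true

    // F_RETURN_WHITESPACES is the union of three flags, so isSet only
    // succeeds when all three are present.
    int commentsOnly = Flags.F_RETURN_BLOCK_COMMENTS | Flags.F_RETURN_LINE_COMMENTS;
    System.out.println(isSet(commentsOnly, Flags.F_RETURN_WHITESPACES)); // false
  }
}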