Read files in comma-separated value (CSV) format as output by the Microsoft
Excel spreadsheet program.
More information about this class is available from ostermiller.org.
Excel CSV is a file format used as a portable representation of a database.
Each line is one entry or record and the fields in a record are separated by commas.
If a field includes a comma or a new line, the whole field must be surrounded with double quotes.
When the field is in quotes, any quote literals must be escaped by two quotes ("").
Text that comes after quotes that have been closed but come before the next comma will be ignored.
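The quoting rules above can be sketched from the writing side. This is only an illustration: the class ExcelCsvQuoteSketch and its quote() method are hypothetical names, not part of com.Ostermiller.util, and the choice to also quote a field that contains a quote character is an assumption.

```java
// Hypothetical helper for illustration; not part of com.Ostermiller.util.
final class ExcelCsvQuoteSketch {

    /** Wraps a field in double quotes when it contains a comma, a quote, or a line break. */
    static String quote(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0   // assumption: fields holding quotes are quoted too
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Escape each literal quote by doubling it, then surround the whole field.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(quote("plain"));
        System.out.println(quote("a,b"));
        System.out.println(quote("say \"hi\""));
    }
}
```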
Empty fields are returned as a String of length zero: "". The following line has three empty
fields and three non-empty fields in it. There is an empty field on each end, and one in the
middle. One token is returned as a space.
,second,, ,fifth,
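To make the field rules concrete, here is a minimal single-line splitter written for this description. It is a sketch, not the ExcelCSVParser implementation: the class ExcelCsvLineSketch and its split() method are hypothetical, it does not handle quoted fields that span lines, and treating a quote in the middle of an unquoted field as a literal character is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the field rules; not the library's parser.
final class ExcelCsvLineSketch {

    /** Splits one line into fields, keeping empty fields and all whitespace. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false; // currently inside an open quoted section
        boolean closed = false;   // quotes were closed; ignore text until the next comma
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // "" inside quotes is one literal quote
                    i++;
                } else {
                    inQuotes = false;
                    closed = true;
                }
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
                closed = false;
            } else if (closed) {
                // Text after the closing quote and before the comma is dropped.
            } else if (c == '"' && current.length() == 0) {
                inQuotes = true;
            } else {
                current.append(c); // assumption: any other character is literal
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Each field is printed between brackets so empty and blank fields are visible.
        for (String field : split(",second,, ,fifth,")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

For the sample line, this sketch yields six fields: the first, third, and sixth are empty strings and the fourth is a single space.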
Blank lines are always ignored. Other lines will be ignored if they start with a
comment character as set by the setCommentStart() method.
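The line-skipping behavior can be pictured as a filter applied before any field parsing. The sketch below is illustrative only: ExcelCsvLineFilterSketch and keepDataLines() are hypothetical names, the "#" comment character in the usage is an arbitrary choice, and treating only zero-length lines as blank (so that a line holding a single space still counts as data) is an assumption based on whitespace being significant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the real parser skips these lines internally.
final class ExcelCsvLineFilterSketch {

    /**
     * Returns the lines that would carry data: zero-length lines are dropped, and so
     * are lines whose first character appears in commentStart.
     */
    static List<String> keepDataLines(List<String> lines, String commentStart) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // blank line
            }
            if (commentStart.indexOf(line.charAt(0)) >= 0) {
                continue; // comment line
            }
            kept.add(line);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("# header note", "", "a,b", " ");
        // With "#" as the comment character, only "a,b" and the single-space line remain.
        System.out.println(keepDataLines(input, "#").size());
    }
}
```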
An example of how ExcelCSVParser might be used:
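// Read CSV from standard input and print each value with its line number.
ExcelCSVParser shredder = new ExcelCSVParser(System.in);
String t;
// nextValue() returns null once the input is exhausted.
while ((t = shredder.nextValue()) != null) {
    System.out.println("" + shredder.lastLineNumber() + " " + t);
}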
The CSV that Excel outputs differs from the standard in several respects:
- Leading and trailing whitespace is significant.
- A backslash is not a special character and is not used to escape anything.
- Quotes inside quoted strings are escaped with a double quote rather than a backslash.
- Excel may convert data before putting it in CSV format:
  - Tabs are converted to a single space.
  - New lines in the data are always represented as the UNIX new line ("\n").
  - Numbers with more than 12 digits may be represented in truncated
    scientific notation.
This parser does not attempt to fix these Excel conversions, but users should be aware
of them.
See Also: com.Ostermiller.util.CSVParser
Author: Stephen Ostermiller http://ostermiller.org/contact.pl?regarding=Java+Utilities
Since: ostermillerutils 1.00.00