Java - Parsing CSV Files How To

Parsing CSV files is one of those routine tasks that can take a programmer an inordinate amount of time, especially once you discover the CSV file is not clean. CSV files often come with extraneous headers and footers, such as page numbers and distribution information. This is especially true if they are user generated from spreadsheet applications like Open Office, or are effectively screen scrapes of reports from some mainframe or AS400 reporting application. Ideally you would throw the file back at the person who sent it and ask them to clean it up, but unfortunately this is usually not possible :).

StringTokenizer not a good option for parsing CSV Files in Java

Besides the "non-compliant" records, the fields themselves often contain the very characters that the CSV "standard" uses to delimit fields and strings, namely commas and quotes.

"first' name","last,name","12,00.00","Distributes "cheap" childrens toys"

We have all seen these kinds of CSV files, and names of any kind are especially prone to the problem. One of the first classes one reaches for is java.util.StringTokenizer, but you soon discover its limitations. Although StringTokenizer is a great convenience class, it is not robust enough for many real-world situations: it has no notion of quoting, and it treats consecutive delimiters as one, so empty fields simply vanish.
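To make the limitations concrete, here is a minimal sketch of the naive approach. The class name, the tokenize helper and the two sample lines are my own illustrations, not taken from any real data file. It only shows how StringTokenizer miscounts fields; it is not meant as a parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, for illustration only.
class StringTokenizerCsvDemo {

    // Naively split one line on commas, the way one might at first try.
    static List<String> tokenize(String line) {
        List<String> fields = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(line, ",");
        while (st.hasMoreTokens()) {
            fields.add(st.nextToken());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Two logical fields, but the comma inside the quotes is
        // treated as a delimiter, so we get three tokens.
        String quoted = "\"Smith, John\",42";
        System.out.println(tokenize(quoted).size()); // prints 3

        // Three logical fields (the middle one empty), but consecutive
        // delimiters are collapsed, so we get only two tokens.
        String empty = "a,,c";
        System.out.println(tokenize(empty).size()); // prints 2
    }
}
```

In both cases the column positions shift, which is usually worse than an outright failure because the bad data loads without complaint.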

Open Source Java CSV Parsers

Luckily, because parsing CSV files is such a common task, there are many libraries out there that can assist by providing the boilerplate code. The Apache project tried to bring the various libraries together into one generally accepted library, but the project seems to have failed, with the Apache commons-csv parser library languishing in the sandbox since 2007!

Luckily, Apache graciously provides us with links to the parser libraries it is trying to unify:

It also links to a library that is not part of the unification effort, namely Super CSV. Since the other libraries are part of Apache now, there has not been much activity on them. It may well be that they are mature; after all, there is only so much complexity in creating a CSV parser library. Still, I make use of the Super CSV library, as it is at least active. When Apache commons-csv is finally stable I will probably switch to it, but until then I am using Super CSV.