Tatu Saloranta edited this page May 3, 2017 · 1 revision

CsvSchema

Most aspects of CSV parsing and generation are controlled by the CsvSchema associated with the reader/parser or writer/generator. A schema defines two main kinds of things:

  1. The columns that the CSV document has, which allow binding a logical property name to and from the physical column (by index) in the CSV document
  2. Details of how the contents of columns (cell values) are encoded

Column definitions

There are 3 ways to construct a CsvSchema instance with column mapping definitions (and one way for "unmapped" reading):

  • Create schema based on a Java class
  • Build schema programmatically (explicit code)
  • Use the first line of the CSV document to get the column names (no types) for the schema
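As a minimal sketch of the three approaches listed above (assuming jackson-dataformat-csv and jackson-annotations are on the classpath): the `Person` class and the `SchemaConstruction` wrapper with its method names are hypothetical and exist only for illustration. The schema calls used are `CsvMapper.schemaFor`, `CsvSchema.builder()` and `CsvSchema.emptySchema().withHeader()`.

```java
import com.fasterxml.jackson.annotation.JsonPropertyOrder;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

class SchemaConstruction {
    // Hypothetical POJO; @JsonPropertyOrder fixes the column order
    @JsonPropertyOrder({ "name", "age" })
    public static class Person {
        public String name;
        public int age;
    }

    // 1. Columns derived from a Java class
    static CsvSchema fromClass() {
        return new CsvMapper().schemaFor(Person.class);
    }

    // 2. Columns added explicitly, in physical column order
    static CsvSchema fromBuilder() {
        return CsvSchema.builder()
                .addColumn("name")
                .addColumn("age", CsvSchema.ColumnType.NUMBER)
                .build();
    }

    // 3. No columns up front; names are taken from the first row when reading
    static CsvSchema fromHeader() {
        return CsvSchema.emptySchema().withHeader();
    }
}
```

Any of these schemas can then be attached with `mapper.readerFor(...).with(schema)` or `mapper.writer(schema)`.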

Columns from Java class

(TO BE WRITTEN)

Columns added explicitly (programmatic)

(TO BE WRITTEN)

Column names from CSV header (first row)

Note: this is considered a "simple" encoding feature.

(TO BE WRITTEN)
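One possible way to use header-derived column names is sketched below. It assumes the first row of the input holds the column names and reads each following row as a `Map` keyed by those names. The class name `HeaderSchemaExample` and the method `readWithHeader` are hypothetical.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

class HeaderSchemaExample {
    // Reads all data rows; the header row supplies the map keys
    static List<Map<String, String>> readWithHeader(String csv) {
        CsvMapper mapper = new CsvMapper();
        // Empty schema plus "use header": column names come from the first row
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        try (MappingIterator<Map<String, String>> it = mapper
                .readerFor(Map.class)
                .with(schema)
                .readValues(csv)) {
            return it.readAll();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the header carries names only and no types, all cell values in this sketch arrive as Strings.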

"Unmapped" (no columns)

An "unmapped" schema is constructed with:

CsvSchema unmapped = CsvSchema.emptySchema();

This "schema" is also the default one configured for ObjectReaders and ObjectWriters if you do not explicitly specify a schema, so it is possible to omit the definition and read content like:

CsvMapper mapper = new CsvMapper();
mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
// when wrapped as an array, we'll get array/List of arrays/Lists, for example:
String[][] rows = mapper.readValue(csvContent, String[][].class);

You can also use streaming read like so:

// NOTE: type given for 'readerFor' is type of individual row
MappingIterator<String[]> it = mapper.readerFor(String[].class)
    .readValues(csvSource);
while (it.hasNextValue()) {
  String[] row = it.nextValue();
  // process the row here
}

Note that when using an "unmapped" schema, content must be read as arrays/Lists of String or java.lang.Object (String[], List<String>, etc.); no other types are accepted.
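For reference, the two unmapped-reading fragments above can be combined into one self-contained class, sketched here with hypothetical names (`UnmappedReadExample`, `readAll`, `readStreaming`) and with checked exceptions wrapped for brevity.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvParser;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class UnmappedReadExample {
    // Reads the whole document at once as an array of rows
    static String[][] readAll(String csv) {
        CsvMapper mapper = new CsvMapper();
        // Wrap the sequence of rows in an outer array so it binds to String[][]
        mapper.enable(CsvParser.Feature.WRAP_AS_ARRAY);
        try {
            return mapper.readValue(csv, String[][].class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads row by row; the type given to readerFor is the type of one row
    static List<String[]> readStreaming(String csv) {
        CsvMapper mapper = new CsvMapper();
        List<String[]> rows = new ArrayList<>();
        try (MappingIterator<String[]> it =
                     mapper.readerFor(String[].class).readValues(csv)) {
            while (it.hasNextValue()) {
                rows.add(it.nextValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

No schema is passed in either method, so both rely on the default empty schema.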

Encoding properties

The following encoding/decoding properties are currently (as of 2.6) defined:

  • On/off encoding features:
    • Use header? (default: disabled)
    • Comments allowed? (default: disabled)
    • Skip data row? (default: disabled)
  • Column separator (default: comma, ",")
    • Must be defined; cannot be disabled
  • Array element separator (default: semicolon, ";")
    • Optional, may be disabled
  • Quote char (default: double quote, '"')
    • Optional, may be disabled
  • Escape char (default: not enabled)
    • Optional, may be disabled
  • Null value (default: empty String, "")

TO BE COMPLETED