This repository was archived by the owner on Jan 22, 2019. It is now read-only.

Description
The following code example can be tested with the attached file (test8.csv). The file is in ISO-8859 format and contains one character that is not valid UTF-8 in that encoding: é
File file = new File("test8.csv");
InputStream in = Files.newInputStream(file.toPath(), StandardOpenOption.READ);
CsvSchema schema = CsvSchema.emptySchema().withHeader();
CsvMapper mapper = new CsvMapper();
ObjectReader reader = mapper.readerFor(Map.class).with(schema);
MappingIterator<Map<String, String>> mappingIterator = reader.readValues(in);
while (mappingIterator.hasNextValue()) {
    Map<String, String> line = mappingIterator.nextValue();
    System.out.println(line);
}
mappingIterator.close();
Parsing fails at line 152, in the call to "nextValue()", but the problematic character is in line 185. So the parser does not fail at the position of the problematic character but much earlier (presumably because of buffering?).
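To illustrate the buffering suspicion, here is a minimal sketch that uses only JDK classes, not Jackson's own reader, so it shows the general effect rather than Jackson's exact behavior. It assumes a strict UTF-8 decoder (`CodingErrorAction.REPORT`) behind a `BufferedReader`; the class and method names are made up for illustration. The decoder processes a whole block of bytes at a time, so the exception can surface before the consumer has read the lines that precede the bad byte.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EarlyDecodeFailureDemo {

    /** Builds ASCII lines, with one ISO-8859-1 'é' (byte 0xE9) on line badLine (1-based). */
    static byte[] sampleData(int lines, int badLine) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= lines; i++) {
            sb.append(i == badLine ? "caf\u00e9,x" : "plain,x").append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    /**
     * Returns how many lines were handed to the caller before strict UTF-8
     * decoding failed, or -1 if the whole input decoded cleanly.
     */
    static int linesReadBeforeFailure(byte[] data) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        int count = 0;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), strictUtf8))) {
            while (r.readLine() != null) {
                count++;
            }
            return -1;
        } catch (CharacterCodingException e) {
            // The decoder rejected a byte sequence somewhere in the block it was filling.
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int n = linesReadBeforeFailure(sampleData(185, 185));
        System.out.println("Lines read before failure: " + n + " (bad byte is on line 185)");
    }
}
```

If the printed count is lower than 184, the error was raised while decoding ahead of the line the caller was actually asking for, which matches the behavior described above.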
I ask because if parsing failed at the exact position of the problematic character, we could simply skip that line and continue with the next one. As it stands, parsing fails earlier and cannot be recovered or continued.
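A possible workaround, sketched here under the assumption that the file really is ISO-8859-1 (the report only says "ISO-8859"), is to decode the bytes explicitly instead of letting the parser assume UTF-8. The helper names below are hypothetical. Every byte value is a valid ISO-8859-1 character, so this decoding step cannot raise a conversion error.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Latin1ReaderSketch {

    /** Wraps a byte stream in a Reader that decodes it as ISO-8859-1. */
    static Reader asLatin1(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.ISO_8859_1);
    }

    /** Drains a Reader into a String; only used here to show the decoded result. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0xE9 is 'é' in ISO-8859-1 but an incomplete sequence in UTF-8.
        byte[] bytes = {'c', 'a', 'f', (byte) 0xE9};
        System.out.println(readAll(asLatin1(new ByteArrayInputStream(bytes))));
    }
}
```

With the code from the description, the resulting `Reader` could be passed to the `readValues(Reader)` overload of `ObjectReader`, for example `reader.readValues(Latin1ReaderSketch.asLatin1(in))`, in place of the raw `InputStream`.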
The following parse exception is output:
java.io.CharConversionException: Invalid UTF-8 middle byte 0x65 (at char #4861, byte #3999): check content encoding, does not look like UTF-8
The problematic character in the file test8.csv can be found in the vi editor with ":goto 4861"
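As an editor-independent way to find such characters, the following sketch scans raw bytes and reports the line number and byte offset of every non-ASCII byte. The class name and the returned `{line, offset}` layout are illustrative choices, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;

final class NonAsciiLocator {

    /**
     * Returns one {lineNumber, byteOffset} pair per non-ASCII byte.
     * Line numbers are 1-based; byte offsets are 0-based from the start of the data.
     */
    static List<int[]> locate(byte[] data) {
        List<int[]> hits = new ArrayList<>();
        int line = 1;
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b == '\n') {
                line++;
            } else if (b >= 0x80) {
                hits.add(new int[] {line, i});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        byte[] sample = {'a', ',', 'b', '\n', 'c', 'a', 'f', (byte) 0xE9, '\n'};
        for (int[] hit : locate(sample)) {
            System.out.println("line " + hit[0] + ", byte offset " + hit[1]);
        }
    }
}
```

For the attached file, the input could be obtained with `java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("test8.csv"))`.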
test8.csv.zip