Skip csv line with just commas @csv #368

Open
GitXm123 opened this issue Jan 8, 2023 · 5 comments

@GitXm123

GitXm123 commented Jan 8, 2023

Hi,
I am using the Jackson CSV library to parse a CSV file. My end users generate the CSV with Excel, and sometimes they remove the values from an entire row. This produces a CSV line consisting of nothing but commas.
Is it possible to skip such rows that contain only commas and no values?

Note that in my case some column values can contain newline characters, so I don't want to write a pre-filter that programmatically checks for and removes the empty lines described above. The header row almost always has a newline character in some of the column names.

Any help is appreciated.
TIA

@cowtowncoder
Member

There is no setting to detect and remove lines consisting of just commas. Or, if I understand this correctly, rows of all-whitespace values (possibly including linefeed characters). The latter requirement gets tricky since, as you note, simple pre-filtering cannot be used.

It would probably be necessary to do 2-phase processing: first read entries as Map<String,Object>, then remove the entries that match the criteria, and finally use ObjectMapper.convertValue() to convert the non-empty ones into the target type.
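A minimal sketch of that 2-phase approach, assuming jackson-dataformat-csv 2.x is on the classpath. The class `SkipCommaOnlyRows`, the `Item` type and the header `id,color,shape` are hypothetical, chosen only for illustration. Rows are pulled one at a time through `MappingIterator`, checked as untyped Maps, and only the non-blank ones are converted.

```java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SkipCommaOnlyRows {

    // Hypothetical target type; public field names match the CSV header "id,color,shape".
    public static class Item {
        public String id;
        public String color;
        public String shape;
    }

    // A row counts as blank when every value is null, empty or whitespace-only,
    // which is what a line such as ",," produces.
    static boolean isBlankRow(Map<String, Object> row) {
        for (Object value : row.values()) {
            if (value != null && !value.toString().trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    static <T> List<T> readSkippingBlankRows(String csv, Class<T> type) throws IOException {
        CsvMapper mapper = new CsvMapper();
        // Take column names from the first line, so each row is read as a Map.
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        List<T> result = new ArrayList<>();
        // Phase 1: read rows one at a time as untyped Maps.
        try (MappingIterator<Map<String, Object>> rows =
                mapper.readerFor(Map.class).with(schema).readValues(csv)) {
            while (rows.hasNext()) {
                Map<String, Object> row = rows.next();
                if (isBlankRow(row)) {
                    continue; // skip lines made up only of commas
                }
                // Phase 2: convert the remaining rows into the target type.
                result.add(mapper.convertValue(row, type));
            }
        }
        return result;
    }
}
```

This treats whitespace-only values as empty too; if only literal `,,` lines should be dropped, the check in `isBlankRow` could compare against `""` without trimming.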

@GitXm123
Author

The requirement is to detect and remove lines consisting of just commas. TIA

@cowtowncoder
Member

Ok. I will leave this open in case anyone has ideas, time & interest to implement support.
I don't see an immediately obvious way to achieve this, partly due to the streaming/incremental nature of CSV parsing in the module. But I have learned there are often many ways to view the problem, and hopefully someone has good ideas about how to go about it.
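On the caller's side, at least, skipping rows does not have to give up incremental processing. One possible sketch, in plain Java with no Jackson dependency, is a hypothetical `FilteringIterator` that wraps any `Iterator` and lazily drops elements rejected by a predicate. This is a user-side wrapper, not a parser feature of the module.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical helper: yields only the elements of `source` accepted by `keep`,
// pulling from `source` on demand and buffering at most one element.
final class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<? extends T> source;
    private final Predicate<? super T> keep;
    private T pending;          // next accepted element, if any
    private boolean hasPending; // whether `pending` holds a valid element

    FilteringIterator(Iterator<? extends T> source, Predicate<? super T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Advance lazily until an accepted element is found or the source ends.
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }
}
```

With Jackson, `source` would typically be a `MappingIterator` (which implements `Iterator`) obtained from `ObjectReader.readValues(...)`, and the predicate a check that at least one column value is non-blank. Closing the underlying `MappingIterator` remains the caller's responsibility.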

@djay-S

djay-S commented Jun 20, 2023

Hi @GitXm123 ,
Could you share some sample files we can use as a reference?

@tlahn
Contributor

tlahn commented Nov 20, 2023

Hi,
if I understand the first paragraph of the original request and #368 (comment) correctly, I have the same requirement.

Example input:

ID,Color,Shape
1,red,circle
,,
2,blue,square

The resulting output should be a list of 2 entries, one for ID 1 and one for ID 2. The line containing only commas should not result in an output row.
This is similar to #15.

@GitXm123 could you please confirm whether we have the same feature in mind? The second paragraph of your original post is about newlines in headers and data, but this may be another / additional feature.
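A different option from the 2-phase approach discussed above is a quote-aware pre-filter applied to the raw text before it reaches Jackson. The concern in the original request was that newlines inside values make a line-by-line filter unsafe; tracking double quotes avoids that, because a newline only ends a record when it is outside a quoted field. The sketch below uses a hypothetical class `CommaOnlyRecordFilter`, assumes well-formed RFC 4180-style quoting with `"` as the quote character and `,` as the separator, and works on an in-memory `String` (a `Reader`-based variant would be needed for very large files).

```java
// Hypothetical pre-filter, not part of Jackson: removes records consisting only
// of commas and whitespace, without splitting records at quoted newlines.
final class CommaOnlyRecordFilter {

    static String removeCommaOnlyRecords(String csv) {
        StringBuilder out = new StringBuilder(csv.length());
        StringBuilder record = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            record.append(c);
            if (c == '"') {
                // An escaped quote ("") toggles twice, leaving the state unchanged.
                inQuotes = !inQuotes;
            } else if (c == '\n' && !inQuotes) {
                flush(record, out); // end of record, including its line terminator
            }
        }
        flush(record, out); // last record when the input has no trailing newline
        return out.toString();
    }

    // Appends the record unless it is comma-only, then clears the buffer.
    private static void flush(StringBuilder record, StringBuilder out) {
        if (!isCommaOnly(record)) {
            out.append(record);
        }
        record.setLength(0);
    }

    // True if the record has at least one comma and otherwise only whitespace
    // (which covers the trailing "\n" or "\r\n").
    private static boolean isCommaOnly(CharSequence record) {
        boolean sawComma = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == ',') {
                sawComma = true;
            } else if (!Character.isWhitespace(c)) {
                return false;
            }
        }
        return sawComma;
    }
}
```

Applied to the example input above, it returns the header and the two data lines with the `,,` line removed. A row of quoted empty values such as `"",""` is kept, since it contains characters other than commas and whitespace.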

tlahn added a commit to tlahn/jackson-dataformats-text that referenced this issue Nov 20, 2023
still very early experiment stage. not sure if all test cases
match desired results.

implementation missing, just fixed compile errors

- issue: FasterXML#368
- similar issue: FasterXML#15
- maybe code changes to gain insights: FasterXML@f44a320