Support N-x configuration for To option to skip trailer rows #356

Open
bluejack opened this issue Aug 5, 2022 · 8 comments

Comments

@bluejack

bluejack commented Aug 5, 2022

Summary

In csv-parse, the 'to' and 'to_line' options do not adequately support stripping trailing records. I recommend adding functionality that would allow a syntax on these options, e.g. 'T.1', to indicate stopping at end-1.

Motivation

We process some files that include a trailing "summary" record for the file. These are difficult to deal with.

Alternative

We could pre-process the file to strip these trailing rows, but that can be tricky when records contain fields with embedded newlines.

Draft

This can be implemented by dropping each parsed row into a holding queue, pushing the configured "T-minus" row along the pipe rather than the current one, and simply discarding the queue at EOF.
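
A minimal sketch of this holding-queue idea in plain JavaScript (makeTrailingSkipper and emit are illustrative names, not part of csv-parse):

// Each new row is queued; a row is only released once `skipTrailing`
// newer rows have arrived, so it cannot be one of the trailing rows.
// At EOF the queue holds exactly the trailing rows, which are discarded.
function makeTrailingSkipper(skipTrailing, emit) {
  const queue = []
  return {
    push(row) {
      queue.push(row)
      if (queue.length > skipTrailing) {
        emit(queue.shift())
      }
    },
    end() {
      queue.length = 0
    }
  }
}

// With skipTrailing = 1, the trailing summary record is never emitted.
const out = []
const skipper = makeTrailingSkipper(1, (row) => out.push(row))
skipper.push(["Data", "Data", "Data"])
skipper.push(["Rows:", "1"])
skipper.end()
// out: [ [ 'Data', 'Data', 'Data' ] ]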


@wdavidw
Member

wdavidw commented Aug 6, 2022

Could you provide a sample of what you expect? That would help with understanding.

@bluejack
Author

bluejack commented Aug 8, 2022

Sure. For example, given this CSV:

"Data","Data","Data","More Data","More and more data","Final data field"
"Rows:","1"

Instead of processing that trailer row, I want to skip it. So, I would want to configure the parser:

skip_trailing: 1

Does that make more sense?

@wdavidw
Member

wdavidw commented Aug 8, 2022

Not really. Do you want this:

const { parse } = require('csv-parse/sync')
const data = "a,b,c\nRows:1"
const records = parse(data, {skip_trailing: 1})
records.should.eql([["a","b","c"]])

@bluejack
Author

bluejack commented Aug 9, 2022

Yes! Thank you for reading my mind, since apparently my communication skills are weak!

@bluejack
Author

bluejack commented Aug 9, 2022

If you think this is a viable idea, and you would like me to take a stab at it, I'm happy to submit a pull request later this week. If you don't think it's a good idea, or would rather do it yourself, I won't dive in.

@wdavidw
Member

wdavidw commented Aug 10, 2022

Hum, I don't think that's possible. csv-parse is designed to handle an unlimited number of records. This use case involves knowing in advance how many records there are in order to skip the last n records. This isn't scalable.

@bluejack
Author

I'm already doing it outside your parser by buffering the N trailing rows in a rotating queue and dropping the queue on end-of-stream; I could implement that in the parser if you wanted the functionality.
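
For illustration, the external approach looks roughly like an object-mode Transform piped after csv-parse (a reconstruction of the description above, not the actual code; skipTrailing is an illustrative helper name):

const { Transform } = require('stream')
const { parse } = require('csv-parse')

// Holds back the last `n` records and drops them when the upstream ends.
const skipTrailing = (n) => {
  const queue = []
  return new Transform({
    objectMode: true,
    transform(record, _encoding, callback) {
      queue.push(record)
      // Release a record only once `n` newer records have arrived,
      // so the final `n` records are never released.
      if (queue.length > n) {
        this.push(queue.shift())
      }
      callback()
    },
    flush(callback) {
      // End of stream: whatever remains in the queue is the trailer.
      queue.length = 0
      callback()
    }
  })
}

// Usage: drop a single trailing summary record.
process.stdin
  .pipe(parse())
  .pipe(skipTrailing(1))
  .on('data', (record) => console.log(record))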

@wdavidw
Member

wdavidw commented Aug 11, 2022

Curious to see your code. Not sure that I want to make the parser more complex, but let me look first at how you are doing it.
