Support N-x configuration for To option to skip trailer rows #356

Open
bluejack opened this issue Aug 5, 2022 · 8 comments

Comments

@bluejack

bluejack commented Aug 5, 2022

Summary

In csv-parse, the 'to' and 'to_line' options do not adequately support stripping trailing records. I recommend adding functionality that would allow a syntax on these options, e.g. 'T.1', to indicate stopping at end-1.

Motivation

We process some files that include a trailing "summary" record for the file. These are difficult to deal with.

Alternative

We could pre-process the file to strip these trailing rows, but that can be tricky when records contain fields with embedded newlines.

Draft

This can be implemented by dropping each parsed row into a holding queue, pushing the configured "T-minus" row along the pipe rather than the current one, and simply discarding the queue at EOF.
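
A minimal sketch of this holding-queue idea in plain JavaScript (makeTrailingSkipper and emit are illustrative names, not part of csv-parse):

// Each new row is queued; a row is only released once `skipTrailing`
// newer rows have arrived, so it cannot be one of the trailing rows.
// At EOF the queue holds exactly the trailing rows, which are discarded.
function makeTrailingSkipper(skipTrailing, emit) {
  const queue = []
  return {
    push(row) {
      queue.push(row)
      if (queue.length > skipTrailing) {
        emit(queue.shift())
      }
    },
    end() {
      queue.length = 0
    }
  }
}

// With skipTrailing = 1, the trailing summary record is never emitted.
const out = []
const skipper = makeTrailingSkipper(1, (row) => out.push(row))
skipper.push(["Data", "Data", "Data"])
skipper.push(["Rows:", "1"])
skipper.end()
// out: [ [ 'Data', 'Data', 'Data' ] ]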


@wdavidw
Member

wdavidw commented Aug 6, 2022

Could you provide a sample of what you expect? That would help with understanding.

@bluejack
Author

bluejack commented Aug 8, 2022

Sure. For example, given this CSV:

"Data","Data","Data","More Data","More and more data","Final data field"
"Rows:","1"

Instead of processing that trailer row, I want to skip it. So, I would want to configure the parser:

skip_trailing: 1

Does that make more sense?

@wdavidw
Member

wdavidw commented Aug 8, 2022

Not really. Do you want this:

const { parse } = require('csv-parse/sync')
const data = "a,b,c\nRows:1"
const records = parse(data, {skip_trailing: 1})
records.should.eql([["a","b","c"]])

@bluejack
Author

bluejack commented Aug 9, 2022

Yes! Thank you for reading my mind, since apparently my communication skills are weak!

@bluejack
Author

bluejack commented Aug 9, 2022

If you think this is a viable idea, and you would like me to take a stab at it, I'm happy to submit a pull request later this week. If you don't think it's a good idea, or would rather do it yourself, I won't dive in.

@wdavidw
Member

wdavidw commented Aug 10, 2022

Hum, I don't think that's possible. csv-parse is designed to handle an unlimited number of records. This use case involves knowing in advance how many records there are in order to skip the last n records. This isn't scalable.

@bluejack
Author

I'm already doing it outside your parser by buffering the N trailing rows in a rotating queue and dropping the queue on end-of-stream; I could implement that in the parser if you wanted the functionality.
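
For illustration, the external approach looks roughly like an object-mode Transform piped after csv-parse (a reconstruction of the description above, not the actual code; skipTrailing is an illustrative helper name):

const { Transform } = require('stream')
const { parse } = require('csv-parse')

// Holds back the last `n` records and drops them when the upstream ends.
const skipTrailing = (n) => {
  const queue = []
  return new Transform({
    objectMode: true,
    transform(record, _encoding, callback) {
      queue.push(record)
      // Release a record only once `n` newer records have arrived,
      // so the final `n` records are never released.
      if (queue.length > n) {
        this.push(queue.shift())
      }
      callback()
    },
    flush(callback) {
      // End of stream: whatever remains in the queue is the trailer.
      queue.length = 0
      callback()
    }
  })
}

// Usage: drop a single trailing summary record.
process.stdin
  .pipe(parse())
  .pipe(skipTrailing(1))
  .on('data', (record) => console.log(record))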

@wdavidw
Member

wdavidw commented Aug 11, 2022

Curious to see your code. Not sure that I want to make the parser more complex, but let me look first at how you are doing it.
