Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate headers in Parser, based upon presence of _fields, without changing _input (fixes issue #985) #989

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

landisdesign
Copy link

@landisdesign landisdesign commented Mar 24, 2023

Fixes issue #985.

Previously, headers were created in ParseHandle's processResults(), but also managed by Parser.parse() when baseIndex was falsy.

When headers were requested and baseIndex was false, parse() would modify _input without adjusting cursor, causing the parser to mistake where it needed to continue parsing. This would cause further rows to be shifted. In addition, baseIndex could be undefined or 0, depending on how it was called, causing this cursor shift to potentially happen twice.

This PR:

  • Updates _fields based upon the value of _results.meta.fields without changing _input
  • Uses _fields.length as the source of truth for whether or not headers should be generated when config.header is truthy
  • Consolidates header generation within Parser by passing _fields to the Parser constructor where the above test is performed
  • Uses the same parsing algorithms used for row generation for headers
  • Adds fields to results generated by Parser.parse() based upon whether or not headers are available

Additional impacts:

  • Because header generation is now performed within pushRow(), the header row is no longer returned as data by Parser.parse(). Instead, it is returned in meta.fields by returnable(). This also means we don't need to adjust config.preview to account for the header row.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to existing tests were required to reflect that:

  • Parser now requires an array as its second argument for its constructor
  • Parser.parse() no longer returns the header row as data

@beckyconning
Copy link

🎉

@beckyconning
Copy link

Hey @landisdesign

Can't thank you enough.

PapaParse basically didn't work from in a stream pipeline without this.

You are a life saver.

@beckyconning
Copy link

@mholt is there a different plan to resolve this issue? it seems quite pressing and this pr seems to resolve it nicely.

@mholt mholt requested a review from pokoli April 20, 2023 15:56
@mholt
Copy link
Owner

mholt commented Apr 20, 2023

As long as it works, and the tests pass, this is OK with me -- will probably defer to @pokoli for a closer look though. Ping me again if it isn't merged in a while.

Copy link
Collaborator

@pokoli pokoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes also look good to me, I added some comments to simplify the the tests a little bit. Could you have a look at it?

@@ -2018,6 +2027,51 @@ describe('Unparse Tests', function() {


var CUSTOM_TESTS = [
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add new tests at the end?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup! Done.

{
description: "Pause and resume works with headers and duplicate fields (Regression Test for Bug #985)",
expected: [[
["Column 1", "Column 2", "Column 3", "Column 4", "Column 5"],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to have 5 rows? Can we make the test simpler with less rows?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. I reduced it to two rows and four columns, enough to make sure we could test column value duplicates and that the headers didn't shift between pause/resume stages.

step: function(results, parser) {
if (results)
{
headerResults.push(results.meta.fields);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pusshing headers multiple times? Won't be better to just push when empty?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to double-check that multiple passes returned the same headers. It looked like the original code could have updated the headers on each pass, which seemed problematic. Perhaps that question makes sense as a separate test, or do you think it isn't worth the check?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants