CSV parsing is broken in 5.4.1 #998
Comments
Hi @rpr-ableton, could you please describe what the issue is? |
The symptoms are:
Maybe to illustrate the point better, here's the output from the same script but using papaparse 5.3.2:
|
Hi @rpr-ableton, we need to isolate the problem with a minimal reproducible case. I will need a minimal test scenario in order to be able to fix it. BTW: we have included some changes on the master branch that are not in the latest release. Could you please test with master? |
The issues are the same whether I'm using master or 5.4.1. This is the minimal CSV file that reproduces all the errors I mentioned above:
This one reproduces both the
I haven't managed to produce anything smaller than this that generates the errors. Also, it seems that the two errors mentioned in the second CSV are tightly connected, as I can't reproduce one without the other. |
Any update on being able to address this issue? I've recently run into what sounds like a similar problem where a CSV gets parsed with |
Follow-up -- downgrading to papaparse 5.3.2 corrected the problem. |
The |
I am experiencing the same issue. It only occurred when using an async iterator on the stream. Using listeners works without issues. |
I also got some data consistency issues when parsing CSV files in Node using streaming mode and "header: true" to get objects. While it works fine with version 5.3.2, starting from 5.4.0 papaparse would change the content of my parsed objects. As there is no issue using "header: false", I could narrow it down to the following commit: c1cbe16. My guess is that this code block tries to look for duplicate headers on each new streaming chunk, not just the first line of the first chunk, and thereby modifies not the header but actual data rows. My solution is to stick with version 5.3.2 for now. |
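To make the suspected mechanism concrete, here is a hypothetical plain-JS sketch (not Papaparse's actual code; `dedupeFields` is an invented name) of the kind of duplicate-field renaming that commit c1cbe16 appears to introduce, and of what happens if it is applied to a data row instead of only the header row:

```javascript
// Hypothetical sketch: rename repeated values by appending _1, _2, ...
// Harmless on a header row with distinct names, but it corrupts any
// data row whose cells happen to contain equal values.
function dedupeFields(fields) {
  const seen = new Map();
  return fields.map((field) => {
    const count = seen.get(field) || 0;
    seen.set(field, count + 1);
    return count === 0 ? field : `${field}_${count}`;
  });
}

console.log(dedupeFields(["id1", "id2"])); // unchanged
console.log(dedupeFields(["16", "16"]));   // second cell becomes '16_1'
```

The `16_1` in the second call matches the corrupted rows shown later in this thread.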
Thanks for the hint @pokoli! However, after looking more into Papaparse's source code, the issue is more likely with streaming the parsed objects out of Papaparse. I could figure out that the central parse loop runs fine for the first 16 rows, but then gets paused. It resumes when the destination of my pipeline has consumed the first 16 objects, but upon resuming the

Here is a minimal working example, where I build a pipeline with a first function that generates one big string representing a TSV table of 20 rows with:

```js
import { Stream } from "stream"
import * as papa from "papaparse"

async function main() {
  const pipeline = Stream.pipeline(
    async function* () {
      yield ["id1\tid2", ...Array.from({ length: 20 }, (v, i) => `${i}\t${i}`)].join("\n")
    },
    papa.parse(papa.NODE_STREAM_INPUT, { delimiter: "\t", newline: "\n", header: true }),
    (err) => err && console.error(err)
  )
  for await (const entry of pipeline) {
    console.log(entry)
  }
}

main()
```

While the example produces the expected output in version 5.3.2, starting from version 5.4.0 the output looks like this:

```
{ id1: '0', id2: '0' }
{ id1: '1', id2: '1' }
{ id1: '2', id2: '2' }
{ id1: '3', id2: '3' }
{ id1: '4', id2: '4' }
{ id1: '5', id2: '5' }
{ id1: '6', id2: '6' }
{ id1: '7', id2: '7' }
{ id1: '8', id2: '8' }
{ id1: '9', id2: '9' }
{ id1: '10', id2: '10' }
{ id1: '11', id2: '11' }
{ id1: '12', id2: '12' }
{ id1: '13', id2: '13' }
{ id1: '14', id2: '14' }
{ id1: '15', id2: '15' }
{ id1: '16', id2: '16_1' }
{ id1: '', id2: '17' }
{ id1: '18', id2: '18_1' }
{ id1: '', id2: '19' }
```
|
CSV parsing is still broken on Node.js, using v5.4.1 with header: true.

EDIT: The bug appears when using the stream mode like this:

```js
import { NODE_STREAM_INPUT, parse } from 'papaparse';

// Some read stream returned from fs.createReadStream
const csvFileReadStream = ...
csvFileReadStream.pipe(parse(NODE_STREAM_INPUT, options))
```

Using parse(csvFileReadStream, options) seems to work fine (though TypeScript complains about the 1st argument of the parse function, but that's maybe some misconfiguration on our side). |
I'm experiencing exactly what @ulo described above. Streaming on 5.4.1 with

Converting the stream to a string and parsing that works as expected. |
We were seeing an issue where CSVs were inconsistently parsed incorrectly, which seems to be a regression in papaparse@5.4.1. Downgrading the version fixes the issue. mholt/PapaParse#998
I tried the latest |
Hi everyone,
First of all, big thanks for this great library that's been a big help in my work 🙌
I noticed when going from 5.3.2 to 5.4.1 that some CSV files can't be parsed successfully anymore. I'll attach below a small CSV, a Node.js script to reproduce the issue, as well as the output.
example.csv:
script.js:
Output with papaparse 5.4.1:
Let me know if you need any additional information to better capture the issue.
Thanks a bunch.