Feature Request: progress callback + complete callback with the bulk of the parsed entries #1011

Open
TomasHubelbauer opened this issue Jul 11, 2023 · 0 comments


Hi, I am using Papa Parse with worker: true and step: so that a huge file is parsed off the main thread and streamed, which means the parsed data never has to be held in memory all at once.

I am using Dexie to store the parsed entries in IndexedDB. Right now Papa Parse itself is not locking up the page, because it parses in the web worker and streams the results. What does lock the page is the per-row messaging between the worker and the main thread, combined with Dexie storing the entries one by one.
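As a partial mitigation for the storage side only, here is a minimal sketch of collecting streamed rows into batches before writing them. `createBatcher`, `batchSize`, and `flush` are hypothetical names for illustration and are not part of Papa Parse or Dexie. The assumption is that `flush` would wrap a bulk write such as Dexie's `bulkAdd` on a table. This does nothing about the per-row messages between the worker and the main thread.

```typescript
// Hypothetical helper: buffers rows and hands them to `flush` in batches.
type Flush<Row> = (batch: Row[]) => void;

function createBatcher<Row>(batchSize: number, flush: Flush<Row>) {
  let buffer: Row[] = [];

  // Hand the current buffer to `flush` and start a fresh one.
  function drain(): void {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    flush(batch);
  }

  return {
    // Intended to be called once per parsed row, e.g. from a step: handler.
    push(row: Row): void {
      buffer.push(row);
      if (buffer.length >= batchSize) drain();
    },
    // Intended to be called once at the end, e.g. from complete:,
    // to write whatever is left in the buffer.
    end(): void {
      drain();
    },
  };
}
```

The sketch applies no backpressure: if `flush` starts an asynchronous write, batches can queue up faster than they are stored.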

The proper solution here, I think, is not to use worker: true and its implicit worker, but to write my own web worker and run Papa Parse inside it in worker-less streaming mode, with Dexie running in the same worker. That way both parsing and storing happen off the main thread, and because the parsing is streamed, memory does not keep growing until the whole file is parsed.

I am trying to put this solution off until I have addressed other aspects of my app, though, because making TypeScript and the Next build process aware of my web worker is a huge pain that will take a lot of time, and I would like to avoid investing that time for now.

The compromise I am switching to right now is to use worker: true without step:, so there is no streaming. I will pay the price of holding the whole parsed file in memory and will marshal it from the worker to the main thread in one message through the complete: handler, which receives the entire dataset when there is no step:. Then I can store the data with Dexie in bulk instead of row by row.

However, with this solution I lose the ability to display progress. Previously I divided results.meta.cursor by the size of the File instance (or, in the case of a URL, I made a HEAD request first to read the Content-Length and then handed the URL off to Papa Parse).
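For reference, the progress calculation I was doing can be sketched roughly as below. `progressFraction` and `parseContentLength` are hypothetical helper names, not Papa Parse APIs. The sketch assumes that `cursor` and the total size are comparable units. If they are not exactly, the ratio is only approximate, which is why it is clamped.

```typescript
// Fraction of the input consumed so far, clamped to the range [0, 1].
// `cursor` is assumed to come from results.meta.cursor; `totalSize` from
// File.size or from a Content-Length header.
function progressFraction(cursor: number, totalSize: number): number {
  if (!isFinite(totalSize) || totalSize <= 0) return 0; // unknown or empty input
  return Math.min(1, Math.max(0, cursor / totalSize));
}

// Parses a Content-Length header value (e.g. from a HEAD response).
// Returns null when the header is missing or not a non-negative integer.
function parseContentLength(header: string | null): number | null {
  if (header === null) return null;
  const n = parseInt(header, 10);
  return isFinite(n) && n >= 0 ? n : null;
}
```

When the total size is unknown (no Content-Length), the fraction stays at 0, so the UI would need an indeterminate indicator instead.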

I would like to float the idea of introducing a progress: callback that can only be set when step: is not provided. It would be called every now and then (it can be called for every row, but does not have to be), and the main thread could then use it to read the cursor and report progress. This callback should carry no data; in the absence of step:, the complete: callback should still be the one that relays the data.
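To make the "every now and then" part concrete, here is a minimal sketch of the semantics I have in mind. Everything in it is hypothetical: `ProgressCallback` is only a guess at the shape such a callback could take (position only, no row data), and `throttleProgress` is an illustrative helper, not an existing Papa Parse function.

```typescript
// Hypothetical shape of the proposed callback: cursor position only, no rows.
type ProgressCallback = (cursor: number) => void;

// Wraps a callback so it fires at most once per `intervalMs`.
// `now` is injectable so the behaviour can be checked without real timers.
function throttleProgress(
  onProgress: ProgressCallback,
  intervalMs: number,
  now: () => number = Date.now,
): ProgressCallback {
  let last = -Infinity; // guarantees that the first call always fires
  return (cursor: number): void => {
    const t = now();
    if (t - last >= intervalMs) {
      last = t;
      onProgress(cursor);
    }
  };
}
```

With something like this on the parsing side, the main thread would receive a handful of small messages per second instead of one message per row.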

Let me know what you think, and whether I have missed a way to do this with the existing API or made a wrong assumption about how Papa Parse works that removes the need for this.
