Inflate to:"string" conversion from Uint8Array to UTF-16 string performance issues #228

Closed
jxu opened this issue Jun 30, 2021 · 7 comments

jxu commented Jun 30, 2021

Same issue reported here https://stackoverflow.com/questions/38145228/convertation-from-uint8array-to-utf-16-string-freezes-crashes-browser

To reproduce: try inflating a 100 MB gz file with the option {to: "string"}.

The solution is probably to do conversion in chunks
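
One way chunked conversion could look is sketched below. It assumes the inflated bytes are UTF-8 and that the standard `TextDecoder` is available; `decodeInChunks` and its `chunkSize` parameter are illustrative names, not part of pako.

```javascript
// Sketch: decode a large Uint8Array of UTF-8 bytes in fixed-size slices.
// decodeInChunks and chunkSize are hypothetical names, not pako API.
function decodeInChunks(bytes, chunkSize = 1 << 16) {
  const decoder = new TextDecoder("utf-8");
  const parts = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray() creates a view, so no bytes are copied here.
    const slice = bytes.subarray(offset, offset + chunkSize);
    // stream: true keeps an incomplete multi-byte sequence buffered
    // until the next call instead of emitting a replacement character.
    parts.push(decoder.decode(slice, { stream: true }));
  }
  // A final call without the stream flag flushes any buffered bytes.
  parts.push(decoder.decode());
  return parts.join("");
}
```

The `stream: true` flag matters because a slice boundary can fall in the middle of a multi-byte character.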

Member

puzrin commented Jun 30, 2021

Please provide a minimal code sample that reproduces the problem.

Also, a 100 MB gzip of text unpacks to a string of roughly 1 GB, so you are probably running out of memory (the JIT has some limits). You could use chunking and save the result to a blob; that approach is intended for data this big.
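
A rough sketch of the chunked approach follows. It assumes an object that behaves like pako's streaming `Inflate` class (a `push(data, isLast)` method, an overridable `onData` handler, and `err`/`msg` fields); `inflateInChunks` and the slice size are illustrative names, not part of pako.

```javascript
// Sketch: feed compressed bytes to a streaming inflator slice by slice and
// collect the output chunks instead of building one huge string.
// `inflator` is assumed to behave like `new pako.Inflate()`.
function inflateInChunks(inflator, compressed, sliceSize = 1 << 16) {
  const parts = [];
  // Replace the default handler, which would otherwise buffer all output.
  inflator.onData = (chunk) => parts.push(chunk);
  for (let offset = 0; offset < compressed.length; offset += sliceSize) {
    const end = Math.min(offset + sliceSize, compressed.length);
    const isLast = end === compressed.length;
    // Passing true on the last slice tells the inflator to finish the stream.
    inflator.push(compressed.subarray(offset, end), isLast);
    if (inflator.err) throw new Error(inflator.msg || "inflate failed");
  }
  return parts;
}
```

In a browser, the returned parts could then be wrapped without concatenating them into a string, for example `new Blob(inflateInChunks(new pako.Inflate(), compressedBytes))`.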

Author

jxu commented Jul 1, 2021

I should clarify: the file is 100 MB uncompressed and 35 MB compressed. I don't know whether 100 MB will fit within the JIT limits.

Here is the file I was working with https://github.com/jxu/Word2VecDemo/raw/6ee9741a5fd556b3aa6d4598d4881061588f3c9e/wordvecs50k.vec.gz

Here is some example code. It runs in an async function only because of fetch (replace it with a sync file load if you'd like). Nothing special is happening here:

    const vecsResponse = await fetch("wordvecs50k.vec.gz");
    const vecsBlob = await vecsResponse.blob();
    const vecsBuf = await vecsBlob.arrayBuffer();
    // With {to: "string"}, inflate returns a string rather than a Uint8Array.
    const vecsText = pako.inflate(vecsBuf, {to: "string"});

My workaround was to use TextDecoder(). Is there any reason the library doesn't use TextDecoder? The library already assumes modern browser support.

Member

puzrin commented Jul 1, 2021

Could you narrow down your example? Is it specific to that file, or can you generate a long string, then deflate it and inflate it back?

Author

jxu commented Jul 1, 2021

It applies to any large file. Here is an example that does not use any specific file:

    // Generate 100M characters of fake floating-point data.
    const s = [...Array(10**7)].map(() => Math.random().toString().substring(0, 10)).join('');
    const d = pako.deflate(s);  // ~45M; takes a while, but not a problem since I am not using it in my code
    pako.inflate(d);  // runs quickly
    pako.inflate(d, {to: "string"});  // freezes browser for 30s
    new TextDecoder().decode(pako.inflate(d));  // much faster

jxu changed the title from 'Inflate conversion from Uint8Array to UTF-16 string freezes browser' to 'Inflate to:"string" conversion from Uint8Array to UTF-16 string performance issues' on Jul 1, 2021

Member

puzrin commented Jul 1, 2021

Thanks for the simplified example.

// freezes browser for 30s

So, the problem is only the slower decoding speed? I could add a TextDecoder call in buf2string() when it is available. This code was written before TextDecoder became stable.
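
The idea of using TextDecoder only when it exists could be sketched roughly as below. This is not pako's actual buf2string(); `bytesToString` and `decodeUtf8Manually` are hypothetical names, and the fallback assumes well-formed UTF-8 input.

```javascript
// Hypothetical helper, not pako's actual buf2string().
function bytesToString(bytes) {
  // Prefer the native decoder when the environment provides it.
  if (typeof TextDecoder === "function") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return decodeUtf8Manually(bytes);
}

// Fallback: decode UTF-8 by hand, flushing every 8192 code units so that
// String.fromCharCode.apply never receives too many arguments.
function decodeUtf8Manually(bytes) {
  const parts = [];
  let units = [];
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i++];
    let cp;
    if (b < 0x80) {
      cp = b; // 1-byte sequence (ASCII)
    } else if (b >= 0xc0 && b < 0xe0) {
      cp = ((b & 0x1f) << 6) | (bytes[i++] & 0x3f); // 2-byte sequence
    } else if (b >= 0xe0 && b < 0xf0) {
      cp = ((b & 0x0f) << 12) | ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
    } else if (b >= 0xf0 && b < 0xf8) {
      cp = ((b & 0x07) << 18) | ((bytes[i++] & 0x3f) << 12) |
           ((bytes[i++] & 0x3f) << 6) | (bytes[i++] & 0x3f);
    } else {
      cp = 0xfffd; // stray continuation or invalid lead byte
    }
    if (cp > 0xffff) {
      // Code points above the BMP become a UTF-16 surrogate pair.
      cp -= 0x10000;
      units.push(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      units.push(cp);
    }
    if (units.length >= 8192) {
      parts.push(String.fromCharCode.apply(null, units));
      units = [];
    }
  }
  parts.push(String.fromCharCode.apply(null, units));
  return parts.join("");
}
```

Feature detection keeps older environments working while letting modern ones take the fast native path.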

Author

jxu commented Jul 1, 2021 via email

Member

rlidwka commented Jul 27, 2021

I've changed the code to use TextDecoder whenever possible. Thanks for reporting.

rlidwka added a commit that referenced this issue Jun 10, 2022
rlidwka added a commit that referenced this issue Jun 10, 2022