Inflate to:"string" conversion from Uint8Array to UTF-16 string performance issues #228
Comments
Please provide a minimal code sample showing how to reproduce this. Also, a 100 MB gzip of text is roughly a 1 GB string after unpacking. You are probably running out of memory (the JS engine has limits). You could use chunking and save the result to a Blob; that approach is intended to work with such big data. |
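To illustrate the chunked approach (a sketch only; compressedBytes stands in for whatever Uint8Array holds the gzipped data):

// Stream the compressed data through pako's Inflate class and collect the
// output chunks into a Blob instead of building one huge string in memory.
const inflator = new pako.Inflate();
const chunks = [];

inflator.onData = (chunk) => {
  chunks.push(chunk); // each chunk is a Uint8Array
};

inflator.onEnd = (status) => {
  if (status !== 0) throw new Error(inflator.msg);
};

inflator.push(compressedBytes, true); // true marks the final piece

// The inflated data can then live in a Blob and be read lazily
// (blob.text(), blob.stream()) rather than as one giant in-memory string.
const resultBlob = new Blob(chunks);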
I should clarify: 100 MB uncompressed, 35 MB compressed. I don't know whether 100 MB will fit within the engine's limits. Here is the file I was working with: https://github.com/jxu/Word2VecDemo/raw/6ee9741a5fd556b3aa6d4598d4881061588f3c9e/wordvecs50k.vec.gz

Here is some example code. It runs in an async function only because of fetch (replace that with a synchronous file load if you'd like); nothing special is happening here:

const vecsResponse = await fetch("wordvecs50k.vec.gz");
const vecsBlob = await vecsResponse.blob();
const vecsBuf = await vecsBlob.arrayBuffer();
const vecsUint8 = pako.inflate(vecsBuf, {to: "string"});

My workaround was to use TextDecoder(). Is there any reason the library doesn't use TextDecoder? The library already assumes modern browser support. |
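The TextDecoder workaround boils down to something like this (a sketch only; like the snippet above, it has to run inside an async function):

// Inflate to a Uint8Array (pako's default output) and let the browser's
// native TextDecoder handle the UTF-8 to UTF-16 string conversion.
const response = await fetch("wordvecs50k.vec.gz");
const compressed = await response.arrayBuffer();
const bytes = pako.inflate(compressed);              // Uint8Array
const text = new TextDecoder("utf-8").decode(bytes); // JS string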
Could you narrow down your example? Is it specific to that file, or can you generate a long string, then deflate and inflate it back? |
It applies to any large file. Here is an example that does not use any specific file:

s = [...Array(10**7)].map(() => Math.random().toString().substring(0,10)).join('') // generate ~100M characters of fake floating-point data
d = pako.deflate(s); // ~45 MB; takes a while, but not a problem since I am not using it in my code
pako.inflate(d); // runs quickly
pako.inflate(d, {to: "string"}) // freezes the browser for ~30 s
new TextDecoder().decode(pako.inflate(d)) // much faster |
Thanks for the simplified example. So, the problem is only the slower decoding speed? I could add a TextDecoder call when available; this code was written before TextDecoder became stable. |
Yes, the problem is only with the conversion to a native JS UTF-16 string. The TextDecoder feature is relatively new but should be supported by all modern browsers now according to MDN (and there is always a polyfill: https://github.com/anonyco/FastestSmallestTextEncoderDecoder). I can try a PR if you'd like, but I am not familiar with the codebase or how to run any tests. |
I've changed the code to use TextDecoder whenever possible, thanks for reporting. |
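Roughly, the feature detection involved looks like this (illustrative only, not the library's actual patch; legacyBuf2String is a hypothetical stand-in for the old manual conversion):

function utf8BytesToString(bytes) {
  // Prefer the fast native decoder when the environment provides one.
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  // Otherwise fall back to a manual UTF-8 -> UTF-16 conversion loop
  // (hypothetical helper standing in for the pre-TextDecoder code path).
  return legacyBuf2String(bytes);
}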
Same issue reported here: https://stackoverflow.com/questions/38145228/convertation-from-uint8array-to-utf-16-string-freezes-crashes-browser

To reproduce: try inflating a 100 MB gz file with the option to: "string".

The solution is probably to do the conversion in chunks.
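One way to read the chunking suggestion, sketched here with TextDecoder's streaming mode (an assumption for illustration, not the fix the library adopted):

// Decode the inflated bytes in fixed-size slices so no single synchronous
// call has to process the entire buffer at once.
function decodeInChunks(bytes, chunkSize = 1 << 20) {
  const decoder = new TextDecoder("utf-8");
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const isLast = i + chunkSize >= bytes.length;
    parts.push(decoder.decode(bytes.subarray(i, i + chunkSize), { stream: !isLast }));
  }
  return parts.join("");
}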