Inflate to:"string" conversion from Uint8Array to UTF-16 string performance issues #228
Comments
Please provide a minimal code sample showing how to reproduce this. Also, a 100 MB gzip of text is roughly a 1 GB string after unpacking. You are probably running out of memory (the JS engine has limits). You could use chunking and save the result to a Blob; that approach is intended to work with such big data. |
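To illustrate the chunked approach (a sketch only; compressedBytes stands in for whatever Uint8Array holds the gzipped data):

// Stream the compressed data through pako's Inflate class and collect the
// output chunks into a Blob instead of building one huge string in memory.
const inflator = new pako.Inflate();
const chunks = [];

inflator.onData = (chunk) => {
  chunks.push(chunk); // each chunk is a Uint8Array
};

inflator.onEnd = (status) => {
  if (status !== 0) throw new Error(inflator.msg);
};

inflator.push(compressedBytes, true); // true marks the final piece

// The inflated data can then live in a Blob and be read lazily
// (blob.text(), blob.stream()) rather than as one giant in-memory string.
const resultBlob = new Blob(chunks);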
I should clarify: 100 MB uncompressed, 35 MB compressed. I don't know whether 100 MB will fit within the engine's limits. Here is the file I was working with: https://github.com/jxu/Word2VecDemo/raw/6ee9741a5fd556b3aa6d4598d4881061588f3c9e/wordvecs50k.vec.gz

Here is some example code. It runs in an async function only because of fetch (replace that with a synchronous file load if you'd like); nothing special is happening here:

const vecsResponse = await fetch("wordvecs50k.vec.gz");
const vecsBlob = await vecsResponse.blob();
const vecsBuf = await vecsBlob.arrayBuffer();
const vecsUint8 = pako.inflate(vecsBuf, {to: "string"});

My workaround was to use TextDecoder(). Is there any reason the library doesn't use TextDecoder? The library already assumes modern browser support. |
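The TextDecoder workaround boils down to something like this (a sketch only; like the snippet above, it has to run inside an async function):

// Inflate to a Uint8Array (pako's default output) and let the browser's
// native TextDecoder handle the UTF-8 to UTF-16 string conversion.
const response = await fetch("wordvecs50k.vec.gz");
const compressed = await response.arrayBuffer();
const bytes = pako.inflate(compressed);              // Uint8Array
const text = new TextDecoder("utf-8").decode(bytes); // JS string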
Could you narrow down your example? Is it specific to that file, or can you generate a long string, then deflate and inflate it back? |
It applies to any large file. Here is an example that does not use any specific file:

s = [...Array(10**7)].map(() => Math.random().toString().substring(0,10)).join('') // generate ~100M characters of fake floating-point data
d = pako.deflate(s); // ~45 MB; takes a while, but not a problem since I am not using it in my code
pako.inflate(d); // runs quickly
pako.inflate(d, {to: "string"}) // freezes the browser for ~30 s
new TextDecoder().decode(pako.inflate(d)) // much faster |
Thanks for the simplified example. So, the problem is only the slower decoding speed? I could add a TextDecoder call when available; this code was written before TextDecoder became stable. |
Yes, the problem is only with the conversion to a native JS UTF-16 string. The TextDecoder feature is relatively new but should be supported by all modern browsers now according to MDN (and there is always a polyfill: https://github.com/anonyco/FastestSmallestTextEncoderDecoder). I can try a PR if you'd like, but I am not familiar with the codebase or how to run any tests. |
I've changed the code to use TextDecoder whenever possible, thanks for reporting. |
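Roughly, the feature detection involved looks like this (illustrative only, not the library's actual patch; legacyBuf2String is a hypothetical stand-in for the old manual conversion):

function utf8BytesToString(bytes) {
  // Prefer the fast native decoder when the environment provides one.
  if (typeof TextDecoder !== "undefined") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  // Otherwise fall back to a manual UTF-8 -> UTF-16 conversion loop
  // (hypothetical helper standing in for the pre-TextDecoder code path).
  return legacyBuf2String(bytes);
}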
Same issue reported here: https://stackoverflow.com/questions/38145228/convertation-from-uint8array-to-utf-16-string-freezes-crashes-browser

To reproduce: try inflating a 100 MB gz file with the option to: "string".

The solution is probably to do the conversion in chunks.
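One way to read the chunking suggestion, sketched here with TextDecoder's streaming mode (an assumption for illustration, not the fix the library adopted):

// Decode the inflated bytes in fixed-size slices so no single synchronous
// call has to process the entire buffer at once.
function decodeInChunks(bytes, chunkSize = 1 << 20) {
  const decoder = new TextDecoder("utf-8");
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const isLast = i + chunkSize >= bytes.length;
    parts.push(decoder.decode(bytes.subarray(i, i + chunkSize), { stream: !isLast }));
  }
  return parts.join("");
}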