Other ways to build blobs. #150

Open
jimmywarting opened this issue Apr 7, 2020 · 8 comments

@jimmywarting
Owner

jimmywarting commented Apr 7, 2020

Do some of you remember BlobBuilder, where you could append chunks a bit at a time?
It might have been better suited for building large Blobs at the time, but it was replaced by the Blob constructor for some reason.
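
For reference, a sketch of how the two APIs compare (BlobBuilder is long gone from browsers, and vendor-prefixed variants like WebKitBlobBuilder existed too, so this is from memory):

var bb = new BlobBuilder()
bb.append('abc')              // could be called repeatedly, a chunk at a time
bb.append(new ArrayBuffer(8))
var blob1 = bb.getBlob('text/plain')

// Its replacement: hand all the parts to the Blob constructor up front.
var blob2 = new Blob(['abc', new ArrayBuffer(8)], { type: 'text/plain' })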


Here is a document describing how blobs work in older Chrome versions (I don't know how outdated it is):
https://docs.google.com/presentation/d/1MOm-8kacXAon1L2tF6VthesNjXgx0fp5AP17L7XDPSM/edit

I wrote an answer on Stack Overflow about how you could potentially build a large blob with pointers:
https://stackoverflow.com/questions/39253244/large-blob-file-in-javascript
meaning you write chunks to IndexedDB and then assemble all the chunks into one large blob.

I then later wrote a thing (PR #18) that would cache all chunks into IndexedDB and do all of this,
but it later got abandoned for some reason. Maybe I didn't want to make StreamSaver hackier than it already is, or maybe it was paging. I wasn't sure.

IndexedDB isn't the nicest or the fastest to work with.


Now I have two other theories of how you can build large blobs without using much memory.
The first one is a bit simpler.

The first one is that if you fetch something and call response.blob(), then you wouldn't necessarily have to have everything in memory - it could just as well be a pointer to a temporary file on the disk if it is very large.

It all started with this question: https://stackoverflow.com/questions/16846382/memory-usage-with-xhr-blob-responsetype-chrome (but yet again, it's about Chrome - not Safari).

Now Safari has support for fetch + ReadableStream, so you could do something like this:

var rs = new ReadableStream({
  start (ctrl) {
    ctrl.enqueue(new Uint8Array([97,98,99])) // abc
    ctrl.close()
  }
})
var blob = await new Response(rs).blob()

Could this be a way to offload some of the memory to a temporary place on the disk? I don't know.

Now, if that does not solve it, what about using the Cache storage?

// Cache storage, second best storage for files (using request/response)
var temp = await caches.open('tmp')
var rs = new ReadableStream({
  start (ctrl) {
    ctrl.enqueue(new Uint8Array([97,98,99])) // abc
    ctrl.close()
  }
})
var req = new Request('filename.txt')
var res = new Response(rs)

// Save it to the cache, then read it back as a blob
try {
  await temp.put(req, res)
  // done saving
  var cached = await temp.match(req)
  var blob = await cached.blob()
} catch (error) {
  // how to recover from this?
}

Will this do it? Maybe, maybe not.

The second approach has two caveats. 1) Browsers have a storage limitation on how much you can store, so how do you recover from something that may fail? 2) It's only available on secure sites (https). In a poll (#90) many answered that most of you already use https, so maybe that isn't an issue now with Let's Encrypt and other tools. But there are also ways around it, using postMessage to a secure site.
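
A hypothetical sketch of that workaround (the origin, helper page, and message shape are all made up for illustration): embed a page served over https in an iframe and hand it the data; the frame has access to Cache storage / IndexedDB on its own origin even when the embedding page does not.

var frame = document.createElement('iframe')
frame.src = 'https://secure.example.com/storage.html' // hypothetical helper page
document.body.appendChild(frame)
frame.onload = () => {
  // the helper page would put the chunk into its origin's Cache storage / IDB
  frame.contentWindow.postMessage(
    { cmd: 'store', name: 'filename.txt', chunk: new Uint8Array([97, 98, 99]) },
    'https://secure.example.com'
  )
}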

Other resources suggest that the OS may page memory to disk when memory runs out.

Paging is a method of writing and reading data from secondary storage (drive) for use in primary storage (RAM). When a computer runs out of RAM, the operating system (OS) will move pages of memory over to the computer's hard disk to free up RAM for other processes.

So is it really something we have to worry about? I guess we need to test with really large data first before trying to implement something. I know for a fact that my macOS is paging memory, so I may not be able to crash the browser with lots of memory. The only way to find out what works best is to test things.

@ivanjx

ivanjx commented Jan 8, 2023

it is actually possible to build blobs from IndexedDB. this code was tested with 500 MB files (with each IndexedDB 'row' containing 1 MB of data).

https://gitlab.com/ivanjx/web-p2p-share/-/blob/35e3653f8eb639a66f147a90d241af29e6bae3cb/web-p2p-share/wwwroot/js/ChunkStorage.js#L143-L191
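
For anyone skimming, a minimal sketch of that idea (not the linked code itself; the store name and stored value types are assumptions): pull all the chunk rows out of IndexedDB and pass them straight to the Blob constructor, which accepts an array of Blob/ArrayBuffer parts.

function assembleBlob (db, type) {
  return new Promise((resolve, reject) => {
    var req = db.transaction('chunks', 'readonly')
      .objectStore('chunks')
      .getAll() // returns every chunk row, in key order
    req.onsuccess = () => resolve(new Blob(req.result, { type }))
    req.onerror = () => reject(req.error)
  })
}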

@jimmywarting
Owner Author

Hi @ivanjx

I did already know that.

I have written somewhat larger blobs to IDB, and once a blob was written to IDB I would remove it immediately. It was some kind of magical way for me to move an in-memory blob into a blob backed by the filesystem. And once the file got garbage collected, by either closing the tab or the browser, it would be deleted.

A reason why I have not adopted this solution is quota issues, and particularly that Safari and FF are picky about storing things when using private browsing mode, so you did not get any access to IDB.

I was also afraid that removing the idb-blob after it has been written would cause it to not be readable afterwards.

await store.saveBlob('uuid', new Blob(['abc']))
const blobFromDisc = await store.getBlob('uuid')
await store.deleteBlob('uuid')
blobFromDisc.text().then(success, fail) // will it work or not? (not sure, browsers could do different things)
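
(The store above is a hypothetical helper. A minimal IndexedDB-backed sketch of it, with the database and object store names made up, could look like this:)

const dbReady = new Promise((resolve, reject) => {
  const open = indexedDB.open('blob-store', 1) // hypothetical db name
  open.onupgradeneeded = () => open.result.createObjectStore('blobs')
  open.onsuccess = () => resolve(open.result)
  open.onerror = () => reject(open.error)
})

// run one request in its own transaction and promisify the result
function tx (mode, fn) {
  return dbReady.then(db => new Promise((resolve, reject) => {
    const req = fn(db.transaction('blobs', mode).objectStore('blobs'))
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  }))
}

const store = {
  saveBlob: (key, blob) => tx('readwrite', os => os.put(blob, key)),
  getBlob: key => tx('readonly', os => os.get(key)),
  deleteBlob: key => tx('readwrite', os => os.delete(key))
}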

I think I remember that saving to IDB first was a bit slower. You must essentially do:

  1. write chunks to a temporary location on the disc
  2. once completed, write a new file (by opening a file descriptor (aka readable stream) for all temporary chunks on the disc, to essentially concat everything into one file)
  3. after that, delete all the temporary files from the disc

So all in all it requires: writing many small files to disc, reading many small files and writing them to one final destination, and then removing many small files. So it's really not so IO friendly.

I think this was the reason why I closed #18 (being slow, not readable after deletion, quota errors, and access issues).

@jimmywarting
Owner Author

OPFS (aka whatwg-fs) is going to be the successor for handling files in the browser.
So it is better to create a readable/writable stream to OPFS, then get it as a File and save it using a download link.

But even better would be to use the File System Access API and write directly to the disc.
But the forced atomic write operation is a deal breaker for some. Appending chunks to an already existing file requires copying the file, appending, and replacing the original file, so writing chunks at specific random locations is slow as hell when using stuff like WebTorrent. OPFS doesn't have this same restriction.
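
A minimal sketch of the OPFS + download-link route, assuming a browser with OPFS support (someReadableStream stands in for whatever source produces your data):

const root = await navigator.storage.getDirectory()
const fileHandle = await root.getFileHandle('download.bin', { create: true })
const writable = await fileHandle.createWritable()
await someReadableStream.pipeTo(writable) // pipeTo closes the writable when done

const file = await fileHandle.getFile()
const url = URL.createObjectURL(file)
const link = document.createElement('a')
link.download = file.name
link.href = url
link.click()
URL.revokeObjectURL(url)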

@jimmywarting
Owner Author

I have also suggested something similar to Blob.from({ ... }) to the File API specification, to allow creating a Blob backed by your own source (be it a remote cloud file, for instance, or a Blob whose data isn't read yet).

Blob.from({
  size,
  type,
  stream() {},
  slice() {}
})

Read & comment on w3c/FileAPI#140 (comment) with what you think, and maybe give it a 👍 to raise awareness.

You would basically be able to do something along the lines of:

const blob = Blob.from({
  size: 1024,
  type: 'image/png',
  slice (start, end) { ... },
  stream () {
    const { readable, writable } = new TransformStream()
    fetch('https://httpbin.org/image/png').then(res => {
      // use `res.body.tee()` if you want to cache the
      // response to not having to make another request
      // next time you try to read this same blob again.

      res.body.pipeTo(writable)
    })
    return readable
  }
})
const url = URL.createObjectURL(blob)
const link = document.createElement('a')
link.download = 'cat.png'
link.href = url
link.click()
URL.revokeObjectURL(url)

@jimmywarting
Owner Author

jimmywarting commented Jan 8, 2023

With fetch-blob you are already able to create more arbitrary blobs, with something along the lines of new Blob([ blobLikeItem ]), where blobLikeItem could be a blob-like item originating from the file system (or whatever source you like). But the blobs are not extensions of, or the same instances as, globalThis.Blob.
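
A minimal sketch of that in Node.js, assuming fetch-blob's blobFrom helper (exported from fetch-blob/from.js) and a file path made up for illustration:

import { Blob } from 'fetch-blob'
import { blobFrom } from 'fetch-blob/from.js'

// blobFrom returns a Blob backed by the file on disk; the bytes are
// only read when the blob is consumed (stream / text / arrayBuffer).
const fileBacked = await blobFrom('./big-file.bin')
const combined = new Blob(['a header\n', fileBacked])
console.log(combined.size)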

This pkg is mostly just for Node.js, and it seems to be living its final days now that Node.js is looking into shipping something like await fs.getFile(path).

@ivanjx

ivanjx commented Jan 8, 2023

thanks for the detailed response @jimmywarting

do you have examples of how to create, write, and read with OPFS? I'm trying to google it but it gives me back Chrome's deprecated File System API instead.

@jimmywarting
Owner Author

jimmywarting commented Jan 8, 2023

opfs is full of promises

const root = await navigator.storage.getDirectory()
const dirHandle = await root.getDirectoryHandle('subDir', { create: true })
const fileHandle = await dirHandle.getFileHandle('cat.png', { create: true })
const writable = await fileHandle.createWritable()
// can write blobs, files, ArrayBuffers, typed arrays, strings, and pretty much whatever
const data = 'hi'
await writable.write(data)
await writable.close()
const file = await fileHandle.getFile()
const content = await file.text()
console.log(content)
await root.removeEntry('subDir', { recursive: true })

there is also a sync access handle (but it's only available in web workers)
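
A minimal sketch of that, assuming a module worker (so top-level await works) and a made-up file name:

// worker.js
const root = await navigator.storage.getDirectory()
const fileHandle = await root.getFileHandle('data.bin', { create: true })
const handle = await fileHandle.createSyncAccessHandle()

handle.write(new TextEncoder().encode('hi'), { at: 0 }) // synchronous write at offset 0
handle.flush()

const buf = new Uint8Array(handle.getSize())
handle.read(buf, { at: 0 }) // synchronous read
handle.close()
console.log(new TextDecoder().decode(buf))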

@jimmywarting
Owner Author

I think if you search for "whatwg fs" or "file system access" on Google, you will find more relevant information.
