Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Blob URL store partitioning #153

Open
annevk opened this issue May 5, 2020 · 23 comments
Open

Blob URL store partitioning #153

annevk opened this issue May 5, 2020 · 23 comments
Labels
normative change privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response.

Comments

@annevk
Copy link
Member

annevk commented May 5, 2020

https://privacycg.github.io/storage-partitioning/ has some general background here and https://trac.torproject.org/projects/tor/ticket/15502 is much more specific.

@bakulf was thinking that we could restrict blob URL lookup to the agent cluster (in addition to origin, that is). The one tweak I would suggest to that is that navigating a top-level browsing context (including a noopener one) to a blob URL still ought to work.

Concretely, this would mean that if you have https://example.com/ open twice, in separate browsing context groups, any blob URLs they mint cannot be used by the other.

The one gotcha with the tweak I suggested is that the other could observe existence through a popup then. Now that's an attack that's unlikely to yield anything useful in practice, but we could break that too by forcing noopener or a version of COOP that never matches (and thus always creates a new browsing context group).

We suspect this to be web-compatible and are happy to try it out in Firefox.

cc @mkruisselbrink @hober @SubhamoyS

@annevk annevk added normative change privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response. labels May 5, 2020
@mkruisselbrink
Copy link
Collaborator

Limiting it like that doesn't seem too crazy to me, although I'm not entirely sure I understand the attack/threat model that defends against.

A blob URL can only be resolved when the page that created it is still alive, and when the page trying to fetch it is same origin with the original page. If that is the case, you might as well just BroadcastChannel to talk to the original page, rather than jumping through hoops with blob URLs?

Of course if storage partitioning blocks BroadcastChannel because of different top-level URLs things would be different, but in that case wouldn't it make more sense to partition blob URLs the same way as other communication mechanisms, rather than having them be partitioned even more?

@mkruisselbrink
Copy link
Collaborator

Having said all that, for MediaSource blob URLs chrome already limits them to their agent cluster. And scoping all blob URLs to their agent cluster certainly would make the implementation a whole lot simpler (well, the top-level navigation case would still be tricky). So as an implementer agent cluster seems reasonable, as a spec editor I question if that is the right scoping level.

@annevk
Copy link
Member Author

annevk commented May 22, 2020

The keying for storage seems like it will end up being dynamic (i.e., storage access). I think if we can avoid things being dynamically keyed that's preferable.

To expand on that, if the key were to change it would mean old blob URLs in that environment would no longer be accessible, which would be quite weird, especially since blob URLs also serialize some origins into their identifier. And also, it seems that if two browsing context groups needed to share a blob they could/would share the object instead of the URL.

@smaug----
Copy link

It would be a bit odd inconsistency if sharing objects worked, but sharing blob URLs didn't.

@mkruisselbrink
Copy link
Collaborator

That doesn't seem that different from today? Blob objects can be shared across origins, while blob URLs can only be resolved by same origin originators. As such there are already plenty of cases where sharing a blob URL doesn't work but sharing a blob, and having the receiver create a blob URL from it does work.

@smaug----
Copy link

That is somewhat different inconsistency tough, since it is about cross-origin.

@annevk
Copy link
Member Author

annevk commented Aug 14, 2020

It seems that blob objects could always be widely shared and that does not change. They essentially have no scope. Blob URLs had a scope that was undefined to some extent (see #135). This proposes to scope blob URLs more clearly, to the agent cluster they are created in as well as any new top-level browsing contexts (which we'd ideally force noopener on).

@wanderview
Copy link
Member

I don't understand why we would do this instead of just making blob URLs only loadable by contexts with the same StorageKey. It would seem better to lean on the partitioning we are pursuing in other APIs instead of introducing a new kind of isolation.

@annevk
Copy link
Member Author

annevk commented Jun 25, 2021

Apart from the issues discussed above that would not work well for top-level navigations to partitioned blob: URLs.

@wanderview
Copy link
Member

Apart from the issues discussed above that would not work well for top-level navigations to partitioned blob: URLs.

What are the issues above? I looked at the links but it was not obvious to me. Dynamic keying? Not all browsers are proposing to do that.

And I don't see why its problematic for top-level navigations. Can you explain that? If we have a StorageKey for the blob we can propagate that to the context. Just like how we are going to have partitioned service workers if they are registered with a partitioned StorageKey.

But even if we wanted to prevent blob URLs from partitioned StorageKeys from being navigated to, I don't see why we need to block a context with the same StorageKey as the blob from loading it as a subresource.

@asutherland
Copy link

asutherland commented Jun 25, 2021

An upside of limiting Blob URL's to agent clusters is that it limits the potential for browser compat issues related to races for resolving Blob URLs against the Blob URLs being revoked. Right now resolve a Blob URL references the Blob URL Store like it's something that's synchronously accessible across all agent clusters. (I believe this leads to the situation described in #157 where Chrome has to use sync IPC when creating Blobs.)

That could alternately be addressed by specifying Blob resolution and Blob URL Store manipulations in a more multi-process-aware way. The previously referenced #157 would likely entail this because of the integration with storage.

@mkruisselbrink
Copy link
Collaborator

(kind of aside, but Chrome doesn't actually need to use sync IPCs when creating Blobs anymore, at least not for any web-exposed API. We still do because some internal and chrome extension usage of blobs might otherwise have race conditions).

Blob URL creation however does indeed need sync IPCs in particular to solve situations like: 1) agent1 registers a blob URL; 2) agent1 postMessages said URL to some other agent2; 3) agent2 tries to resolve the blob URL. With chrome's IPC there is no guarantee that the registration in step 1 arrives in the browser process before the attempt to resolve from step 3 arrives. As you say, if all blob URL registrations were scoped to the agent cluster they are created in this would no longer be an issue, and we could possibly eliminate these as sync IPCs.

So with my implementer hat on I think scoping by agent cluster makes a lot of sense. I don't have a good idea how web compatible that would be, but it shouldn't be too hard to collect metrics for that.

With my spec editor hat on, I'm not sure what makes more sense; scoping by storage key seems like it would solve all the same issues, while probably being more web compatible. On the other hand it does feel a bit weird to me to use storage key for this as blob URLs don't really seem like anything storage related to me (but then we're also using storage key for things like broadcast channel, so that isn't much of an argument). Also I'd say that anything that drives down usage of blob URLs might be a good thing, so being as restrictive as we can while not breaking too much does seem attractive. Overall I don't feel particularly strongly either way.

@wanderview
Copy link
Member

FWIW, if there are other reasons to restrict blob URL loading to same agent, I'm not objecting to that. But lets be clear that's the reason its being done. I just want to avoid conflating agent isolation with storage partitioning because they are conceptually different things.

@arichiv
Copy link
Member

arichiv commented Jul 19, 2021

Heads up that I'm examining the potential breakage this would cause if implemented in chrome: https://bugs.chromium.org/p/chromium/issues/detail?id=1224926

@annevk
Copy link
Member Author

annevk commented Jul 20, 2021

@wanderview agreed, though I would add that agent cluster isolation does obviate the need for partitioning.

(To explain the scenario above: if A embeds B and B mints a blob URL, you'd want the user to be able to copy that URL and navigate to it. It's not clear to me how that would work if lookup for blob URLs would use the storage key.)

@wanderview
Copy link
Member

Sure, but the first post explaining why you want to do agent clustering isolation uses 3rd party partitioning as its primary and only motivation. An explainer laying out the additional motivations would be useful for clarifying why we are attempting a bigger hammer than needed for 3rd party partitioning.

(To explain the scenario above: if A embeds B and B mints a blob URL, you'd want the user to be able to copy that URL and navigate to it. It's not clear to me how that would work if lookup for blob URLs would use the storage key.)

(Just to answer this, but not suggesting we need to do this if agent cluster isolation is preferred for other reasons, but:)

A blob URL has a uuid in it, correct? That can be used to look up the storage key associated with the blob and apply it to the context created by navigation. This seems just like how a browser has to use the uuid to lookup the origin of the blob based on the URL today.

@mkruisselbrink
Copy link
Collaborator

A blob URL has a uuid in it, correct? That can be used to look up the storage key associated with the blob and apply it to the context created by navigation. This seems just like how a browser has to use the uuid to lookup the origin of the blob based on the URL today.

I think we're all pretty much describing the same thing, just in a different manner. We want to change the "map" we look up blob URLs in to not only be keyed on blob URL/UUID, but also be keyed on storage key or agent cluster. However for navigations we still need to be able to look up an entry in this map while ignoring the second part of the key, i.e. we'll still need the existing blob URL -> blob map as well.

@mkruisselbrink
Copy link
Collaborator

FWIW, we landed metrics in Chrome a while ago to determine potential breakage if we'd partition blob URLs by agent clusters: https://www.chromestatus.com/metrics/feature/timeline/popularity/3963. So about ~0.1% of page loads might be broken if blob URLs are tied to an agent cluster.

@annevk
Copy link
Member Author

annevk commented Oct 21, 2021

@mkruisselbrink does that exclude top-level navigations to blob URLs? As those could result in a new agent cluster but would nonetheless remain working.

@arichiv
Copy link
Member

arichiv commented Oct 21, 2021

@mkruisselbrink
Copy link
Collaborator

3963 is only for subresources, 3964 would be for top-level navigations where the initiators agent cluster doesn't match the blob URLs agent cluster (not sure if we would block those as well or not)

@annevk
Copy link
Member Author

annevk commented Oct 21, 2021

Interesting, I don't think we ran into any issues with subresources. We did ran into issues with top-level navigations as we broke those and have yet to try again. (As you note in #153 (comment) those should work.)

@annevk
Copy link
Member Author

annevk commented Jun 1, 2022

Discussing this with @artines1 again today he pointed out that we hit a problem in Firefox with an A nests (sandboxed) B scenario and B then creating a blob URL and attempting to download it. For some reason that doesn't use the agent cluster of B. He'll look into it a bit more.

(If that ends up being a blocker I suppose we might be stuck with the "storage key" here, which is unfortunate given the IPC calls that would be needed still in certain scenarios. What's less of a concern these days is unpartitioning as all browsers seem aligned on always partitioning storage.)

webkit-commit-queue pushed a commit to sysrqb/WebKit that referenced this issue Aug 23, 2023
https://bugs.webkit.org/show_bug.cgi?id=260035
rdar://problem/113705298

Reviewed by Alex Christensen and Sihui Liu.

Public blob URLs are only accessible from same-origin dcuments, but access is
not restricted by the top-level origin. This means that Blob URLs can be used
as a cross-origin tracking mechanism within iframes. In this patch we partition
public blob URLs within the Blob Registry by top-level origin. This
partitioning is controlled by a feature flag that is disabled by default.

I took a few approaches at solving this. The most difficult challenge was
finding a solution that allowed retrieving BlobData using a public blob URL
from WKWebView APIs. In that case, the relevant top document may not be
obvious, or may not exist. As a result, the design of this partitioning is more
like access control rather than adding another key into the hashmap.

Two alternative designs I considered include creating a second hashmap that is
keyed by <URL, SecurityOriginData> and we lookup the BlobData in that map if we
have a SecurityOriginData, otherwise we use the unpartitioned map. Or, we
create a new map from URL -> SecurityOriginData where we can lookup the
associated top origin SecurityOriginData if we don't already know it. However,
both of these options are more complex than the chosen implementation, and
neither of them seemed safer.

This change also enforces a noopener policy on new windows when the top origin
of the opener is cross-origin with the blob's security origin. This is a
mitigation that was discussed in the blob URL storage partitioning issue [0]
with cross-engine support, and that seemed reasonable to me.

[0] w3c/FileAPI#153

* LayoutTests/TestExpectations:
* LayoutTests/http/tests/local/blob/download-blob-from-iframe-expected.txt: Added.
* LayoutTests/http/tests/local/blob/download-blob-from-iframe.html: Added.
* LayoutTests/http/tests/local/blob/navigate-blob-expected.txt: Added.
* LayoutTests/http/tests/local/blob/navigate-blob.html: Added.
* LayoutTests/http/tests/local/blob/resources/broadcast-channel-proxy.html: Added.
* LayoutTests/http/tests/local/blob/resources/iframe-creating-or-downloading-blob.html: Added.
* LayoutTests/http/tests/local/blob/resources/iframe-for-creating-and-navigating-to-blob.html: Added.
* LayoutTests/http/tests/local/blob/resources/main-frame-with-iframe-creating-or-navigating-to-blob.html: Added.
* LayoutTests/http/tests/local/blob/resources/main-frame-with-iframe-downloading-blob.html: Added.
* LayoutTests/http/tests/security/blob-null-url-location-origin-expected.txt:
* LayoutTests/http/tests/security/blob-null-url-location-origin.html:
* LayoutTests/http/tests/security/cross-origin-blob-transfer-expected.txt: Added.
* LayoutTests/http/tests/security/cross-origin-blob-transfer.html: Added.
* LayoutTests/http/tests/security/resources/iframe-cross-origin-blob-transfer.html: Added.
* LayoutTests/http/tests/security/top-level-unique-origin2.https.html:
* LayoutTests/platform/gtk-wk2/http/tests/local/blob/download-blob-from-iframe-expected.txt: Added.
* LayoutTests/platform/mac-wk1/TestExpectations:
* Source/WTF/Scripts/Preferences/UnifiedWebPreferences.yaml:
* Source/WebCore/fileapi/BlobURL.cpp:
(WebCore::BlobURL::isInternalURL):
* Source/WebCore/fileapi/BlobURL.h:
* Source/WebCore/fileapi/ThreadableBlobRegistry.cpp:
(WebCore::ThreadableBlobRegistry::registerInternalFileBlobURL):
(WebCore::ThreadableBlobRegistry::registerInternalBlobURL):
(WebCore::ThreadableBlobRegistry::registerInternalBlobURLOptionallyFileBacked):
(WebCore::ThreadableBlobRegistry::registerInternalBlobURLForSlice):
(WebCore::isInternalBlobURL): Deleted.
* Source/WebCore/loader/FrameLoader.cpp:
(WebCore::FrameLoader::loadURL):
(WebCore::FrameLoader::loadPostRequest):
(WebCore::createWindow):
* Source/WebCore/platform/network/BlobRegistryImpl.cpp:
(WebCore::BlobRegistryImpl::registerBlobURLOptionallyFileBacked):
(WebCore::BlobRegistryImpl::unregisterBlobURL):
(WebCore::BlobRegistryImpl::getBlobDataFromURL const):
(WebCore::BlobRegistryImpl::addBlobData):
(WebCore::BlobRegistryImpl::registerBlobURLHandle):
(WebCore::BlobRegistryImpl::unregisterBlobURLHandle):
* Source/WebCore/platform/network/BlobRegistryImpl.h:

Canonical link: https://commits.webkit.org/267172@main
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
normative change privacy-tracker Group bringing to attention of Privacy, or tracked by the Privacy Group but not needing response. security-tracker Group bringing to attention of security, or tracked by the security Group but not needing response.
Projects
None yet
Development

No branches or pull requests

6 participants