Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite a lot of the internals of Blobs to make reading more explicit. #154

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mkruisselbrink
Copy link
Collaborator

@mkruisselbrink mkruisselbrink commented May 21, 2020

This redefines how blobs work by adding internal slots and hooks to
support Blob objects backed by various sources of data. By doing so this
tries to make more explicit how various edge cases behave, as well as
making it clear that blobs are immutable.

This fixes #75, fixes #99, and fixes #47.
It also lays the ground work to address w3c/webappsec-clear-site-data#49.
This also fixes #71.

For normative changes, the following tasks have been completed:

  • Modified Web platform tests (link to pull request)

Implementation commitment:


Preview | Diff

@mkruisselbrink
Copy link
Collaborator Author

This at least matches how chrome implements blobs backed by files, and how slicing and/or combining those blobs with other blobs works. I haven't verified yet if all browsers actually fully agree on this though.

There also is still plenty of room for improvement here, and of course follow-up work to have HTML actually call out to the new "create a file backed File object" to create files in drag&drop and upload algorithms.

@annevk do you have time to review this? not urgent, since I won't have much time in the near future to follow up anyway, but definitely would welcome your input about both the high-level idea of how this tries to define how things work, as well as details.

This redefines how blobs work by adding internal slots and hooks to
support Blob objects backed by various sources of data. By doing so this
tries to make more explicit how various edge cases behave, as well as
making it clear that blobs are immutable.

This fixes #75, fixes #99, and fixes #47.
It also lays the ground work to address w3c/webappsec-clear-site-date#49.

This also fixes #71.
Copy link
Member

@inexorabletash inexorabletash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In progress. Made it to line ~400 or so

</div>

<div algorithm="blob deserialize">
Their [=deserialization step=] (the <dfn>blob deserialization steps</dfn>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "step" → "steps"

</dl>
1. Let |read state| be a new [=bytes blob read state=].
1. If |offset| is larger than |snapshot state|[<code><a for="bytes blob snapshot state">"data"</a></code>]'s [=byte sequence/length=],
set |read state|.[=bytes blob read state/bytes=] to an empty [=byte sequence=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't see the struct.name syntax called out in Infra. Do you know if there's plan to add it?

@annevk
Copy link
Member

annevk commented May 22, 2020

It would help a lot to have an enumeration of the edge cases this attempts to cover as well as some kind of overview of the design to solve them.

Copy link
Member

@inexorabletash inexorabletash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably didn't catch everything, but I made it to the end!

To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn> given a sequence of {{BlobPart}}'s |parts|
and {{BlobPropertyBag}} |options|,
To <dfn lt="process blob parts|processing blob parts">process blob parts</dfn>
given a sequence of {{BlobPart}}'s |blobParts| and {{BlobPropertyBag}} |options|,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: {{BlobPart}}s (plural s, not a possessive s)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, maybe list rather than sequence? (I'm hazy on when to switch from IDL terms to Infra terms)


1. For each |element| in |parts|:
1. Let |bytes| be an empty [=byte sequence=].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algorithm concatenates adjacent non-blob parts rather than just ending up with more parts. Although this probably matches implementations, it doesn't seem to be observable and makes the algorithm more complicated. Is there a good reason for it?

@@ -361,16 +458,179 @@ run the following steps:
1. If |element| is a {{BufferSource}}, <a lt="get a copy of the buffer source">get
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not new, but would be more readable with "Otherwise, if ..."

@@ -361,16 +458,179 @@ run the following steps:
1. If |element| is a {{BufferSource}}, <a lt="get a copy of the buffer source">get
a copy of the bytes held by the buffer source</a>, and append those bytes to |bytes|.

1. If |element| is a {{Blob}},
append the bytes it represents to |bytes|.
1. If |element| is a {{Blob}}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Otherwise, if ..."

:: A number, representing the byte offset in the remaining blob parts
from which to start returning data.
: <dfn>nested blob data</dfn>
:: `undefined` or a [=blob data description=]. This is `undefined` unless otherwise specified.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we typically style undefined (or null, true, false) as code ?

(Likely not consistent across specs)

let |d| be set to the {{FilePropertyBag/lastModified}} dictionary member.
If it is not provided,
set |d| to the current date and time
with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling: should be "ambiguity"

2. Let |n| be a new string of the same size as the {{fileName}} argument to the constructor.
Copy every character from {{fileName}} to |n|,
1. Let |n| be a new string of the same size as |fileName|.
1. Copy every character from |fileName| to |n|,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not new: use "code point" not "character" here and elsewhere.

with constructed files, mandating UTF-16 lessens ambiquity when file names are converted to <a>byte sequences</a>.

1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency: "member" or "dictionary member"

1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.
1. Otherwise:
1. Let |d| be the current date and time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current date / time (as millis since epoch) comes up several times in the spec. Factor it out into a definition.

1. If |options|.{{FilePropertyBag/lastModified}} member is provided:
1. Let |d| be |options|.{{FilePropertyBag/lastModified}} dictionary member.
1. Otherwise:
1. Let |d| be the current date and time
represented as the number of milliseconds since the <a>Unix Epoch</a>
(which is the equivalent of <code>Date.now()</code> [[ECMA-262]]).

Note: Since ECMA-262 {{Date}} objects convert to <code>long long</code> values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move this into the domintro ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess File constructor doesn't have a domintro yet. Add one? Blob too?

That'd be a good place to call out (in giant blinking text...) that the first argument is a array. :)

Copy link

@pwnall pwnall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting my current comments, since @inexorabletash made it to the end before me. Happy to take a second look after revisions, if you'd like me to.


</div>

Issue(w3c/webappsec-clear-site-data#49): The actual storage API |serialized| was persisted in will need a way of
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The storage API used to persist |serialized| needs a way to

</div>

Issue(w3c/webappsec-clear-site-data#49): The actual storage API |serialized| was persisted in will need a way of
modifying the read algorithms for deserialized blobs. I.e. a Blob that was
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. -> E.g., or For example,

(I think the IndexedDB sentence here is an example of the above, not an elaboration.)


1. Set |serialized|.\[[SnapshotState]] to |value|'s [=snapshot state=].
A {{Blob}} has an associated <dfn export for=Blob>\[[data]]</dfn> internal slot,
a [=blob data description=].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal slot, which holds/is a [=blob data description=] ?

(I'm trying to avoid a reparse. Without the extra words, my first parse of the sentence is that there is an internal slot and a blob data description.)

Issue(w3c/webappsec-clear-site-data#49): The actual storage API |serialized| was persisted in will need a way of
modifying the read algorithms for deserialized blobs. I.e. a Blob that was
deserialized from IndexedDB should start throwing in its read steps after
clear-site-data clears all IndexedDB data. Somehow let StructuredDeserialize
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow is redundant

modifying the read algorithms for deserialized blobs. I.e. a Blob that was
deserialized from IndexedDB should start throwing in its read steps after
clear-site-data clears all IndexedDB data. Somehow let StructuredDeserialize
pass along a hook from the storage API to here?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? -> .

### Byte Sequence backed blobs ### {#byte-sequence-backed-blobs}

The [=snapshot state=] for a byte sequence backed blob contains a
<dfn for="bytes blob snapshot state"><code>"data"</code></dfn> member, a [=byte sequence=].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Disclaimer: I'm not versed in spec-ese) I would have expected (map) entry instead of member here.

The [=snapshot state=] ... has a "data" entry that stores a [=byte sequence=]. ?


</div>

A <dfn>bytes blob read state</dfn> is a [=struct=] conssting of
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conssting -> consisting (typo)

or a [=struct=] that has a single <dfn for="bytes blob read state">bytes</dfn> member, which is a [=byte sequence=]

where each part could be a blob itself.

The [=snapshot state=] for a multipart blob contains a
<dfn for="multipart blob snapshot state"><code>"parts"</code></dfn> member,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above regarding member vs (map) entry


1. For each |element| in |blobParts|:
1. If |element| is a {{USVString}}, run the following substeps:

1. Let |s| be |element|.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not introduced by this PR)

I think it would be nice to replace |s| with |str|. This would avoid [=UTF-8 encoding=] |s| which reads like it could be a possessive pronoun.

### Byte Sequence backed blobs ### {#byte-sequence-backed-blobs}

The [=snapshot state=] for a byte sequence backed blob contains a
<dfn for="bytes blob snapshot state"><code>"data"</code></dfn> member, a [=byte sequence=].
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about renaming the "data" map entry to "bytes" to avoid the naming conflict with the [[data]] internal slot?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants