[DESIGN] dataflow for composability (#515) #645

Merged: 6 commits into unicode-org:main on May 20, 2024

catamorphism
Collaborator

@catamorphism catamorphism commented Feb 14, 2024

This proposed design doc addresses an issue titled "inspecting formattable values", which is really about dataflow through the formatter and structuring it to make function calls compose with each other.

Another PR, #646, shows how the spec would change if this design doc were accepted. I revised the design doc after making the formatting spec changes in #646 and haven't had time to update those spec changes to match yet, so #646 uses some different terms. It should still give a sense of what the spec would look like if this design doc were accepted.

This is not a finished design doc by any means, but I'm hoping to get a thumbs-down or thumbs-up on the idea before I polish it any further.

catamorphism added a commit to catamorphism/message-format-wg that referenced this pull request Feb 14, 2024
This change updates the formatting spec to reflect the
changes proposed in unicode-org#645. It should not be merged as-is.
It also uses slightly different terms than the design doc in unicode-org#645,
but should serve to give a sense of what the spec would look
like if the "composability" design doc was accepted.
@catamorphism catamorphism marked this pull request as ready for review February 14, 2024 02:00
@eemeli
Collaborator

eemeli commented Feb 14, 2024

I am not comfortable attempting to review this design doc and the accompanying spec changes during this week. The changes they propose are extensive, but at the same time it's not clear to me if they actually change any externally observable behaviour.

As in #515, the doc starts from a premise that

Custom formatting functions should be able to inspect the raw value and formatting options of their arguments.

but does not discuss why access to specifically a "raw" value (and options?) is required or beneficial, as opposed to access to a value and options. I am concerned that such a starting point limits the expressibility of messages such as:

.input {$names :list}
.local $head = {$names :slice start=0 end=2}
.local $tail = {$names :slice start=2}
.match {$head :count} {$tail :count}
0   *   {{No-one liked this}}
one *   {{{$head} liked this}}
*   0   {{{$head} liked this}}
*   one {{{$head :list type=unit} and {$tail :count} other person liked this}}
*   *   {{{$head :list type=unit} and {$tail :count} other people liked this}}

where a complex input value (a list of names) is used to construct other complex values, on which further operations (determining the item count and adding list formatting options) result in selection and formatting.
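A minimal Python sketch of the dataflow this message relies on, written here for illustration only. The `Resolved` class and the `slice_fn` and `count_fn` helpers are hypothetical stand-ins for `:slice` and `:count`; none of them come from the spec or from any implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Resolved:
    """Hypothetical resolved value: an underlying value plus its options."""
    value: Any
    options: dict = field(default_factory=dict)


def slice_fn(operand: Resolved, start: int = 0, end: Optional[int] = None) -> Resolved:
    # Hypothetical :slice. Builds a new list value and copies the list options.
    return Resolved(operand.value[start:end], dict(operand.options))


def count_fn(operand: Resolved) -> int:
    # Hypothetical :count. Exposes the item count so it can drive selection.
    return len(operand.value)


names = Resolved(["Ann", "Bo", "Cy", "Di"], {"type": "conjunction"})
head = slice_fn(names, start=0, end=2)
tail = slice_fn(names, start=2)
```

Under these assumptions, `head` and `tail` are themselves complex values, and `count_fn` can be applied to either one before any string formatting happens.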

It's entirely possible that the above works fine with the proposed text, but this current week does not provide sufficient time for making that determination for this particular case, or for other complex messages.

I'm also a little surprised by the number of new interfaces that are introduced. For the JS implementation, I found it sufficient to have a single MessageValue interface representing what the current spec text refers to as a "resolved value".

Member

@aphillips aphillips left a comment

(chair hat off)

I tend to agree with @eemeli. Actually, I'd go further.

This is great thinking and shows off the fact that you have been writing an implementation. However, I think it's too specific. This would require implementations to write their internals in specific ways (and we'd have to create tests to prove that the implementation had followed the spec). Since what we care about are outputs, our guidance can and should be limited to external appearances.

Most of the guidance here seems targeted at behavior we could specify for the built-in functions (e.g. whether options are transitive between declarations) rather than normative definitions.

Let's discuss today and see where others in the WG land.

Comment on lines +94 to +99
But without `:number` being able to access the previously passed options,
the two fragments won't be equivalent.
This requires `:number` to return a value that encodes
the options that were passed in, the value that was passed in,
and the formatted result;
not just the formatted result.
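One possible reading of the quoted passage, as a hedged Python sketch. `NumberResult` and `number` are hypothetical names, the default of 3 fraction digits is an assumption, and fixed-point formatting stands in for real locale-aware number formatting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberResult:
    """Hypothetical return value of :number."""
    operand: float   # the value that was passed in
    options: dict    # the options that were passed in
    formatted: str   # the formatted result


def number(arg, **options) -> NumberResult:
    # If the argument came from an earlier :number call, reuse its operand
    # and let the new options override the earlier ones.
    if isinstance(arg, NumberResult):
        operand, merged = arg.operand, {**arg.options, **options}
    else:
        operand, merged = float(arg), dict(options)
    digits = merged.get("maximumFractionDigits", 3)  # assumed default
    return NumberResult(operand, merged, f"{operand:.{digits}f}")


a = number(3.14159, maximumFractionDigits=2)
b = number(a)                # sees the earlier options through `a`
plain = number(a.operand)    # only sees the raw operand
```

Here `b.formatted` matches `a.formatted`, while `plain.formatted` falls back to the assumed default precision, which is the non-equivalence the passage describes.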
Member

Did you consider the counter example, in which the user does NOT want transitivity?

.local $datetime = {|2024-02-14T01:23:45Z| :datetime}
.local $date = {$datetime :date}
.local $time = {$datetime :time}
.local $a = {|1.234| :number maximumFractionDigits=2 maximumDecimalDigits=0}
.local $b = {$a :integer}
{{This prints 1.23: {$b}.{$a} (Yes, it's perverse)}}

@stasm
Collaborator

stasm commented Feb 15, 2024

@aphillips:

However, I think it's too specific. This would require implementations to write their internals in specific ways (and we'd have to create tests to prove that the implementation had followed the spec).

I actually think this is rather agnostic, which is why it may also seem like it introduces a lot of new interfaces, as @eemeli observed.


I gave @catamorphism feedback on an earlier draft of this PR, and I'd like to have more time to review this iteration, but I won't be able to get to it this week.

I'd love to be able to continue discussing this post-LDML 45. Perhaps this is part of the "implementer's feedback" that we're looking for?


In the meantime and in the 45 timeframe, would it make sense to review #646 looking for changes which would introduce incompatibilities to the current spec? If all #646 does is clarify concepts without changing the observable formatting behavior, then I think it should be acceptable to continue this work for possible inclusion in the final spec?

@aphillips aphillips added normative specification LDML46 Items that must be first for post-tech preview (LDML46) and removed Agenda+ labels Feb 15, 2024
@catamorphism
Collaborator Author

I'll plan on reading through the formatting spec to look for anything that would make it impossible to adopt this proposal later. (Busy week for me so I'm not sure when, but I'll do my best.)

- Define the structure passed in as an argument to a custom formatting function.
- Define the structure that a custom formatting function should return.
- Maintain the options passed into the callee as a _separate_ argument to the
formatter, to avoid confusion. (See Example 4 below.)
Member

I had somewhat different thoughts.

.input {$v1 :f1 o1=a...}
.local $v2 = {$v1 :f2 o1=a o2=b}

  • f1 == f2. For the same function (or aliases with different options), do a merge on the option map (with the later ones winning). This could be extended to alias functions (the same under the covers, with different option settings).
  • f1 ≠ f2. For different functions, it gets tricky. I think we need to require it to be specified in the registry.
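For the f1 == f2 case, the merge with later options winning can be sketched in Python as follows. `merge_options` is a hypothetical helper, and the value `z` for the later `o1` is made up so that the override is visible.

```python
def merge_options(earlier: dict, later: dict) -> dict:
    # Keys from `later` replace same-named keys from `earlier`.
    return {**earlier, **later}


v1_options = {"o1": "a"}
v2_options = merge_options(v1_options, {"o1": "z", "o2": "b"})
```

The merge leaves `v1_options` untouched, so `$v1` would still format with its own options.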

Example:
.input {$v1 :number o1=a}
.local $v2 = {$v1 :date o1=a o2=b}

The registry could specify that :date can handle a resolved :number expression in the following way.

  • The value of the :date expression is the number converted to a double value and interpreted as the number of seconds since 1970-01-01T00:00:00Z.
  • The number operands are discarded except for o3=x, which is mapped to o9 in the following way: ...
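The value conversion in the first bullet could look like the Python sketch below. `date_from_number` is a hypothetical name, and the option mapping from the second bullet is not sketched because the comment leaves it elided.

```python
from datetime import datetime, timezone


def date_from_number(seconds) -> datetime:
    # Convert to a double, then read it as seconds since 1970-01-01T00:00:00Z.
    return datetime.fromtimestamp(float(seconds), tz=timezone.utc)


epoch = date_from_number(0)
next_day = date_from_number(86400)
```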

Member

@macchiati noted:

I had somewhat different thoughts.

My thoughts have evolved here also. I think it isn't the function or isn't just the function that determines whether an option is inherited. I think options should be specified as inheritable (or not).

Consider:

.input {$d :datetime year=numeric month=long day=numeric}
.local $t = {$d :datetime hour=numeric minute=numeric}
{{The event is on {$d} at {$t}.}}

These are the same input variables and the same function. You don't want to inherit skeleton/fields.

Now consider:

.input {$d :datetime dateStyle=long timeZone=$userZone numberingSystem=Latn}
.local $t = {$d :datetime timeStyle=long}
{{The event is on {$d} at {$t}.}}

Again, you don't want to inherit the styles. But you do want to inherit the coerced time zone and the numbering system (because it makes more sense to inherit it).
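A rough Python sketch of per-option carry-over for this second example. The `CARRIED_OVER` set and `compose_options` are hypothetical, and the time zone value is made up because the message only refers to `$userZone`.

```python
# Assumption: only these options carry over to a later expression.
CARRIED_OVER = {"timeZone", "numberingSystem"}


def compose_options(previous: dict, current: dict) -> dict:
    # Keep the carry-over options from the earlier expression,
    # then apply the options written on the current expression.
    kept = {k: v for k, v in previous.items() if k in CARRIED_OVER}
    return {**kept, **current}


d_options = {
    "dateStyle": "long",
    "timeZone": "America/Los_Angeles",  # hypothetical value of $userZone
    "numberingSystem": "Latn",
}
t_options = compose_options(d_options, {"timeStyle": "long"})
```

With this rule `$t` gets the time zone and numbering system from `$d` but not its `dateStyle`.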

The registry could specify that :date can handle a resolved :number expression in the following way.

The registry can already specify that. And implementations can already do that, as we allow implementation defined types. The :number options won't matter to :date (etc.) and not be inherited.

Note that :datetime, :date, and :time are not the same function (f1 ≠ f2) but probably inherit/share some fields when "composed" as in the examples above (replace :datetime with :date and :time for example). The current registry doesn't do this because the potentially shared fields were pushed down into the RGI bucket for 45.

Member

I think it depends on the example. In the following, I do want to inherit.

.input {$d :datetime year=numeric month=numeric day=numeric}
.local $d2 = {$d :datetime month=long}
{{The event is on {$d} at {$d2}.}}

What I think would be cleaner would be to have options like suppress=date, suppress=time. These cause a host of options to be set to none. Then the following would be very clear as to what is to happen, especially for the poor reader who doesn't want to have to consult the registry for each attribute to find out whether it inherits or not.

.input {$d :datetime year=numeric month=long day=numeric}
.local $t = {$d :datetime suppress=date hour=numeric minute=numeric}
{{The event is on {$d} at {$t}.}}
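The `suppress` idea can be sketched in Python like this. The field groups and the `compose` helper are assumptions made for illustration, and the sketch drops the suppressed options rather than setting them to an explicit none value.

```python
# Assumption: which options each suppress value clears.
GROUPS = {
    "date": ("year", "month", "day"),
    "time": ("hour", "minute", "second"),
}


def compose(previous: dict, current: dict) -> dict:
    current = dict(current)
    suppress = current.pop("suppress", None)
    dropped = GROUPS.get(suppress, ())
    # Everything carries over unless the suppress option clears it.
    kept = {k: v for k, v in previous.items() if k not in dropped}
    return {**kept, **current}


d = {"year": "numeric", "month": "long", "day": "numeric"}
t = compose(d, {"suppress": "date", "hour": "numeric", "minute": "numeric"})
d2 = compose({"year": "numeric", "month": "numeric", "day": "numeric"},
             {"month": "long"})
```

The `d2` case corresponds to the earlier `month=long` example, where all date fields carry over and only the month style changes.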

Collaborator Author

I don't have time right now to comment in depth, but I really want to discourage, again, the use of the word "inherit" here, since options are not related to object-oriented inheritance. Even using it metaphorically is bound to cause confusion in people who aren't in this discussion now but may join later.

I think the right thing to do instead is to define data types that standard functions return, and exhaustively specify the required and optional fields in those data types; if an option doesn't appear there, then it can't be used in composing functions. (And then, custom function writers get a way to define these types for themselves.)

There is no escaping that we're talking about describing dynamic data in a static way, so this is like defining a "data model" for runtime values to make it practical to understand and describe how functions interoperate. In such a model, options are passed as components of a data structure; not inherited.

Member

@catamorphism You're right. The word inherit probably shouldn't be used in the spec.

@macchiati

.local $d2 = {$d :datetime month=long}

Actually, that's ambiguous. It's not clear if your intention for $d2 is to produce January or January 1. From my point of view, if you mess with the format, it's the former (or you would have added the day option). Here's another place skeletons are the friendlier interface.

What I think would be cleaner would be to have options like suppress=date, suppress=time.

This is what the functions :date and :time do 😜

especially for the poor reader who doesn't want to have to consult the registry for each attribute to find out whether it inherits or not.

This is a concern. But here in the design doc we should capture the use cases for options. We should also capture the use cases for operand mutation. For example, given my given name as input for $name:

.input {$name :text-transform operation=uppercase}
.local $foo = {$name :truncate length=5 ellipsis=true}
{{Your name is {$foo}.}}

Does that print "Your name is ADDIS..." or "Your name is Addis..." or what?

Collaborator

Does that print "Your name is ADDIS..." or "Your name is Addis..." or what?

I think that must depend on the :text-transform and :truncate implementations, and how their authors intend for the values to compose. I think we must allow for each function to make choices when determining their resolved values about the "value" and "options" they present.

Something very similar is achievable even with our most basic built-in functions, and without options:

.local $x = {4.2e1 :number}
{{{$x :string}}}

If :number presents its input directly as its value, that should format as "4.2e1", but if it presents a numeric value, then that should format as "42". I do not think enforcing the first on a spec level makes much sense.
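The two behaviours can be contrasted in a small Python sketch. The function names are hypothetical, and Python's `g` format stands in for whatever `:string` would do with a numeric value.

```python
def number_keep_source(literal: str) -> str:
    # Variant A: :number presents its input unchanged as its value.
    return literal


def number_parse(literal: str) -> float:
    # Variant B: :number presents a parsed numeric value.
    return float(literal)


def string_fn(value) -> str:
    # Hypothetical :string. Strings pass through; numbers get a plain rendering.
    return value if isinstance(value, str) else format(value, "g")


as_source = string_fn(number_keep_source("4.2e1"))
as_number = string_fn(number_parse("4.2e1"))
```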

Similarly, a "text-transform" could effectively consume its operation=uppercase option when constructing its value, or consider it a string formatting option. So either of these is conceivable in the resolved value:

  1. Value 'Addison', options { operation: 'uppercase' }
  2. Value 'ADDISON', options {}

As we've discussed in a variety of contexts, not all options are really the same, so I'd say that enabling this freedom is the right choice.
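A Python sketch of how the two candidate resolved values would compose with a truncating function. Every name here is hypothetical, and `truncate` is assumed to read only the operand's value and ignore its options.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resolved:
    """Hypothetical resolved value: a string value plus its options."""
    value: str
    options: dict = field(default_factory=dict)


def text_transform_keep(name: str) -> Resolved:
    # Candidate 1: value untouched, the operation stays a formatting option.
    return Resolved(name, {"operation": "uppercase"})


def text_transform_consume(name: str) -> Resolved:
    # Candidate 2: the option is consumed when the value is built.
    return Resolved(name.upper(), {})


def truncate(operand: Resolved, length: int, ellipsis: bool) -> str:
    # Assumption: looks only at operand.value, never at operand.options.
    text = operand.value[:length]
    return text + "..." if ellipsis else text


kept = truncate(text_transform_keep("Addison"), length=5, ellipsis=True)
consumed = truncate(text_transform_consume("Addison"), length=5, ellipsis=True)
```

So under this assumed `truncate`, candidate 1 leads to "Addis..." and candidate 2 to "ADDIS...".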

@aphillips aphillips added the design Design principles, decisions label Apr 15, 2024
@aphillips aphillips changed the title Add design doc for dataflow for composability (#515) [DESIGN] dataflow for composability (#515) Apr 15, 2024
and the structure of resolved values is left completely
implementation-specific.

Providing a mechanism for custom formatters to inspect more
Collaborator

+1

@mihnita
Collaborator

mihnita commented May 12, 2024

I think this document focuses too much (only?) on the idea of the same function merging options or not.

But I don't think that is very interesting, or useful.
It only saves some typing.
And I don't think that using variations of the same parameter (with different options) is that common.

What is most interesting is composing functions by "chaining" them.
Functions that take one type + options and return a different type (or the same type), with new, or possibly modified options.

That would allow, for example, transformations on parameters.

Take a person and return date of birth.
Take a date and return days since that date.

Take a string and return a transformed version of it (changing case, or grammatical form).
Or return the original string, but with extra info attached as option (for example result of a grammatical analysis).
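The first two transformations, chained, might look like this in Python. `Person`, `birth_date`, and `days_since` are hypothetical, and `today` is passed in explicitly so the result does not depend on the clock.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str
    born: date


def birth_date(person: Person) -> date:
    # Takes a person, returns a date.
    return person.born


def days_since(start: date, today: date) -> int:
    # Takes a date, returns a whole number of days.
    return (today - start).days


ada = Person("Ada", date(2000, 1, 1))
days = days_since(birth_date(ada), today=date(2000, 3, 1))
```

Each step returns a different type from the one it received, which is the kind of chaining described here.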

@catamorphism
Collaborator Author

@mihnita That's fair; the document does focus a lot on composing options because that's the problem that comes up first with built-in functions. (Composing number with datetime, or vice versa, isn't too interesting.) There is an example in #753 ("Example B1") of what you're talking about. Since #753 is intended to land before this PR lands, do you think that's enough or would you prefer to see more examples like that?

@aphillips aphillips merged commit 76a676c into unicode-org:main May 20, 2024
1 check passed
Labels
design Design principles, decisions LDML46 Items that must be first for post-tech preview (LDML46) normative specification
6 participants