
Relax map shape validation to allow for other key types than "string" #2111

Open
Baccata opened this issue Jan 24, 2024 · 4 comments
Labels
feature-request A feature should be added or improved.

Comments

Baccata commented Jan 24, 2024

[Screenshot, 2024-01-24 at 18:18:26]

Currently, Smithy maps can only take string shapes as keys. Considering the protocol-agnostic nature of Smithy, this is somewhat arbitrary and prevents its usage in some legitimate use cases.

Intuitively, I guess this limitation is driven by the fact that it makes sense in the context of AWS protocols and tools. In Smithy itself, having arbitrary shapes as map keys may not play well with the current canonical "object-node" representation of maps (the representation used by traits in particular, but also other things).

I personally think this limitation is solvable:

  1. protocol-specific validators could be created to enforce the same limitation in the context of AWS protocols
  2. the canonical object-node representation of maps could be used for many more key types than just strings (numerics in particular)
  3. for more complex shapes, a canonical representation of [{"key": ???, "value": ???}] could be used (see the sketch after this list)
  4. alternatively to 3, the constraint on map keys could be relaxed to incorporate strings/numerics/booleans/timestamps/bytes/enums/intEnums but exclude structures/unions/documents/blobs
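
To illustrate option 3, here's a hypothetical sketch (the shape name is made up, and `key: Integer` is not valid under current Smithy validation) of what a non-string-keyed map and its canonical node representation could look like:

```
// Hypothetical: rejected by current Smithy validation, which requires
// map keys to target string shapes.
map ScoresById {
    key: Integer
    value: String
}

// Under option 3, a node value for ScoresById would use a canonical
// list-of-objects representation instead of an object node:
// [
//     {"key": 1, "value": "first"},
//     {"key": 2, "value": "second"}
// ]
```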
mtdowling added the feature-request label Feb 2, 2024
mtdowling (Member) commented

I can expand on some of the reasons behind this limitation.

Maps currently support string and enum keys. I wouldn't characterize this limitation as arbitrary; rather, it's a deliberate simplification in the model that in turn simplifies code generators (e.g., no need to worry about making every generated type hashable) and protocols (as you mentioned).

There hasn't been much demand, or many use cases, over the years (inside or outside of Amazon) for expanding maps to support other key types. I'd expect that if such a problem occurred, the model could adopt a list-of-structs approach to avoid any protocol or programming language pitfalls. If a generator really needed to convert this to a map in a target programming language, I suppose it could introduce its own custom traits to facilitate this as well.
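
For concreteness, here's a minimal sketch of that list-of-structs approach in today's Smithy (the shape and member names are illustrative):

```
// A portable stand-in for a map with integer keys and blob values:
// valid in current Smithy, and free of any hashability requirements
// on generated key types.
structure FieldEntry {
    @required
    fieldNumber: Integer

    @required
    payload: Blob
}

list FieldEntries {
    member: FieldEntry
}
```

A generator that wants to surface this as a native map in the target language could, as mentioned above, key off a custom trait applied to the list.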

Supporting all types as map keys (e.g., structures, unions, etc.) isn't something we'd do, since that imposes a new requirement on code generators: every generated type would have to be usable in their programming language's map type (this is a protocol-agnostic concern and unrelated to which protocols AWS supports).

Types that aren't easily comparable or hashable, like floats, doubles, documents (since they're sort of arbitrary), and probably timestamps, won't work across languages as map keys either (e.g., Rust doesn't let you use floats as a map key).

So that leaves potentially adding support for non-float numbers (byte, short, integer, long, bigInteger, bigDecimal), intEnum, blob, and boolean.

  • numbers: anyone building a JSON protocol would either have to forbid numeric map keys, serialize the numbers as strings, or define some kind of automatic strategy to flip from using a true JSON object to using a list of objects. Serializing them as strings isn't as clear-cut as it sounds, since you'd have to deal with all the different ways numbers can be written and maybe define a canonical format (e.g., { "1": "a", "01": "b", "1e0": "c" }). I'd also worry about the ramifications of suddenly requiring that bigInteger or bigDecimal be hashable. Sure, protocols can forbid these keys, but these kinds of protocol limitations often come back to bite us over time as teams add support for new protocols.
  • blob: I'm unsure about the utility here, but I'm more worried about whether the choices people have made to date about code-generating blobs in their programming languages are compatible with their map data structures.
  • boolean: while it's probably readily hashable in most programming languages, I've not seen any demand for boolean keys that makes me think code generators should need to worry about it.

Baccata (Author) commented Feb 7, 2024

Hey Michael, thank you very much for providing all this context; it's very helpful.

> Supporting all types as map keys (e.g., structures, unions, etc.) isn't something we'd do, since that imposes a new requirement on code generators: every generated type would have to be usable in their programming language's map type (this is a protocol-agnostic concern and unrelated to which protocols AWS supports).
>
> Types that aren't easily comparable or hashable, like floats, doubles, documents (since they're sort of arbitrary), and probably timestamps, won't work across languages as map keys either (e.g., Rust doesn't let you use floats as a map key).

I think this argument is sensible: being mindful of the various languages' semantics, and trying to cater to the lowest common denominator of what programming languages allow, makes sense in the context of an IDL. The proto IDL has similar limitations in place.

This leaves us with the rest:

> So that leaves potentially adding support for non-float numbers (byte, short, integer, long, bigInteger, bigDecimal), intEnum, blob, and boolean.

For the sake of the debate, let's put aside blobs and booleans.

Numbers

> numbers: anyone building a JSON protocol would either have to forbid numeric map keys, serialize the numbers as strings, or define some kind of automatic strategy to flip from using a true JSON object to using a list of objects. Serializing them as strings isn't as clear-cut as it sounds, since you'd have to deal with all the different ways numbers can be written and maybe define a canonical format (e.g., { "1": "a", "01": "b", "1e0": "c" }).

This is where the main point of contention lies, I think: these are protocol-specific arguments used to justify why a protocol-agnostic IDL shouldn't support a particular feature. And although these are good explanations as to why you wouldn't want to support numbers as map keys in the context of AWS protocols, I'd argue that it's the protocol designer's prerogative to decide whether to support certain shapes or not, and what the serialisation/routing semantics should be when the shapes are supported.

> Sure, protocols can forbid these keys, but these kinds of protocol limitations often come back to bite us over time as teams add support for new protocols.

I'm not quite sure how to respond to that, to be honest. I can certainly empathise with the maintenance burden (which this very discussion is contributing to, sorry about that 😅), but I don't think it makes the request illegitimate, in particular if it puts Smithy in a better place to capture the semantics of some binary protocols (see the next section for a motivating example).

Moreover, there is a precedent, in that the smithy.api#protocolDefinition trait has a noInlineDocumentSupport property. The presence of document in the set of "simple types" is, I think, a brilliant decision, and it would have been a shame not to include the concept of document in the IDL just because it doesn't have a "natural" translation in XML protocols.
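
For reference, here is how that property is applied when defining a protocol today (a minimal sketch; `xmlProto` is a made-up protocol name):

```
// A protocol that opts out of inline document support via the
// noInlineDocumentSupport property of smithy.api#protocolDefinition.
@trait(selector: "service")
@protocolDefinition(noInlineDocumentSupport: true)
structure xmlProto {}
```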

(Full disclosure, I think the noInlineDocumentSupport property is a miss: a selector would have been much more flexible, as some protocols could decide that blobs should not be used ...)

> I'd also worry about the ramifications of suddenly requiring that bigInteger or bigDecimal have to be hashable now.

I don't know enough about their representation in non-JVM languages to comment on this, so it's probably safe to ignore for now.

Context of this issue

To elaborate on the relevance of the issue, here's some context and motivation: we have elected to support gRPC as a protocol in our internal Smithy-related offering (both at the IDL level and in our runtime libraries). This lets applications offer different means of interaction to callers at a very reduced cost to our service engineers, who can enable gRPC in their applications with a couple of lines of code (as opposed to using separate code generators depending on the protocol). Overall, using Smithy to describe gRPC interactions and protobuf serialisation works pretty well, and really fits the "Smithy as a source of truth" paradigm that my team is pushing for in our organisation.

However, one thing that has come up recently is a desire for teams to capture "passthrough data". For instance, in the context of a JSON protocol, the requirement is essentially to provide a means to capture "unknown fields" from received JSON objects. We can easily amend our protocol with an additional @unknownJsonFields trait (name TBD) that could annotate a map, signaling to the deserialiser that any JSON fields it doesn't recognise should be stored in the map rather than discarded. Here's what it'd look like:

```
map OtherInfo {
  key: String
  value: Document
}

structure Person {
  @required
  firstName: String

  @required
  lastName: String

  /// This captures any inlined unknown field, as opposed to expecting an `otherInfo` JSON field
  @unknownJsonFields
  otherInfo: OtherInfo
}
```

This would allow for capturing the entirety of a JSON payload like {"firstName": "John", "lastName": "Doe", "age": 34}, which is useful for services that need to operate on a subset of a payload but pass through the rest.

Now, this is all well and good for JSON. However, in the context of protobuf, the equivalent of the OtherInfo type would be:

```
map OtherInfo {
  key: Integer
  value: Blob
}
```

This is currently impossible due to the limitation that map keys must be strings.

The case of retrofitting existing protocols into Smithy

I think where Smithy shines is that it allows for the retrofitting of existing protocols.

This is exemplified by our alloy library, which captures various bits of semantics that come from common REST/JSON API patterns that the official Smithy prelude (smithy.api) doesn't capture: things like alternative JSON encodings for unions, or some ad-hoc constraints that can be described in OpenAPI (the uuid format) that we wanted to capture when moving towards Smithy.

Another example is the Build Server Protocol, which was initially described as a bunch of markdown documents and now takes a Smithy specification as its source of truth for documentation and bespoke code generators alike. The BSP is essentially an "extension" of the LSP, and therefore some bespoke JSON-RPC-related protocol semantics needed to be captured.

I know of other companies that are using Smithy as a source of truth for some complex data pipelines involving both Google's BigQuery tables and Elasticsearch tables.

My point here is that Smithy really shines at capturing the semantics of existing ad-hoc protocols in the wild, and the flexibility of its core semantics contributes to this. It's something I think is worth leaning into, as it helps the language become more popular in the industry.

Allowing integer/long map keys would make it possible to exhaustively capture protobuf semantics (and probably the semantics of other binary protocols), which is valuable in the context of my organisation, but probably also in the context of the wider ecosystem.

mtdowling added a commit that referenced this issue Feb 21, 2024
This commit adds a new meta-trait called `constrainShapes` that is
used to constrain the closure of shapes that can be used when a trait
is applied to a shape.

This can be used on a protocol definition trait to disallow the use of
certain kinds of shapes that are incompatible with the protocol
(e.g., some AWS protocols that use XML don't support document types).
This trait is a replacement for the `noInlineDocument` property of
the `protocolDefinition` trait, but generalizes the concept to work
with any kind of shape or trait.

While this could have been achieved with metadata-based validators
before, introducing a trait makes it easier and more performant to
constrain shapes. For example:

```
metadata validators = [
    {
        name: "EmitEachSelector"
        id: "MyProtocol.NoDocuments"
        configuration: {
            selector: "service [trait|smithy.example#xProto] $service(*) ~> member :test(> document)"
            messageTemplate: "xProto found on @{var|service} does not support document types"
        }
    }
]
```

Becomes:

```
@trait(selector: "service")
@protocolDefinition
@constrainShapes(
    "MyProtocol.NoDocuments": {
       selector: "member :test(> document)"
       message: "This protocol does not support document types"
    }
)
structure xProto {}
```

The advantage of a trait is that you get good validation messages for
free, more performant scans of models when multiple constraints are
needed (no need to crawl shapes multiple times), and the validation
is reusable since it can be applied and shared via mixins.

To prove out the concept, several MQTT trait validators were
converted from Java code to in-model constraints.

Partially addresses #2111
mtdowling (Member) commented

Thanks for laying out your use case.

I think expanding maps to support integer and long keys isn't too controversial. Do you have a hard requirement for long right now or would just integer suffice?

As a baby step of progress, I was inspired by your fair critique of noInlineDocumentSupport and added the ability to constrain shape closures based on the presence of a trait. This would deprecate noInlineDocumentSupport and be used to automatically validate that a service with a protocol trait adheres to whatever limitations or requirements a protocol might have: #2156.

Baccata (Author) commented Feb 22, 2024

> Do you have a hard requirement for long right now or would just integer suffice?

At this point in time, I'm only interested in integers, but I think allowing longs too would let the Smithy metamodel fully supersede the proto metamodel, which is a nice thing to be able to say (i.e., anything that can be described in proto could be described in Smithy ... except for maps that use booleans as keys, but who realistically does that 🤷). I'm trusting your judgement.

> Added the ability to constrain shape closures based on the presence of a trait.

This is really nice, kudos!

mtdowling added a commit that referenced this issue Feb 23, 2024