diff --git a/design/src/rfcs/overview.md b/design/src/rfcs/overview.md index b4224e8132..7cb76e7401 100644 --- a/design/src/rfcs/overview.md +++ b/design/src/rfcs/overview.md @@ -34,3 +34,4 @@ - [RFC-0022: Error Context and Compatibility](./rfc0022_error_context_and_compatibility.md) - [RFC-0023: Evolving the new service builder API](./rfc0023_refine_builder.md) - [RFC-0024: RequestID](./rfc0024_request_id.md) +- [RFC-0025: Constraint traits](./rfc0025_constraint_traits.md) diff --git a/design/src/rfcs/rfc0025_constraint_traits.md b/design/src/rfcs/rfc0025_constraint_traits.md new file mode 100644 index 0000000000..a60b24e574 --- /dev/null +++ b/design/src/rfcs/rfc0025_constraint_traits.md @@ -0,0 +1,397 @@ +RFC: Constraint traits +====================== + +> Status: Implemented. +> +> See the description of [the PR][builders-of-builders] that laid the +> foundation for the implementation of constraint traits for a complete +> reference. See the [Better Constraint Violations] RFC too for subsequent +> improvements to this design. +> +> See the uber [tracking issue] for pending work. + +[builders-of-builders]: https://github.com/awslabs/smithy-rs/pull/1342 +[tracking issue]: https://github.com/awslabs/smithy-rs/issues/1401 + +[Constraint traits] are used to constrain the values that can be provided for a +shape. + +For example, given the following Smithy model, + +```smithy +@length(min: 18) +Integer Age +``` + +the integer `Age` must take values greater than or equal to 18. + +Constraint traits are most useful when enforced as part of _input_ model +validation to a service. When a server receives a request whose contents +deserialize to input data that violates the modeled constraints, the operation +execution's preconditions are not met, and as such rejecting the request +without executing the operation is expected behavior. + +Constraint traits can also be applied to operation output member shapes, but +the expectation is that service implementations _not fail_ to render a response +when an output value does not meet the specified constraints. From +[awslabs/smithy#1039]: + +> This might seem counterintuitive, but our philosophy is that a change in +> server-side state should not be hidden from the caller unless absolutely +> necessary. Refusing to service an invalid request should always prevent +> server-side state changes, but refusing to send a response will not, as +> there's generally no reasonable route for a server implementation to unwind +> state changes due to a response serialization failure. + +_In general_, clients should not enforce constraint traits in generated code. +Clients must also never enforce constraint traits when sending requests. This +is because: + +* addition and removal of constraint traits are backwards-compatible from a + client's perspective (although this is _not_ documented anywhere in the + Smithy specification), +* the client may have been generated with an older version of the model; and +* the most recent model version might have lifted some constraints. + +On the other hand, server SDKs constitute _the_ source of truth for the +service's behavior, so they interpret the model in all its strictness. + +The Smithy spec defines 8 constraint traits: + +* [`enum`], +* [`idRef`], +* [`length`], +* [`pattern`], +* [`private`], +* [`range`], +* [`required`]; and +* [`uniqueItems`]. + +The `idRef` and `private` traits are enforced at SDK generation time by the +[`awslabs/smithy`] libraries and bear no relation to generated Rust code. + +The only constraint trait enforcement that is generated by smithy-rs clients +should be and is the `enum` trait, which renders Rust `enum`s. + +The `required` trait is already and only enforced by smithy-rs servers since +[#1148](https://github.com/awslabs/smithy-rs/pull/1148). + +That leaves 4 traits: `length`, `pattern`, `range`, and `uniqueItems`. + +[Constraint traits]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html +[awslabs/smithy#1039]: https://github.com/awslabs/smithy/issues/1039 +[`enum`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#enum-trait +[`idRef`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#idref-trait +[`length`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#length-trait +[`pattern`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#pattern-trait +[`private`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#private-trait +[`range`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#range-trait +[`required`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#required-trait +[`uniqueItems`]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#uniqueItems-trait +[`awslabs/smithy`]: https://github.com/awslabs/smithy + +Implementation +-------------- + +This section addresses how to implement and enforce the `length`, `pattern`, +`range`, and `uniqueItems` traits. We will use the `length` trait applied to a +`string` shape as a running example. The implementation of this trait mostly +carries over to the other three. + +### Example implementation for the `length` trait + +Consider the following Smithy model: + +```smithy +@length(min: 1, max: 69) +string NiceString +``` + +The central idea to the implementation of constraint traits is: [parse, don't +validate]. Instead of code-generating a Rust `String` to represent `NiceString` +values and perform the _validation_ at request deserialization, we can +[leverage Rust's type system to guarantee domain invariants]. We can generate a +wrapper [tuple struct] that _parses_ the string's value and is "tight" in the +set of values it can accept: + +```rust +pub struct NiceString(String); + +impl TryFrom for NiceString { + type Error = nice_string::ConstraintViolation; + + fn try_from(value: String) -> Result { + let num_code_points = value.chars().count(); + if 1 <= num_code_points && num_code_points <= 69 { + Ok(Self(value)) + } else { + Err(nice_string::ConstraintViolation::Length(num_code_points)) + } + } +} +``` + +(Note that we're using the _linear_ time check `chars().count()` instead of +`len()` on the input value, since the Smithy specification says the `length` +trait counts [the number of _Unicode code points_ when applied to string +shapes].) + +The goal is to enforce, at the type-system level, that these constrained +structs always hold valid data. It should be impossible for the service +implementer, without resorting to `unsafe` Rust, to construct a `NiceString` +that violates the model. The actual check is performed in the implementation of +[`TryFrom`]`` for the generated struct, which makes it convenient to use +the [`?` operator for error propagation]. Each constrained struct will have a +related `std::error::Error` enum type to signal the _first_ parsing failure, +with one enum variant per applied constraint trait: + +```rust +pub mod nice_string { + pub enum ConstraintViolation { + /// Validation error holding the number of Unicode code points found, when a value between `1` and + /// `69` (inclusive) was expected. + Length(usize), + } + + impl std::error::Error for ConstraintViolation {} +} +``` + +[`std::error::Error`] requires [`Display`] and [`Debug`]. We will +`#[derive(Debug)]`, unless the shape also has the [`sensitive` trait], in which +case we will just print the name of the struct: + +```rust +impl std::fmt::Debug for ConstraintViolation { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + let mut formatter = f.debug_struct("ConstraintViolation"); + formatter.finish() + } +} +``` + +`Display` is used to produce human-friendlier representations. Its +implementation might be called when formatting a [400 HTTP response message] in +certain protocols, for example. + +[parse, don't validate]: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ +[leverage Rust's type system to guarantee domain invariants]: https://www.lpalmieri.com/posts/2020-12-11-zero-to-production-6-domain-modelling/ +[tuple struct]: https://doc.rust-lang.org/1.9.0/book/structs.html#tuple-structs +[the number of _Unicode code points_ when applied to string shapes]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#length-trait +[`?` operator for error propagation]: https://doc.rust-lang.org/book/ch09-02-recoverable-errors-with-result.html#a-shortcut-for-propagating-errors-the--operator +[`std::error::Error`]: https://doc.rust-lang.org/nightly/std/error/trait.Error.html +[`sensitive` trait]: https://awslabs.github.io/smithy/1.0/spec/core/documentation-traits.html#sensitive-trait +[400 HTTP response message]: https://developer.mozilla.org/en-US/docs/web/http/status/400 +[`TryFrom`]: https://doc.rust-lang.org/std/convert/trait.TryFrom.html +[`Display`]: https://doc.rust-lang.org/std/fmt/trait.Display.html +[`Debug`]: https://doc.rust-lang.org/std/fmt/trait.Debug.html + +### Request deserialization + +We will continue to deserialize the different parts of the HTTP message into +the regular Rust standard library types. However, just before the +deserialization function returns, we will convert the type into the wrapper +tuple struct that will eventually be handed over to the operation handler. This +is what we're _already_ doing when deserializing strings into `enums`. For +example, given the Smithy model: + +```smithy +@enum([ + { name: "Spanish", value: "es" }, + { name: "English", value: "en" }, + { name: "Japanese", value: "jp" }, +]) +string Language +``` + +the code the client generates when deserializing a string from a JSON document +into the `Language` enum is (excerpt): + +```rust +... +match key.to_unescaped()?.as_ref() { + "language" => { + builder = builder.set_language( + aws_smithy_json::deserialize::token::expect_string_or_null( + tokens.next(), + )? + .map(|s| { + s.to_unescaped() + .map(|u| crate::model::Language::from(u.as_ref())) + }) + .transpose()?, + ); + } + _ => aws_smithy_json::deserialize::token::skip_value(tokens)?, +} +... +``` + +Note how the `String` gets converted to the enum via `Language::from()`. + +```rust +impl std::convert::From<&str> for Language { + fn from(s: &str) -> Self { + match s { + "es" => Language::Spanish, + "en" => Language::English, + "jp" => Language::Japanese, + other => Language::Unknown(other.to_owned()), + } + } +} +``` + +For constrained shapes we would do the same to parse the inner deserialized +value into the wrapper tuple struct, except for these differences: + +1. For enums, the client generates an `Unknown` variant that "contains new + variants that have been added since this code was generated". The server + does not need such a variant ([#1187]). +2. Conversions into the tuple struct are fallible (`try_from()` instead of + `from()`). These errors will result in a `my_struct::ConstraintViolation`. + +`length` trait +-------------- + +We will enforce the length constraint by calling `len()` on Rust's `Vec` +(`list` and `set` shapes), `HashMap` (`map` shapes) and our +[`aws_smithy_types::Blob`] (`bytes` shapes). + +We will enforce the length constraint trait on `String` (`string` shapes) by +calling `.chars().count()`. + +`pattern` trait +--------------- + +The [`pattern` trait] + +> restricts string shape values to a specified regular expression. + +We will implement this by using the `regex`'s crate [`is_match`]. We will use +[`once_cell`] to compile the regex only the first time it is required. + +`uniqueItems` trait +------------------- + +The [`uniqueItems` trait] + +> indicates that the items in a [`List`] MUST be unique. + +If the list shape is `sparse`, more than one `null` value violates this +constraint. + +We will enforce this by copying _references_ to the `Vec`'s elements into a +`HashSet` and checking that the sizes of both containers coincide. + +[`Eq`]: https://doc.rust-lang.org/std/cmp/trait.Eq.html +[even if its members are not equipped with an equivalence relation]: https://github.com/awslabs/smithy/issues/1093 +[`List`]: https://awslabs.github.io/smithy/1.0/spec/core/model.html#list +[`uniqueItems` trait]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#uniqueitems-trait +[`is_match`]: https://docs.rs/regex/latest/regex/struct.Regex.html#method.is_match +[`pattern` trait]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html#pattern-trait +[`aws_smithy_types::Blob`]: https://docs.rs/aws-smithy-types/latest/aws_smithy_types/struct.Blob.html +[`aws_smithy_http_server::rejection::SmithyRejection`]: https://docs.rs/aws-smithy-http-server/latest/aws_smithy_http_server/rejection/enum.SmithyRejection.html +[`aws_smithy_http_server::rejection::Deserialize`]: https://docs.rs/aws-smithy-http-server/latest/aws_smithy_http_server/rejection/struct.Deserialize.html +[#1187]: https://github.com/awslabs/smithy-rs/issues/1187 +[`once_cell`]: https://docs.rs/once_cell/latest/once_cell/ + +Trait precedence and naming of the tuple struct +----------------------------------------------- + +From [the spec]: + +> Some constraints can be applied to shapes as well as structure members. If a +> constraint of the same type is applied to a structure member and the shape +> that the member targets, the trait applied to the member takes precedence. + +```smithy +structure ShoppingCart { + @range(min: 7, max:12) + numberOfItems: PositiveInteger +} + +@range(min: 1) +integer PositiveInteger +``` + +In the above example, + +> the range trait applied to numberOfItems takes precedence over the one +> applied to PositiveInteger. The resolved minimum will be 7, and the maximum +> 12. + +When the constraint trait is applied to a member shape, the tuple struct's name +will be the PascalCased name of the member shape, `NumberOfItems`. + +[the spec]: https://awslabs.github.io/smithy/1.0/spec/core/constraint-traits.html?highlight=enum#trait-precedence + +Unresolved questions +-------------------- + +1. Should we code-generate unsigned integer types (`u16`, `u32`, `u64`) when + the `range` trait is applied with `min` set to a value greater than or equal + to 0? + - A user has even suggested to use the `std::num::NonZeroUX` types (e.g. + [`NonZeroU64`]) when `range` is applied with `min` set to a value greater + than 0. + - UPDATE: This requires further design work. There are interoperability + concerns: for example, the positive range of a + `u32` is strictly greater than that of an `i32`, so clients wouldn't be + able to receive values within the non-overlapping range. +2. In request deserialization, should we fail with the first violation and + immediately render a response, or attempt to parse the entire request and + provide a complete and structured report? + - UPDATE: We will provide a response containing all violations. See the + "Collecting Constraint Violations" section in the [Better Constraint + Violations] RFC. +3. Should we provide a mechanism for the service implementer to construct a + Rust type violating the modeled constraints in their business logic e.g. a + `T::new_unchecked()` constructor? This could be useful (1) when the user + _knows_ the provided inner value does not violate the constraints and + doesn't want to incur the performance penalty of the check; (2) when the + struct is in a transient invalid state. However: + - (2) is arguably a modelling mistake and a separate struct to represent + the transient state would be a better approach, + - the user could use `unsafe` Rust to bypass the validation; and + - adding this constructor is a backwards-compatible change, so it can + always be added later if this feature is requested. + - UPDATE: We decided to punt on this until users express interest. + +[`NonZeroU64`]: https://doc.rust-lang.org/std/num/struct.NonZeroU64.html +[Better Constraint Violations]: https://github.com/awslabs/smithy-rs/pull/2040 + +Alternative design +------------------ + +An alternative design with less public API surface would be to perform +constraint validation at request deserialization, but hand over a regular +"loose" type (e.g. `String` instead of `NiceString`) that allows for values +violating the constraints. If we were to implement this approach, we can +implement it by wrapping the incoming value in the aforementioned tuple struct +to perform the validation, and immediately unwrap it. + +Comparative advantages: + +* Validation remains an internal detail of the framework. If the semantics of a + constraint trait change, the behavior of the service is still + backwards-incompatibly affected, but user code is not. +* Less "invasive". Baking validation in the generated type might be deemed as + the service framework overreaching responsibilities. + +Comparative disadvantages: + +* It becomes _possible_ to send responses with invalid operation outputs. All + the service framework could do is log the validation errors. +* Baking validation at the type-system level gets rid of an entire class of + logic errors. +* Less idiomatic (this is subjective). The pattern of wrapping a more primitive + type to guarantee domain invariants is widespread in the Rust ecosystem. The + standard library makes use of it extensively. + +Note that _both_ designs are backwards incompatible in the sense that you can't +migrate from one to the other without breaking user code. + +UPDATE: We ended up implementing both designs, adding a flag to opt into the +alternative design. Refer to the mentions of the `publicConstrainedTypes` flag +in the description of the [Builders of builders PR][builders-of-builders].