
[Feature Request] Add an in-between level of full-fat checking #349

Open
JWCS opened this issue Mar 27, 2024 · 1 comment


JWCS commented Mar 27, 2024

Leaving this as an unclosed issue for (far?) future discussion, once full-fat becomes a thing.
The desired default behavior for runtime checking might be more of a 2%, rather than the current skim or the pytest/QA full-fat.
Specifically, the common cases mentioned below (and maybe a few more).
Since beartype.claw moved configuration into a once-only `__init__.py`, changing the config is trivial, further encouraging (weak opinion here) the view that correctness is more valuable than speed.
(Until you know it's already correct, but then you wouldn't need beartype at all.) (The fastest code is no code, so if you need the code, it had better do its job.)


On that note, there are two scenarios that seem common enough to justify an "in-between" parameterizable level of type checking (when you get around to it): full-fat checking of collections of literals, and of "small flat collections" (for some definition of small; 10 items? 20?).
(These mostly overlap, but a dict with literal keys may justify full-fat key-type-checking without full-fat value-type-checking.)
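As a throwaway illustration of that middle ground (purely hypothetical names and thresholds; nothing like this exists in beartype today), an in-between strategy might fully check small flat collections while only skimming larger ones:

```python
# Hypothetical sketch of an "in-between" checking level: full-fat O(n)
# checks for small flat collections, skim-style O(1) checks otherwise.

SMALL_COLLECTION_MAX_LEN = 20  # "for some definition of small; 10? 20?"

def check_list_of(items: list, item_type: type) -> bool:
    """Return True if "items" satisfies list[item_type] under this
    hypothetical in-between strategy."""
    if not isinstance(items, list):
        return False
    if len(items) <= SMALL_COLLECTION_MAX_LEN:
        # Small flat collection: full-fat check of every item.
        return all(isinstance(item, item_type) for item in items)
    # Large collection: spot-check only the first item.
    return isinstance(items[0], item_type)

print(check_list_of([1, 2, 3], int))          # True: small, fully checked
print(check_list_of([1, "oops", 3], int))     # False: caught by full check
print(check_list_of([1] * 100 + ["x"], int))  # True: skimmed, miss tolerated
```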

Originally posted by @JWCS in #347 (comment)


leycec commented Mar 29, 2024

Amazingness! Yes, we're absolutely on the same wavelength. It's a wavelength that only genius subscribes to. I am calling you that genius, @JWCS.

I have much to say about this – yet no time with which to say it. Tomorrow, we release @beartype 0.18.0. Okay. Maybe on Monday. Very well. Perhaps by Tuesday. But what I'm saying here is that I'm lazy, tired, and greasy like a hibernating bear.

The short answer is that @beartype's initial implementation of O(n) type-checking will actually incorporate a deadline scheduler dictated by a new hypothetical BeartypeConf.check_time_max_multiplier option, whose docstring currently reads:

Parameters
----------
check_time_max_multiplier : Union[int, None] = 1000
    **Deadline multiplier** (i.e., positive integer instructing
    :mod:`beartype` to prematurely halt the current type-check when the
    running time consumed by both the current type-check and all prior
    type-checks multiplied by this integer exceeds the total running
    time of the active Python interpreter *and* the caller also passed a
    non-default ``strategy``) *or* :data:`None` if :mod:`beartype`
    should never prematurely halt runtime type-checks.

    Increasing this integer increases application responsiveness at a
    cost of decreasing the number of container items that
    :mod:`beartype` type-checks. Likewise, decreasing this integer
    increases the number of container items that :mod:`beartype`
    type-checks at a cost of decreasing application responsiveness.

    Ignored when ``strategy`` is :attr:`BeartypeStrategy.O1`, as that
    strategy is already effectively instantaneous; imposing deadlines
    and thus bureaucratic bookkeeping on that strategy would only
    reduce its efficiency for no good reason, which is a bad reason.

    Defaults to 1000, in which case a maximum of 0.10% of the total
    runtime of the active Python process will be devoted to performing
    non-constant :mod:`beartype` type-checks over container items. This
    default has been carefully tuned to strike a reasonable balance
    between runtime type-check coverage and application responsiveness,
    typically enabling smaller containers to be fully type-checked
    without noticeably impacting codebase performance.

    **Theory time.** Let:

    * :math:`T` be the total time this interpreter has been running.
    * :math:`b` be the total time :mod:`beartype` has spent
      type-checking in this interpreter.

    Clearly, :math:`b \leq T`. Generally, :math:`b \ll T` (i.e.,
    type-checks consume much less time than the total time consumed by
    the process). However, it's all too easy to exhibit worst-case
    behaviour of :math:`b \approx T` (i.e., type-checks consume most of the
    total time). How? By passing the :func:`beartype.door.is_bearable`
    tester an absurdly large nested container subject to the non-default
    ``strategy`` of :attr:`BeartypeStrategy.On`.

    This deadline multiplier mitigates that worst-case behaviour.
    Specifically, :mod:`beartype` will prematurely halt any iterative
    type-check across a container when this constraint is triggered:

    .. code-block:: python

       b * check_time_max_multiplier >= T
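
To make the deadline concrete, here's a minimal Python sketch of that halt predicate (hypothetical helper name; not the real beartype implementation):

```python
# Hypothetical sketch of the deadline predicate quoted above: halt
# container type-checking once beartype's cumulative type-checking time
# "b" exceeds its budget of 1/check_time_max_multiplier of the total
# interpreter running time "T".

def should_halt_checks(
    b: float,  # seconds consumed by all beartype type-checks so far
    T: float,  # seconds the active Python interpreter has been running
    check_time_max_multiplier: int = 1000,
) -> bool:
    return b * check_time_max_multiplier >= T

# With the default multiplier of 1000, the budget is 0.1% of runtime.
print(should_halt_checks(b=0.0005, T=1.0))  # under budget: False
print(should_halt_checks(b=0.0020, T=1.0))  # over budget: True
```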

Likewise, I have a year-old #FIXME: comment that is an elongated dissertation on the nature of scalable runtime type-checking as it pertains to everything you just said. It will break your will to live. Therefore, I copy-paste it here:

#FIXME: [O(n)] Ah-ha! We now know how to implement O(log n) and O(n)
#type-checking in a scaleable manner that preserves @beartype's strong
#performance guarantees. How? With a timed deadline cutoff. Specifically:
#* Define a new "BeartypeConf.check_time_max_multiplier" instance variable,
#  which we've already conveniently documented.
#* Note this critical formula in the documentation for that variable (with the
#  equality flipped so as to express the condition under which @beartype
#  continues to perform type-checks):
#      b * check_time_max_multiplier < T
#
#  Naturally, real life isn't quite that convenient. We don't actually have
#  either "b" or "T". We need to dynamically compute them on each check. What we
#  *DO* have is:
#  * "B", the total time consumed by all *PRIOR* @beartype type-checks.
#  * "t", the current time.
#  * "s", the initial time at which the current round of @beartype type-checks
#    was started (e.g., for the current call to a @beartype-decorated callable).
#  * "Z", the initial time at which the active Python interpreter was started.
#
#  For convenience, additionally let:
#  * "K" be "check_time_max_multiplier" for readability.
#
#  Note that "t - s" yields the total time consumed by the *CURRENT* round of
#  @beartype type-checks. Ergo, "B + t - s" yields the total time consumed by
#  *ALL* @beartype type-checks. Given that, "b" and "T" can both then be
#  expressed in terms of these known quantities:
#      b = B + t - s
#      T =     t - Z
#
#  Then the above formula expands to:
#      (B + t - s) * K < t - Z     (4 total arithmetic operations)
#
#  Note that "t" *MUST* be localized as a local variable (e.g., as
#  "time_curr = time()") to avoid erroneous recomputation of that time. Also, "K"
#  and "Z" are constants. Ideally, we would be able to simplify away some of
#  this. Superficially, simplification yields *NO* significant joys:
#      KB + Kt - Ks - t + Z < 0
#      KB + Z + t(K - 1) < Ks      (6 total arithmetic operations)
#
#  However, "K - 1" is clearly also a constant. Let "L = K - 1". Then:
#      KB + Z + Lt < Ks            (5 total arithmetic operations)
#      KB - Ks + Z + Lt < 0
#      K(B - s) + Lt < -Z          (5 total arithmetic operations)
#
#  Ah-ha! But "-Z" is also a constant. Let "Y = -Z". Then we have:
#      K(B - s) + Lt < Y           (4 total arithmetic operations)
#
#  This is considerably better. Since "t" only appears once, we no longer need
#  to localize the result of calling the time() function and can instead simply
#  embed that call directly -- a significant savings in both time and code
#  complexity.
#
#  Can we do better? Possibly. Note that "s ~= t" for most intents and purposes.
#  In particular, "s - Z ~= t - Z" (with respect to orders of magnitude, which
#  is all the right side of that equation is attempting to capture, anyway).
#  We can then resimplify as follows:
#      (B + t - s) * K < s - Z     (4 total arithmetic operations)
#      KB + Kt - Ks - s < -Z
#      K(B + t) - s(K + 1) < Y
#
#  Since "K + 1" is yet another constant, let "M = K + 1". Then:
#      K(B + t) - Ms < Y           (4 total arithmetic operations)
#
#  Amusingly, that fails to help (and in fact reduces accuracy). Gah! The best
#  we can do from here is to eliminate the need for "Y" from above as follows:
#      K(B - s) + Lt < -Z          (5 total arithmetic operations)
#      K(B - s) < -Lt - Z
#      K(B - s) < -(Lt + Z)
#      -K(B - s) > Lt + Z
#      K(s - B) > Lt + Z           (4 total arithmetic operations)
#      K(s - B) - Lt > Z           (4 total arithmetic operations)
#
#  Furthermore, note that:
#  * "s" is internally localized at the start of the body of each
#    @beartype-decorated callable.
#  * "t" is simply a call to the time() function.
#  * "K" and "L = K - 1" don't actually need to be passed anywhere. They're
#    *NOT* variables; they're just hard-coded in at code generation time.
#  * "B and Z", on the other hand, *DO* need to passed to each
#    @beartype-decorated callable under the O(n) strategy.
#  * Moreover, "B" *MUST* be updated by each @beartype-decorated callable under
#    the O(n) strategy.
#
#  Ah-ha! With respect to each call to a @beartype-decorated callable under the
#  O(n) strategy, the quantity "K(s - B)" is actually a constant. Since "Z" is
#  also a constant, we simply localize these two constants at the start of the
#  body of each such callable into a new constant "J = K(s - B) - Z". This then
#  yields:
#      K(s - B) - Lt > Z           (4 total arithmetic operations)
#      K(s - B) - Z > Lt
#      J > Lt
#      Lt < J                      (1 total arithmetic operation)
#
#  And... that's all she wrote, folks. While that condition holds, @beartype
#  continues performing O(n) type-checks. It should be noted that this
#  simplified conditional trivially follows from only a few steps of the
#  original conditional like so:
#      (B + t - s) * K < t - Z
#      KB + Kt - Ks < t - Z
#      Kt - t < Ks - KB - Z
#      t(K - 1) < K(s - B) - Z
#* Altogether, the above implies that each @beartype-decorated callable under
#  the O(n) strategy should be passed a new hidden "__beartype_times" parameter.
#  This parameter is a 2-list "(process_time_start, beartype_time_total)",
#  where:
#  * "process_time_start" is simply "Z".
#  * "beartype_time_total" is simply "B".
#
#  This 2-list is a global list exposed to each @beartype-decorated callable via
#  this "__beartype_times" parameter. Declare this global list in a separate
#  submodule -- say, a new "exprtime" submodule -- as follows:
#      from time import monotonic
#
#      CHECK_TIMES = [monotonic(), 0.0]
#      '''
#      **Type-checking wall-clock time accumulator** (i.e., list whose items
#      allow :mod:`beartype`-generated type-checkers under non-constant
#      strategies like :attr:`beartype.BeartypeStrategy.On` to record how much
#      time the active Python process has devoted to :mod:`beartype`, enabling
#      :mod:`beartype` to prematurely halt type-checking when those
#      type-checkers exceed scheduled deadlines).
#
#      Specifically, this global is a 2-list
#      ``(process_time_start, beartype_time_total)``, where:
#
#      * ``process_time_start`` is the initial time at which the active Python
#        process was started, denominated in fractional seconds.
#      * ``beartype_time_total`` is the total time consumed by all *prior*
#        :mod:`beartype`-generated type-checks, denominated in fractional
#        seconds.
#      '''
#* Now, note the naive implementation of an O(n) type-check for a callable
#  accepting a parameter annotated by the type hint "list[str]":
#      @beartype(conf=BeartypeConf(strategy=BeartypeStrategy.On))
#      def muh_callable(muh_param: list[str]) -> None:
#          ...
#
#          if not (
#              isinstance(muh_param, list) and
#              all(isinstance(muh_item, str) for muh_item in muh_param)
#          ):
#              raise get_func_pith_violation(...)
#          ...
#
#  Trivial. Now, note the non-naive implementation of the same O(n) type-check
#  respecting the "check_time_max_multiplier" configuration setting:
#      from beartype._check._expr.exprtime import CHECK_TIMES
#      from time import monotonic
#
#      @beartype(conf=BeartypeConf(strategy=BeartypeStrategy.On))
#      def muh_callable(
#          muh_param: list[str],
#          __beartype_check_times = CHECK_TIMES,
#          __beartype_get_time_monotonic = monotonic,
#      ) -> None:
#          CHECK_TIME_START = __beartype_get_time_monotonic()
#
#          # Constant "J" in the inequality "Lt < J" governing @beartype's
#          # deadline scheduler for non-constant type-checking, denominated in
#          # fractional seconds.
#          CHECK_TIME_MAX = (
#              {check_time_max_multiplier} * (
#                  CHECK_TIME_START - __beartype_check_times[1]
#              ) - __beartype_check_times[0]
#          )
#
#          ...
#
#          if not (
#              isinstance(muh_param, list) and
#              # For each item of this iterable...
#              all(
#                  (
#                      # If @beartype has yet to exceed its scheduled deadline
#                      # for non-constant type-checks *AND*...
#                      (
#                          {check_time_max_multiplier - 1} *
#                          __beartype_get_time_monotonic() < CHECK_TIME_MAX
#                      ) and
#                      isinstance(muh_item, str)
#                  )
#                  for muh_item in muh_param
#              )
#          # *AND* @beartype has still yet to exceed its scheduled deadline for
#          # non-constant type-checks...
#          ) and (
#              {check_time_max_multiplier - 1} *
#              __beartype_get_time_monotonic() < CHECK_TIME_MAX
#          ):
#              raise get_func_pith_violation(...)
#          # Else, this pith either satisfies this hint *OR* @beartype has
#          # exceeded its scheduled deadline for non-constant type-checks.
#
#          # Update the total time consumed by @beartype type-checks.
#          __beartype_check_times[1] += (
#              __beartype_get_time_monotonic() - CHECK_TIME_START)
#          ...
#
#Seriously trivial, everybody. \o/
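
Since that chain of algebra is easy to fumble, here's a disposable sanity check (derivation variable names only; none of this is beartype code) confirming that the final predicate "Lt < J" agrees with the original continue-condition:

```python
# Numerically verify that the fully simplified per-call predicate
# "L*t < J" matches the original continue-condition
# "(B + t - s) * K < t - Z" from the derivation above.
import random

K = 1000  # check_time_max_multiplier
L = K - 1

random.seed(42)
for _ in range(10_000):
    Z = random.uniform(0.0, 10.0)      # interpreter start time
    s = Z + random.uniform(0.0, 60.0)  # start of current check round
    B = random.uniform(0.0, 0.05)      # total prior beartype check time
    t = s + random.uniform(0.0, 1.0)   # current time

    continue_original = (B + t - s) * K < t - Z
    J = K * (s - B) - Z                # per-call constant
    continue_simplified = L * t < J
    assert continue_original == continue_simplified
print("predicates agree on 10,000 random samples")
```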

That Wasn't Short at All! You Lied! You Lied Again!

...heh. I did, didn't I? Feels good. I gotta admit.

The tl;dr of the juggernaut of unreadable plaintext above is that our deadline scheduler will go a long way towards satisfying your gluttonous appetite for scalability. Sure, it's not quite what you're asking for here – but it's not too far off, either.

Fine-tuning the BeartypeConf.check_time_max_multiplier option fine-tunes the total amount of wall-clock time that @beartype devotes to O(n) type-checking: at most 1/check_time_max_multiplier of total runtime. At the upper end, setting check_time_max_multiplier to something like 20000 would effectively inhibit O(n) type-checking to just the first several items of each container. At the lower end, setting check_time_max_multiplier to something like 50 would permissively allow O(n) type-checking to consume at most 2% of wall-clock time – exactly as requested above.
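
In other words (a back-of-the-envelope sketch assuming the halt condition "b * check_time_max_multiplier >= T" quoted earlier):

```python
# The halt condition "b * K >= T" caps beartype's cumulative
# type-checking time "b" at T/K: a fraction 1/K of total runtime.
# 50 -> 2.000%, 1000 -> 0.100%, 20000 -> 0.005%
for K in (50, 1000, 20000):
    budget_pct = 100.0 / K
    print(f"check_time_max_multiplier={K}: at most {budget_pct:.3f}% of runtime")
```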

Viva la BeartypeStrategy.On! Viva la @leycec actually doing something next year! 🥳
