Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Python native dataclass APIs in Event Source Data Classes utility #1177

Closed
LouisAmon opened this issue Jun 14, 2021 · 19 comments
Closed

Comments

@LouisAmon
Copy link

What were you initially searching for in the docs?

I'm using the @event_source decorator and I was looking for a way to print my event in my Cloudwatch logs.

When I simply print(event), I just get a class instance, but I want to be able to see the actual keys and values :)

Being used to working with Python dataclasses, I naturally tried asdict:

print(event.asdict())

Or even...

import dataclasses

dataclasses.asdict(event)

...and I was very surprised by the result:

asdict() should be called on dataclass instances

Is this related to an existing part of the documentation? Please share a link

https://awslabs.github.io/aws-lambda-powertools-python/latest/utilities/data_classes/

Describe how we could make it clearer

Dataclasses are meant to be a Pythonic way to define a data model.

If you define your own class (DictWrapper in your case) which inherits from nothing then all the APIs and principles behind dataclasses are lost.

Dataclasses are used in libraries such as sqlalchemy or strawberry to define a unified way to communicate with a database or build a GraphQL API... dataclasses provides unity between all these, so why not AWS Lambda ? :)

If you have a proposed update, please share it here

I think we should either revamp the code to actually define proper data classes, or stop calling it as such : it's just confusing.

What do you think ?

@boring-cyborg
Copy link

boring-cyborg bot commented Jun 14, 2021

Thanks for opening your first issue here! We'll come back to you as soon as we can.

@heitorlessa
Copy link
Contributor

Hey @LouisAmon thanks for raising this. We have created awslabs/aws-lambda-powertools-python#346 to rewrite the entire documentation on dataclass utility - naming is hard. This next release is 80% dedicated to that doc.

This Friday, I'll start revamping that doc and lay down ambitions for v2 - that includes changing dataclasses to more prescriptive so we don't break anyone in v1.

If you have ideas for naming I'd love to hear them so we can improve everyone's experience. If we could have a better name by the time the new docs are completely rewritten, then I could easily create code aliases and redirect docs altogether

Thank you!

@LouisAmon
Copy link
Author

Well to be honest I think dataclasses are indeed a best practice in Python, especially when it comes to data modeling and typing

May I ask why the project didn't go into actually using the @dataclass decorator ?

I guess now that a huge amount of code is in place, you can't just add @dataclass on top of DictWrapper... but still, have you tried ?

I know some projects have encountered limitations with the way Python's standard dataclasses are meant to work. For example Pydantic ended up defining their BaseModel and their own @dataclass decorator, but overall they tried to stay consistent with the Python API

Maybe the right way for AWS Lambda Powertools would be to piggyback on Pydantic when it comes to validation (using their @dataclass), or using the regular Python decorator for "normal" data modelling

Another solution would be to define your own @dataclass, but then also make sure that the regular APIs (such as asdict()) are available

TBH I think Pydantic adds a great deal of value to Lambda payload processing, although it is a bit heavy as a dependency :)

@michaelbrewer
Copy link
Contributor

@LouisAmon i added this original as a helper for the event classes, and over time it started to include more features like handling decoding and then decorators for specific use cases.

The objective was to work with Python 3.6 and not have any additional dependencies. So @dataclasses is not available in Python 3.6 without a backport. I am sure once we deprecate Python 3.6 support, we can strip down the code alot.

So we could create a lightweight version of @dataclass with asdict(), for now there is raw_event that you can use if you want the dict.

@michaelbrewer
Copy link
Contributor

We (@heitorlessa and @cakepietoast) did discuss the naming of this in the original PR (#159 (comment)). I am not sure any name would have worked :). But we tried our best.

Maybe a warning note or something docs could help reduce any potential confusion, and then adding asdict() method, could be another alias of raw_event property.

Otherwise i am pretty proud of how many people this has helped. The next features that are related to this is some common Exception types needed for API Gateway Proxy events, and then better integration with the Idempotent decorator.

@LouisAmon
Copy link
Author

We (@heitorlessa and @cakepietoast) did discuss the naming of this in the original PR (#159 (comment)). I am not sure any name would have worked :). But we tried our best.

Don't get me wrong: you guys did an outstanding job overall, thanks for that !

Maybe a warning note or something docs could help reduce any potential confusion, and then adding asdict() method, could be another alias of raw_event property.

Maybe, but it feels to me like a half-measure : either you're working with a dataclass with the full API or you're simply working on a data model (which is very fine, many SQLA / Django users are familiar with that term and what it means)

For example instead of:

from aws_lambda_powertools.utilities.data_classes import event_source, APIGatewayProxyEventV2

@event_source(data_class=APIGatewayProxyEventV2)
def lambda_handler(event: APIGatewayProxyEventV2, context):
    if "helloworld" in event.path and event.http_method == "POST":
        do_something_with(event.json_body, event.query_string_parameters)

I'd rather see:

from aws_lambda_powertools.utilities.models import event_source, APIGatewayProxyEventV2

@event_source(model=APIGatewayProxyEventV2)
def lambda_handler(event: APIGatewayProxyEventV2, context):
    if "helloworld" in event.path and event.http_method == "POST":
        do_something_with(event.json_body, event.query_string_parameters)

That's my opinion anyway :)

@LouisAmon
Copy link
Author

Overall, if your concern is to stick with concepts that exist in Python 3.6 then maybe it's best to avoid the dataclasses terminology altogether no ?

My personal suggestion would be to be technically compatible with Python 3.6, yet bring them one step into the future (so to speak) by using a dataclasses backport

With the backport in place, you can also use the dataclasses in Pydantic, and stick to an all-dataclass terminology and API throughout the project

It would make even more sense to make that change regarding Envelopes and the parser feature : the base model could be defined as a dataclass and the envelope would be a dataclass as well !

Having a base model defined as a data class allows you to do cool stuff like generating fake data on the fly or defining a default_factory, in a way that's consistent with other open source libraries

My two cent on the matter 😄

@michaelbrewer
Copy link
Contributor

We (@heitorlessa and @cakepietoast) did discuss the naming of this in the original PR (#159 (comment)). I am not sure any name would have worked :). But we tried our best.

Don't get me wrong: you guys did an outstanding job overall, thanks for that !

Maybe a warning note or something docs could help reduce any potential confusion, and then adding asdict() method, could be another alias of raw_event property.

Maybe, but it feels to me like a half-measure : either you're working with a dataclass with the full API or you're simply working on a data model (which is very fine, many SQLA / Django users are familiar with that term and what it means)

For example instead of:

from aws_lambda_powertools.utilities.data_classes import event_source, APIGatewayProxyEventV2

@event_source(data_class=APIGatewayProxyEventV2)
def lambda_handler(event: APIGatewayProxyEventV2, context):
    if "helloworld" in event.path and event.http_method == "POST":
        do_something_with(event.json_body, event.query_string_parameters)

I'd rather see:

from aws_lambda_powertools.utilities.models import event_source, APIGatewayProxyEventV2

@event_source(model=APIGatewayProxyEventV2)
def lambda_handler(event: APIGatewayProxyEventV2, context):
    if "helloworld" in event.path and event.http_method == "POST":
        do_something_with(event.json_body, event.query_string_parameters)

That's my opinion anyway :)

Which part is different? If we make changes now it would have breaking changes for existing people.

@michaelbrewer
Copy link
Contributor

dataclasses

Yep, it was originally event sources, and it could be models, but it is data_classes so not exactly dataclasses ;-)

It is quick too refactor, but it would involve breaking changes for all the library users.

So maybe for V2 when Python 3.6 is deprecated, and whatever this is call moves up a level.

So note that this was my first python contribution coming from a Kotlin/Java background, so forgive the lack of pythonic terms used.

note, as a contributor and user of the library, I do like that this feature does not introduce any additional dependencies or slows down the lambda cold starts.

@michaelbrewer
Copy link
Contributor

@heitorlessa - I would propose we switch back to what I had originally suggested as event_sources and frame it as classes, utilities and helpers for the AWS standard events. And we can target this for V2 (and when Python 3.6 lambdas are deprecated), and then we can potentially minimize the amount of code.

@LouisAmon
Copy link
Author

LouisAmon commented Jun 18, 2021

I agree with you both that making a breaking change just for syntactic changes is not worth it.

I would also propose to make the big change in v2, but at that point : why not consider actually using Python dataclasses ? I mean, since 3.6 is gonna be dropped and all...

By the way, you can even keep the 3.6 support with the backport... and keep the old syntax with a DeprecationWarning for a few versions... :)

Shall we close the ticket ? I realise I should've opened a discussion thread instead 🙏

@michaelbrewer
Copy link
Contributor

@LouisAmon this library is supposed to run on Python 3.6 and higher, but you are welcome to make a PR with your proposed changes

@heitorlessa
Copy link
Contributor

@LouisAmon - Thank you for your strong opinions and suggestions, we appreciate your enthusiasm to help us make everyone's experience better.

I'd like to clarify a few things as I turn this issue into a feature request and move forward with other higher priorities tasks.

Dataclasses not being added in day one. Back then, VSCode and PyCharm didn't have intellisense support for them resulting in either false positives or no autocompletion at all. This led us to consider two routes: a) attrs as that's been the de-facto for class boilerplate and handy data methods since 2015 including support for 3.6+ with all dataclass features with a more rapid release cycle, or b) dataclasses backport for 3.6 only, though we had tiny concerns like it was always installing for 3.6+ and not 3.6 only, and whether additional features and types would've been backported to 3.6 on a release cycle.

attrs seemed the perfect candidate considering release cycle but we hit the same problem with autocompletion, and VSCode Pylance was not GA at the time either. We decided not to risk it, keep the original logic @michaelbrewer did using @property as that worked for all IDEs on autocompletion with a helpful text to fields, and deal with the naming per se later as we'd figure a better name given the handy methods it has.

Using pydantic instead. As you mentioned, Pydantic adds at minimum ~35M using the latest CFlag optimizations with pip. When using SAM CLI however, it PythonPipBuilder doesn't support these new optimizations leading to an additional 91M vs 35M, as it pulls wheels and binaries for Linux and doesn't support the env vars in pip discussed in the thread. For this reason, we decided not to use pydantic however easier it'd have been to maintained it'd directly incur additional charges for customers and not aligning with our tenets.

This in itself brings us to the major decision factor in v2 on whether to use implicit namespace packages (PEP 420) so you could install each utility separately along with their own dependencies, or make all dependencies optional since this whole package is ~1.2M. The latter is another discussion I need to write into a new issue to hear from the community before we cut v2, since Boto alone went from 35M to 64M in an year (~10M increase in the past 2 months alone).

Moving forward from here

I'm turning this issue into a feature request and adjusting the title so the language is more accommodating to what this community represents. These are the changes we'd be comfortable to make or welcome PRs at this stage to not having to wait until v2 as there are bigger priorities like the boto package size above:

  • Add asdict() method in DictWrapper to make it more intuitive
  • Add a note in the documentation to note this does not follow the dataclasses API but asdict() for convenience
  • As part of the documentation revamp prioritised for this release, rename data classes to Event Sources
  • Create an alias package event_sources and reflect our documentation to point to that instead of data classes as they've outgrown their initial purpose. This will keep both running to not break existing customers but only one emphasized in the documentation w/ a note that data classes will be deprecated in v2 along with a place for them to track.

I hope this helps and I'll concentrate on recreating the documentation for this utility as initially planned to start today.

Thank you all

@heitorlessa heitorlessa changed the title Dataclasses are not... actual dataclasses (?!) Support Python native dataclass APIs in Event Source Data Classes utility Jun 18, 2021
@heitorlessa heitorlessa self-assigned this Jun 18, 2021
@michaelbrewer
Copy link
Contributor

Good suggestions @heitorlessa.

Just as a note, raw_event is just returning the wrapped dictionary object (self._data), so when changing this to asdict() method people may think we are generating a new dictionary object? Also having both could be confusing too.

Another note about the “event_sources” implementation, if it was implemented as just a wrapper to the original event dict, making it as light weight as possible. So a very large dict that only needs to have a single key look up, does not need to perform a full parse, i would like to keep this behavior.

mergify bot referenced this issue Oct 5, 2021
Bumps [pytest-cov](https://github.com/pytest-dev/pytest-cov) from 2.12.1 to 3.0.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst">pytest-cov's changelog</a>.</em></p>
<blockquote>
<h2>3.0.0 (2021-10-04)</h2>
<p><strong>Note that this release drops support for Python 2.7 and Python 3.5.</strong></p>
<ul>
<li>Added support for Python 3.10 and updated various test dependencies.
Contributed by Hugo van Kemenade in
<code>[#500](pytest-dev/pytest-cov#500) &lt;https://github.com/pytest-dev/pytest-cov/pull/500&gt;</code>_.</li>
<li>Switched from Travis CI to GitHub Actions. Contributed by Hugo van Kemenade in
<code>[#494](pytest-dev/pytest-cov#494) &lt;https://github.com/pytest-dev/pytest-cov/pull/494&gt;</code>_ and
<code>[#495](pytest-dev/pytest-cov#495) &lt;https://github.com/pytest-dev/pytest-cov/pull/495&gt;</code>_.</li>
<li>Add a <code>--cov-reset</code> CLI option.
Contributed by Danilo Šegan in
<code>[#459](pytest-dev/pytest-cov#459) &lt;https://github.com/pytest-dev/pytest-cov/pull/459&gt;</code>_.</li>
<li>Improved validation of <code>--cov-fail-under</code> CLI option.
Contributed by ... Ronny Pfannschmidt's desire for skark in
<code>[#480](pytest-dev/pytest-cov#480) &lt;https://github.com/pytest-dev/pytest-cov/pull/480&gt;</code>_.</li>
<li>Dropped Python 2.7 support.
Contributed by Thomas Grainger in
<code>[#488](pytest-dev/pytest-cov#488) &lt;https://github.com/pytest-dev/pytest-cov/pull/488&gt;</code>_.</li>
<li>Updated trove classifiers. Contributed by Michał Bielawski in
<code>[#481](pytest-dev/pytest-cov#481) &lt;https://github.com/pytest-dev/pytest-cov/pull/481&gt;</code>_.</li>
</ul>
<h2>2.13.0 (2021-06-01)</h2>
<ul>
<li>Changed the <code>toml</code> requirement to be always be directly required (instead of being required through a coverage extra).
This fixes issues with pip-compile (<code>pip-tools#1300 &lt;https://github.com/jazzband/pip-tools/issues/1300&gt;</code><em>).
Contributed by Sorin Sbarnea in <code>[#472](pytest-dev/pytest-cov#472) &lt;https://github.com/pytest-dev/pytest-cov/pull/472&gt;</code></em>.</li>
<li>Documented <code>show_contexts</code>.
Contributed by Brian Rutledge in <code>[#473](pytest-dev/pytest-cov#473) &lt;https://github.com/pytest-dev/pytest-cov/pull/473&gt;</code>_.</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/560b95575efe9e55874bc8bbc99de1dd2db80ba9"><code>560b955</code></a> Bump version: 2.12.1 → 3.0.0</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/e988a6c48b45924433c9b38886f759f4f3be8a94"><code>e988a6c</code></a> Update changelog.</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/f01593218d2cc99defa202a8e7ff83e3a08a7a73"><code>f015932</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pytest-dev/pytest-cov/issues/500">#500</a> from hugovk/add-3.10</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/60a3cc1e796428f2373fb2024122f8dbc7f1c56b"><code>60a3cc1</code></a> No need to build universal wheels for Python 3-only</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/0bc997adfea8b6ab63029163044a2fc42fd7ecf1"><code>0bc997a</code></a> Add support for Python 3.10</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/679935b82e8177799c34981e8f384a5466840301"><code>679935b</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/pytest-dev/pytest-cov/issues/494">#494</a> from hugovk/test-on-github-actions</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/96f9aad28a35d9d0824add0ef2309e600052c531"><code>96f9aad</code></a> Add 'all good' job to be added as a required build</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/6395ecec756433194c05d34af490315b35dddafa"><code>6395ece</code></a> Test conditional collection on PyPy and CPython</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/f4a88d6187630295ce960010456def6b19ef01b2"><code>f4a88d6</code></a> Test both PyPy3.6 and PyPy3.7</li>
<li><a href="https://github.com/pytest-dev/pytest-cov/commit/a948e899fa002fbc7f52c6c0f7e7dffdf7987ec8"><code>a948e89</code></a> Test both PyPy3.6 and PyPy3.7</li>
<li>Additional commits viewable in <a href="https://github.com/pytest-dev/pytest-cov/compare/v2.12.1...v3.0.0">compare view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pytest-cov&package-manager=pip&previous-version=2.12.1&new-version=3.0.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)


</details>
@michaelbrewer
Copy link
Contributor

@heitorlessa

No much will happen on this issue until Python 3.6 and whether dataclasses actually delivers the savings we expect.

I would just propose a rename

@heitorlessa
Copy link
Contributor

heitorlessa commented Nov 12, 2021 via email

@heitorlessa heitorlessa transferred this issue from aws-powertools/powertools-lambda-python Nov 13, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Apr 28, 2022

Thanks for opening your first issue here! We'll come back to you as soon as we can.

@heitorlessa heitorlessa transferred this issue from aws-powertools/powertools-lambda Apr 28, 2022
@heitorlessa heitorlessa removed their assignment Jul 25, 2022
@heitorlessa
Copy link
Contributor

Closing as we will only be able to rename it completely in v3. Data classes confirmed to be too slow. We're following Pydantic v2 closely for a faster and closer API model. As there isn't a date, raw_event property does the job.

@heitorlessa heitorlessa closed this as not planned Won't fix, can't repro, duplicate, stale Aug 22, 2022
@github-actions
Copy link
Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants