Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

marshal dynamodb results directly to JSON #134

Closed
jonsmirl opened this issue Feb 26, 2018 · 16 comments
Closed

marshal dynamodb results directly to JSON #134

jonsmirl opened this issue Feb 26, 2018 · 16 comments
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue xl Effort estimation: very large

Comments

@jonsmirl
Copy link

When writing lambda functions that query dynamodb I often simply send this data back to the lamdba caller as a JSON result. I would find it useful to have marshalling functions that go directly from the dynamodb result to json (in addition to the existing ones). This would optionally let me skip a lot of pointless conversions and memory allocations building the intermediate go structure which is promptly destroyed one line later in my code. Note, I'm not bringing back just a single item, usually it is an array of several thousand items.

@jasdel
Copy link
Contributor

jasdel commented Feb 26, 2018

Thanks for posting this feature request, @jonsmirl. Are you looking for the DynamoDB API calls to return a JSON string of the result directly, instead of the AttributeValues that are normally returned?Or are you looking for a utility that allows you to convert AttributeValue collections to the DynamoDB JSON representation?

@jasdel jasdel added feature-request A feature should be added or improved. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Feb 26, 2018
@jonsmirl
Copy link
Author

jonsmirl commented Feb 26, 2018

Does it work like this?

http request goes off to DynamoDB and the results come back as JSON.
Go then turns that JSON into a structure like this:
{
"ConsumedCapacity": null,
"Count": 23,
"Items": [
{
"thing": {
"B": null,
"BOOL": null,
"BS": null,
"L": null,
"M": null,
"N": null,
"NS": null,
"NULL": null,
"S": "70b2d28ecf7e01dd3a4ba88da3ddde44",
"SS": null
},
I marshal it with a structure definition and it turns in to:
error := dynamodbattribute.UnmarshalListOfMaps(result.Items, &items)
[
{
"timestamp": 1519510162585,
"type": "clip",
"thing": "70b2d28ecf7e01dd3a4ba88da3ddde44"
},
{
Which I then turn back into json with this:
res, _ := json.Marshal(items)


Two optimizations seem possible

  1. Leave the data in JSON when it comes back from dynamodb. Let me get access to it without doing any transforms to it. The idea here is to make it easy to just send the result set on out to the caller.

  2. Leave the data in JSON until dynamodbattribute.UnmarshalListOfMaps(result.json, &items) is called. ie avoid that first expanded representation and make the Unmarshal functions directly work on json.

Also, parallel optimizations can be applied to writing data to DynamoDB. My incoming data is already in a JSON string, it would be great if I could simply pass that string into DynamoDB unchanged.

My Go code is used to validate the user and to ensure they don't read/write from unauthorized places (I use CognitoIDs). The Go code doesn't touch the data in either direction.

On the plus side I am converting these functions from Javascript (where they run 500-1000ms) over to Go which so far is running in the 200-300ms range. Getting rid of pointless shuffling of the data by Go might reduce my run-time down to 100ms.

@warrenstephens
Copy link

warrenstephens commented Aug 29, 2019

A function that would convert AttributeValue collections directly to a valid JSON string would be very handy.

The JSON strings can be passed to all manner of library functions and other things -- without the need to pass thru Go specific struct definitions.

My current need is simply to pass the retrieved data via a .JSON file -- the Go code never uses the content of the object at all.

@jasdel jasdel added needs-design and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Sep 19, 2019
@jasdel
Copy link
Contributor

jasdel commented Sep 19, 2019

Thanks for the update @jonsmirl and @warrenstephens. Since the AttributeValue type is used within as a field within a JSON document, there is not a clean way to inject the AttributeValue as a JSON string/byte slice.

Since we have a good understanding of the use case we can use this issue to track design proposals for this feature. There are a few ways I think this could be implemented with varying levels of "completeness"

Ideas:

1.) Add custom member to AttributeValue

Adding a custom member, (e.g. Raw) to the AttributeValue that can contain a JSON byte slice. This new member would always be used if not-nil. Serializing the AttributeValue to the request would write the Raw member instead of serializing the AttributeValue's other members.

Reading a response from DynamoDB, the Raw member would contain the response's raw value for the AttributeValue.

This would be limited to AttributeValue type only, does not cover map[string]AttributeValue, nor []AttributeValue. Requires the user to split a JSON document's map/list into separate AttributeValues.

2.) Add Additional byte slice API input/output parameters

For every API input/output that contains an AttributeValue parameter the SDK's generator would also generate a new Raw<ParamName> parameter that contains a raw JSON byte slice for the JSON document to be sent.

The second option seems to be more of a sledge hammer to the problem, and it would be better to find a cleaner, more concise solution to this issue.

@tarndt
Copy link

tarndt commented Mar 10, 2020

Any updates on this issue? Approach #1 would greatly increase the performance of some applications I maintain.

@neoramos
Copy link

I'm also interested on the resolution of this enhancement. thank you.

@codeherk
Copy link

Any updates on this? I'm interested as well.

@marc-ostrow
Copy link

I think a better example would be that json.Unmarshal of DynamoDB JSON would result in a map[string]types.AttributeValue because then one can use attributevalue.UnmarshalMap will get you your go type which can be json.Marshaled to normal JSON.

I had to port dynamodbattribute from v1 SDK so I could get my go type from a DynamoDB Streams Event

@vudh1
Copy link
Contributor

vudh1 commented Apr 22, 2022

Hi all, I have discussed with the team, and we are planning to implement this. However, there is no timeline for this yet, I will comment if we have any update.

@ryan-ju
Copy link

ryan-ju commented Jul 26, 2022

There is already a function

func awsAwsjson10_serializeDocumentExpressionAttributeValueMap(v map[string]types.AttributeValue, value smithyjson.Value) error {
that does almost what we need.

This can be copied into a public function. This should only take a couple of days to implement.

@RanVaknin RanVaknin added p3 This is a minor priority issue m Effort estimation: medium labels Oct 31, 2022
@duaneking
Copy link

This is a critical path feature that should already be supported.

@rw-access
Copy link

rw-access commented Jan 23, 2023

I would love to see some traction on this, but want to revise the ideas proposed earlier:
#134 (comment)

My use case would prefer to not allocate everything in memory, but rather accept an io.Writer because I prefer to process everything in a stream. It looks to me like there is only one thing blocking this:

  • smithyjson.newValue could be made public can be done with NewEncoder().Value
  • awsAwsjson10_serializeDocumentAttributeUpdates could be made public

Those honestly appear to be the only blockers. I may just fork the repo and make those changes to unblock myself

@duaneking
Copy link

Respectfully, while the Writer interface is commonly used for a lot of things in go and that is kind of a golang thing, I don't believe that that alone should block this.

First version doing allocations is fine, as long as the interface allows updates and optimizations later. If the initial requirement is that it support the Write interface to cll Write(*), then provide an option for that, but don't focus too much on optimization prematurely. It's better to get the interfaces correct, first.

The initial implementation could be as simple as a strings.Builder consuming the write of the json to return the result.

Separation of concerns is also an issue, and I don't believe that overloading concerns is going to be useful here. It might be better for backwards compatibility to have additional API's.

@lucix-aws lucix-aws added p2 This is a standard priority issue xl Effort estimation: very large and removed needs-design p3 This is a minor priority issue m Effort estimation: medium labels Sep 19, 2023
@lucix-aws
Copy link
Contributor

Chiming in on this given the amount of attention/upvotes it's received in the past. As an FYI, jasdel@ and vudh1@ are no longer on the project, so the only context I have is what I can read here.

I'd like to highlight a few very salient points from the discussion so far-

  • @jonsmirl There are real use cases, for operations which return structured data in some form that is useful/valid outside of a deserialization context, that would be served by being able to skip SDK deserialization and instead access that data directly.
    • I would assert this extends beyond just dynamodb scan results, although I can't necessarily provide a concrete example at this time.
    • The way the response (or parts of it) is exposed is ultimately at the whim of the service and will vary therein. Cloudformation templates, for example, are emitted into the response type directly as a string, so the use case described here doesn't require any additional SDK support.
  • @rw-access io.Reader/Writer and friends are ubiquitous in the language and are certainly the most idiomatic when working with io/buffers/etc. We've certainly seen this come up in other areas, for example in the FR to add better support for copying to Writers w/ transfer manager (Support io.Writer in s3 transfer manager downloader #2036)

Given those points I think the way I'd like us to support this is through something entirely generic like the following:

// WithHijackToReader skips deserialization and instead passes the response
// body to the provided reader.
func WithHijackToReader(r *io.ReadCloser) func(*middleware.Stack) error

// WithHijackToWriter skips deserialization and instead directly copies
// the contents of the response bodyto the provided writer.
func WithHijackToWriter(r io.Writer) func(*middleware.Stack) error

// example
var body io.ReadCloser
svc.Scan(context.Background(), &dynamodb.ScanInput{
    // ...
}, dynamodb.WithAPIOptions(WithHijackToReader(&body)))

These would exist as escape hatches which you could apply to any SDK operation. WithHijackToWriter is mostly a convenience, the real value-add lies in the ability to break out of normal deserialization flow and work with the response body directly.

A crude version of this is actually entirely possible today. You can just blow out the Deserialize part of the operation and replace it with something that steals the response body:

func withHijackResponseBody(body *io.ReadCloser) func(*dynamodb.Options) {
    return func(o *dynamodb.Options) {
        o.APIOptions = append(o.APIOptions, func(s *middleware.Stack) error {
            s.Deserialize.Clear()
            // implementation of hijackResponseBody is omitted but it essentially just plumbs the body
            // from out.RawResponse back to the iface ptr
            return s.Deserialize.Add(&hijackResponseBody{body: body}, middleware.After)
        })
    }
}

I can't necessarily sanction this as a workaround at this time- will get into it below, but if you do this today you're liable to bypass a set of behavior that you really shouldn't.

There are definitely a few things as-written that prevent us from supporting this today:

  • Success and error deserialization right now are coupled within the same middleware. If you remove one, you remove the other. I suspect we'd want to preserve error response deserialization with these new APIs.
  • Features that need to "middle-man" the response body, such as dynamodb checksum response validation and s3 flexible checksums would likely be broken.
  • The "hijack" would be all-in: header-bound fields of the response wouldn't be deserialized anymore.
    • Also, in taking ownership of the response body you get everything it contains. e.g. for Scan, you'd have to fish the actual results out of the Items field in your code that consumes the body, and they'd be in their raw "union" form.
  • Right now the Deserialize step MUST return a pointer to the expected output type, or the type assertion at the end of the operation will panic. There's no way to generically do that for every response type in the proposed API, I expect we could change how the outer type assertion is performed to remove the need for this, but there are behavioral differences to be considered there.

Some of the points above could probably be written off as risks/tradeoffs accepted due to the (imo) advanced nature of this use case, others would have to be addressed before we implement.

@RanVaknin
Copy link
Contributor

Hi all,

We have recently discussed this as a team. At this time we don't plan on supporting this functionality as it's out of the SDK's scope.

Ran~

@RanVaknin RanVaknin closed this as not planned Won't fix, can't repro, duplicate, stale Feb 15, 2024
Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue xl Effort estimation: very large
Projects
None yet
Development

No branches or pull requests