marshal dynamodb results directly to JSON #134

jonsmirl · 2018-02-26T15:32:58Z

When writing lambda functions that query dynamodb I often simply send this data back to the lamdba caller as a JSON result. I would find it useful to have marshalling functions that go directly from the dynamodb result to json (in addition to the existing ones). This would optionally let me skip a lot of pointless conversions and memory allocations building the intermediate go structure which is promptly destroyed one line later in my code. Note, I'm not bringing back just a single item, usually it is an array of several thousand items.

jasdel · 2018-02-26T18:12:24Z

Thanks for posting this feature request, @jonsmirl. Are you looking for the DynamoDB API calls to return a JSON string of the result directly, instead of the AttributeValues that are normally returned?Or are you looking for a utility that allows you to convert AttributeValue collections to the DynamoDB JSON representation?

jonsmirl · 2018-02-26T18:35:23Z

Does it work like this?

http request goes off to DynamoDB and the results come back as JSON.
Go then turns that JSON into a structure like this:
{
"ConsumedCapacity": null,
"Count": 23,
"Items": [
{
"thing": {
"B": null,
"BOOL": null,
"BS": null,
"L": null,
"M": null,
"N": null,
"NS": null,
"NULL": null,
"S": "70b2d28ecf7e01dd3a4ba88da3ddde44",
"SS": null
},
I marshal it with a structure definition and it turns in to:
error := dynamodbattribute.UnmarshalListOfMaps(result.Items, &items)
[
{
"timestamp": 1519510162585,
"type": "clip",
"thing": "70b2d28ecf7e01dd3a4ba88da3ddde44"
},
{
Which I then turn back into json with this:
res, _ := json.Marshal(items)

Two optimizations seem possible

Leave the data in JSON when it comes back from dynamodb. Let me get access to it without doing any transforms to it. The idea here is to make it easy to just send the result set on out to the caller.
Leave the data in JSON until dynamodbattribute.UnmarshalListOfMaps(result.json, &items) is called. ie avoid that first expanded representation and make the Unmarshal functions directly work on json.

Also, parallel optimizations can be applied to writing data to DynamoDB. My incoming data is already in a JSON string, it would be great if I could simply pass that string into DynamoDB unchanged.

My Go code is used to validate the user and to ensure they don't read/write from unauthorized places (I use CognitoIDs). The Go code doesn't touch the data in either direction.

On the plus side I am converting these functions from Javascript (where they run 500-1000ms) over to Go which so far is running in the 200-300ms range. Getting rid of pointless shuffling of the data by Go might reduce my run-time down to 100ms.

warrenstephens · 2019-08-29T16:17:14Z

A function that would convert AttributeValue collections directly to a valid JSON string would be very handy.

The JSON strings can be passed to all manner of library functions and other things -- without the need to pass thru Go specific struct definitions.

My current need is simply to pass the retrieved data via a .JSON file -- the Go code never uses the content of the object at all.

jasdel · 2019-09-19T23:58:23Z

Thanks for the update @jonsmirl and @warrenstephens. Since the AttributeValue type is used within as a field within a JSON document, there is not a clean way to inject the AttributeValue as a JSON string/byte slice.

Since we have a good understanding of the use case we can use this issue to track design proposals for this feature. There are a few ways I think this could be implemented with varying levels of "completeness"

Ideas:

1.) Add custom member to AttributeValue

Adding a custom member, (e.g. Raw) to the AttributeValue that can contain a JSON byte slice. This new member would always be used if not-nil. Serializing the AttributeValue to the request would write the Raw member instead of serializing the AttributeValue's other members.

Reading a response from DynamoDB, the Raw member would contain the response's raw value for the AttributeValue.

This would be limited to AttributeValue type only, does not cover map[string]AttributeValue, nor []AttributeValue. Requires the user to split a JSON document's map/list into separate AttributeValues.

2.) Add Additional byte slice API input/output parameters

For every API input/output that contains an AttributeValue parameter the SDK's generator would also generate a new Raw<ParamName> parameter that contains a raw JSON byte slice for the JSON document to be sent.

The second option seems to be more of a sledge hammer to the problem, and it would be better to find a cleaner, more concise solution to this issue.

tarndt · 2020-03-10T22:22:14Z

Any updates on this issue? Approach #1 would greatly increase the performance of some applications I maintain.

neoramos · 2020-03-31T12:28:41Z

I'm also interested on the resolution of this enhancement. thank you.

codeherk · 2021-02-13T04:39:28Z

Any updates on this? I'm interested as well.

marc-ostrow · 2021-02-13T20:09:10Z

I think a better example would be that json.Unmarshal of DynamoDB JSON would result in a map[string]types.AttributeValue because then one can use attributevalue.UnmarshalMap will get you your go type which can be json.Marshaled to normal JSON.

I had to port dynamodbattribute from v1 SDK so I could get my go type from a DynamoDB Streams Event

vudh1 · 2022-04-22T17:20:42Z

Hi all, I have discussed with the team, and we are planning to implement this. However, there is no timeline for this yet, I will comment if we have any update.

ryan-ju · 2022-07-26T11:39:08Z

There is already a function

aws-sdk-go-v2/service/dynamodb/serializers.go

Line 3374 in 98519e6

    
           func awsAwsjson10_serializeDocumentExpressionAttributeValueMap(v map[string]types.AttributeValue, value smithyjson.Value) error {

that does almost what we need.

This can be copied into a public function. This should only take a couple of days to implement.

duaneking · 2022-11-25T20:54:19Z

This is a critical path feature that should already be supported.

rw-access · 2023-01-23T23:10:31Z

I would love to see some traction on this, but want to revise the ideas proposed earlier:
#134 (comment)

My use case would prefer to not allocate everything in memory, but rather accept an io.Writer because I prefer to process everything in a stream. It looks to me like there is only one thing blocking this:

~~smithyjson.newValue could be made public~~ can be done with NewEncoder().Value
awsAwsjson10_serializeDocumentAttributeUpdates could be made public

Those honestly appear to be the only blockers. I may just fork the repo and make those changes to unblock myself

duaneking · 2023-01-26T17:32:39Z

Respectfully, while the Writer interface is commonly used for a lot of things in go and that is kind of a golang thing, I don't believe that that alone should block this.

First version doing allocations is fine, as long as the interface allows updates and optimizations later. If the initial requirement is that it support the Write interface to cll Write(*), then provide an option for that, but don't focus too much on optimization prematurely. It's better to get the interfaces correct, first.

The initial implementation could be as simple as a strings.Builder consuming the write of the json to return the result.

Separation of concerns is also an issue, and I don't believe that overloading concerns is going to be useful here. It might be better for backwards compatibility to have additional API's.

lucix-aws · 2023-09-19T16:03:16Z

Chiming in on this given the amount of attention/upvotes it's received in the past. As an FYI, jasdel@ and vudh1@ are no longer on the project, so the only context I have is what I can read here.

I'd like to highlight a few very salient points from the discussion so far-

@jonsmirl There are real use cases, for operations which return structured data in some form that is useful/valid outside of a deserialization context, that would be served by being able to skip SDK deserialization and instead access that data directly.
- I would assert this extends beyond just dynamodb scan results, although I can't necessarily provide a concrete example at this time.
- The way the response (or parts of it) is exposed is ultimately at the whim of the service and will vary therein. Cloudformation templates, for example, are emitted into the response type directly as a string, so the use case described here doesn't require any additional SDK support.
@rw-access io.Reader/Writer and friends are ubiquitous in the language and are certainly the most idiomatic when working with io/buffers/etc. We've certainly seen this come up in other areas, for example in the FR to add better support for copying to Writers w/ transfer manager (Support io.Writer in s3 transfer manager downloader #2036)

Given those points I think the way I'd like us to support this is through something entirely generic like the following:

// WithHijackToReader skips deserialization and instead passes the response
// body to the provided reader.
func WithHijackToReader(r *io.ReadCloser) func(*middleware.Stack) error

// WithHijackToWriter skips deserialization and instead directly copies
// the contents of the response bodyto the provided writer.
func WithHijackToWriter(r io.Writer) func(*middleware.Stack) error

// example
var body io.ReadCloser
svc.Scan(context.Background(), &dynamodb.ScanInput{
    // ...
}, dynamodb.WithAPIOptions(WithHijackToReader(&body)))

These would exist as escape hatches which you could apply to any SDK operation. WithHijackToWriter is mostly a convenience, the real value-add lies in the ability to break out of normal deserialization flow and work with the response body directly.

A crude version of this is actually entirely possible today. You can just blow out the Deserialize part of the operation and replace it with something that steals the response body:

func withHijackResponseBody(body *io.ReadCloser) func(*dynamodb.Options) {
    return func(o *dynamodb.Options) {
        o.APIOptions = append(o.APIOptions, func(s *middleware.Stack) error {
            s.Deserialize.Clear()
            // implementation of hijackResponseBody is omitted but it essentially just plumbs the body
            // from out.RawResponse back to the iface ptr
            return s.Deserialize.Add(&hijackResponseBody{body: body}, middleware.After)
        })
    }
}

I can't necessarily sanction this as a workaround at this time- will get into it below, but if you do this today you're liable to bypass a set of behavior that you really shouldn't.

There are definitely a few things as-written that prevent us from supporting this today:

Success and error deserialization right now are coupled within the same middleware. If you remove one, you remove the other. I suspect we'd want to preserve error response deserialization with these new APIs.
Features that need to "middle-man" the response body, such as dynamodb checksum response validation and s3 flexible checksums would likely be broken.
The "hijack" would be all-in: header-bound fields of the response wouldn't be deserialized anymore.
- Also, in taking ownership of the response body you get everything it contains. e.g. for Scan, you'd have to fish the actual results out of the Items field in your code that consumes the body, and they'd be in their raw "union" form.
Right now the Deserialize step MUST return a pointer to the expected output type, or the type assertion at the end of the operation will panic. There's no way to generically do that for every response type in the proposed API, I expect we could change how the outer type assertion is performed to remove the need for this, but there are behavioral differences to be considered there.

Some of the points above could probably be written off as risks/tradeoffs accepted due to the (imo) advanced nature of this use case, others would have to be addressed before we implement.

RanVaknin · 2024-02-15T21:26:46Z

Hi all,

We have recently discussed this as a team. At this time we don't plan on supporting this functionality as it's out of the SDK's scope.

Ran~

github-actions · 2024-02-15T21:27:02Z

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

jasdel added feature-request A feature should be added or improved. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Feb 26, 2018

jasdel added needs-design and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Sep 19, 2019

vudh1 mentioned this issue Apr 22, 2022

Add json omitempty tags to dynamodb.AttributeValue fields aws/aws-sdk-go#3943

Closed

RanVaknin added p3 This is a minor priority issue m Effort estimation: medium labels Oct 31, 2022

lucix-aws added p2 This is a standard priority issue xl Effort estimation: very large and removed needs-design p3 This is a minor priority issue m Effort estimation: medium labels Sep 19, 2023

RanVaknin closed this as not planned Won't fix, can't repro, duplicate, stale Feb 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

marshal dynamodb results directly to JSON #134

marshal dynamodb results directly to JSON #134

jonsmirl commented Feb 26, 2018

jasdel commented Feb 26, 2018

jonsmirl commented Feb 26, 2018 •

edited

warrenstephens commented Aug 29, 2019 •

edited

jasdel commented Sep 19, 2019

tarndt commented Mar 10, 2020

neoramos commented Mar 31, 2020

codeherk commented Feb 13, 2021

marc-ostrow commented Feb 13, 2021

vudh1 commented Apr 22, 2022

ryan-ju commented Jul 26, 2022 •

edited

duaneking commented Nov 25, 2022

rw-access commented Jan 23, 2023 •

edited

duaneking commented Jan 26, 2023

lucix-aws commented Sep 19, 2023

RanVaknin commented Feb 15, 2024

github-actions bot commented Feb 15, 2024

marshal dynamodb results directly to JSON #134

marshal dynamodb results directly to JSON #134

Comments

jonsmirl commented Feb 26, 2018

jasdel commented Feb 26, 2018

jonsmirl commented Feb 26, 2018 • edited

warrenstephens commented Aug 29, 2019 • edited

jasdel commented Sep 19, 2019

Ideas:

1.) Add custom member to AttributeValue

2.) Add Additional byte slice API input/output parameters

tarndt commented Mar 10, 2020

neoramos commented Mar 31, 2020

codeherk commented Feb 13, 2021

marc-ostrow commented Feb 13, 2021

vudh1 commented Apr 22, 2022

ryan-ju commented Jul 26, 2022 • edited

duaneking commented Nov 25, 2022

rw-access commented Jan 23, 2023 • edited

duaneking commented Jan 26, 2023

lucix-aws commented Sep 19, 2023

RanVaknin commented Feb 15, 2024

github-actions bot commented Feb 15, 2024

jonsmirl commented Feb 26, 2018 •

edited

warrenstephens commented Aug 29, 2019 •

edited

ryan-ju commented Jul 26, 2022 •

edited

rw-access commented Jan 23, 2023 •

edited