Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: deserialize form data objects #437

Closed
wants to merge 9 commits into from
Closed

feat: deserialize form data objects #437

wants to merge 9 commits into from

Conversation

alcidesqueiroz
Copy link

@alcidesqueiroz alcidesqueiroz commented May 7, 2023

closes #429

Checklist

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@climba03003 climba03003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

  1. Can you have any reference on the implementation syntax for array and json key structure?
    For example, I never see foo{} key as an indicator for JSON value.

  2. How would you distinguish if foo[0] means to be array but not object property?
    For example, qs did a great job by providing option for different behavior. So, the user is opt-in to which fit them most.

  3. The current implementation is exposed to prototype poisoning attack.

  4. You would need to write the document for new feature before it get merged.

@alcidesqueiroz
Copy link
Author

Hi, @climba03003.

I've just updated the PR with the requested changes. In order to keep backward compatibility, I disabled key special notation parsing by default. To enable it, I introduced an option called enableSpecialNotationKeys that might be set either to true or to objectsOnly (similarly to the existing attachFieldsToBody option, that can be either true or keyValues).

Answering your previous questions:

  1. I borrowed this particular notation from Axios serializer
  2. My first approach was to parse numeric keys as array indexes. However, after seeing how qs deals with this dilemma, I've adopted a similar strategy, allowing the developer to disable this behavior through an option. For this, the enableSpecialNotationKeys option can be set to objectsOnly.
  3. Good point. I changed my implementation to prevent it.
  4. I've just included complete documentation for this feature.

Copy link
Contributor

@simoneb simoneb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, I must admit I can't follow entirely what's going on here but apart from a couple of inline comments I can't spot anything else that would need changing.

index.js Outdated Show resolved Hide resolved
@@ -83,6 +93,106 @@ function attachToBody (options, req, reply, next) {
})
}

function validateSerializedKey (tokens, value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard to follow what's going on here but I guess this is something that we have to do. I have a feeling that this code may be coming from somewhere else, possibly slightly modified though. if that is the case, don't forget to cite the source please

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though I originally wanted to save time by starting from some pre-existing battle-tested code, I couldn't find anything close to what I needed, so I ended up writing the entire feature from scratch. Zero third-party code. 😄

README.md Show resolved Hide resolved
Copy link
Member

@climba03003 climba03003 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Uzlopak Do you have any advice on the implementation (performance wise)?

index.js Outdated
if (token === ']' && prevToken === '[') {
parentRef = lastSegment.parentRef[lastSegment.key] ?? (parseArrays ? [] : {})
key = parseArrays ? parentRef.length : Object.keys(parentRef).length
} else if (parseArrays && token === ']' && /^\d+$/.test(prevToken)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prevToken is a single character.
It is necessary to use regexp for this check?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, prevToken not necessarily will have a single character. This regex would also match a multi-digit number. But thanks to your comment, I remembered that I was not checking for leading zeroes, so I extracted this checking to a function and explicitly checked if the number has no leading zero.

I've also benchmarked this regex-based checking against two other alternatives, one iteration-based and other that relies on parseFloat. The regex version won by a slim margin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I read the code wrongly const prevToken = tokens[i - 1].
prevToken is never reassigned and is always a single character.
That means matching multi-digit number is useless for prevToken.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tokens is not a string, it's an array of strings. So, prevToken might be either [, ], {}, a multi-digit numeric index or an alphanumeric object key.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it doesn't make sense why doesn't use a single for-loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic I've proposed relies on more than a loop in order to separate concerns. Mixing all steps in a single loop would make it way harder to read, debug and maintain.

Copy link
Contributor

@Uzlopak Uzlopak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some thouhts on the code.

index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
Copy link
Contributor

@Uzlopak Uzlopak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some thouhts on the code.

index.js Outdated Show resolved Hide resolved
@@ -73,7 +73,23 @@ function attachToBody (options, req, reply, next) {
mp.destroy(new Error(`${key} is not allowed as field name`))
return
}
if (body[key] === undefined) {

const tokens = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attachToBody is called only in the preValidation hook in line 277. We could actually add above the addHook a line to normalize the value. Then we dont need to check here for multiple values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't get it. Could you explain it in more details?

index.js Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Outdated Show resolved Hide resolved
index.js Show resolved Hide resolved
// when the {} special ending is supplied, the value must be valid JSON
if (isValid && tokens[tokens.length - 1] === '{}') {
try {
secureJSON.parse(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your examples are form.append('foo{}', '{"bar":[1,2,3]}') and more complex keys. But what if the value is a primitive, like 'a' or worse 'null'?

secureParse would not fail in case of null.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included an explicit check for null, but secureJSON.parse would, in fact, allow primitives to be passed. Is there a good reason for explicitly preventing this?


function getSerializedKeySegments (options, body, tokens, value) {
const segments = [{ parentRef: body, key: tokens[0] }]
const parseArrays = options.enableSpecialNotationKeys !== 'objectsOnly'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if enableSpecialNotationKeys is true? Maybe we should normalize like described in a different comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, enableSpecialNotationKeys can only be true or "objectsOnly", as we're explicitly checking it on line 77. So, if it isn't "objectsOnly", it necessarily must be true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

options.enableSpecialNotationKeys !== 'objectOnly' would be true if enableSpecialNotationKeys is true. So parseArrays would be true. So you would generate Arrays and not Objects with numeric keys?!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. When enableSpecialNotationKeys is true, numeric keys generate arrays. When enableSpecialNotationKeys is objectsOnly, everything is parsed as an object, so no arrays at all (unless you use the {} special ending and declare an array using JSON).

index.js Outdated Show resolved Hide resolved
index.js Show resolved Hide resolved
@alcidesqueiroz
Copy link
Author

@Uzlopak @climba03003 hey there! 👋🏼 I wanted to check in on the status of this PR. Could you please let me know if there's anything else I need to address or if there's any further feedback? Thanks.

@alcidesqueiroz alcidesqueiroz closed this by deleting the head repository Aug 28, 2023
@simoneb
Copy link
Contributor

simoneb commented Aug 28, 2023

@alcidesqueiroz thanks for your contribution here, and apologies for the lack of feedback. I think this was a valuable contribution and it's a pity that we couldn't integrate it in the codebase.

@alcidesqueiroz
Copy link
Author

@simoneb no worries, mate! 🙂 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

multipart data objects in array don't converting, still string
5 participants