5.3.1 has too much going on and should be shortened #1589

Open
tghosth opened this issue Mar 22, 2023 · 77 comments
Assignees
Labels
4a) Waiting for another This issue is waiting for another issue to be resolved next meeting Filter for leaders V5 Temporary label for grouping input validation, sanitization, encoding, escaping related requirements _5.0 - prep This needs to be addressed to prepare 5.0

Comments

@tghosth
Collaborator

tghosth commented Mar 22, 2023

5.3.1 Verify that output encoding is relevant for the interpreter and context required. For example, use encoders specifically for HTML values, HTML attributes, JavaScript, URL parameters, HTTP headers, SMTP, and others as the context requires, especially from untrusted inputs (e.g. names with Unicode or apostrophes, such as ねこ or O'Hara). (C4)
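To illustrate what "relevant for the interpreter and context" can mean in practice, here is a minimal Python sketch using only the standard library. The function name `encode_for` and the context labels are hypothetical and not part of the ASVS; the point is only that the same untrusted name needs a different encoder in each context.

```python
import html
import json
from urllib.parse import quote


def encode_for(context: str, value: str) -> str:
    """Encode an untrusted value for one specific output context (illustrative)."""
    if context == "html":
        # HTML text or a quoted attribute value: escapes &, <, >, " and '
        return html.escape(value, quote=True)
    if context == "url":
        # A single URL component: percent-encode everything outside the unreserved set
        return quote(value, safe="")
    if context == "js":
        # A JavaScript/JSON string literal, including the surrounding quotes
        return json.dumps(value)
    raise ValueError(f"no encoder registered for context {context!r}")


name = "O'Hara"
html_value = encode_for("html", name)   # O&#x27;Hara
url_value = encode_for("url", name)     # O%27Hara
js_value = encode_for("js", "ねこ")      # "\u306d\u3053"
```

Note that `json.dumps` alone is not sufficient when the literal is embedded inside an HTML `<script>` block, because it does not escape `<`; that is a separate context with its own rules.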
@tghosth tghosth added 1) Discussion ongoing Issue is opened and assigned but no clear proposal yet _5.0 - prep This needs to be addressed to prepare 5.0 Community wanted We would like feedback from the community to guide our decision otherwise we will progress labels Mar 22, 2023
@tghosth
Collaborator Author

tghosth commented Mar 22, 2023

Point from @ImanSharaf #1561 (comment)

Thank you for your response and for pointing out that ASVS 5.3.1 does technically cover CRLF Injection in HTTP headers. However, I would like to express my concern that the current wording of 5.3.1 might not explicitly highlight the importance of testing for CRLF Injection attacks. As a result, testers might inadvertently overlook this specific attack vector when performing security assessments.
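For readers unfamiliar with the CRLF concern, a minimal sketch of the kind of check a tester would expect before untrusted data reaches an HTTP header value. The helper name `safe_header_value` is hypothetical; real frameworks often perform an equivalent check internally.

```python
def safe_header_value(value: str) -> str:
    """Reject values that could end the header line and start a new one."""
    if "\r" in value or "\n" in value or "\x00" in value:
        raise ValueError("control character in header value")
    return value


language = safe_header_value("en-US")
```

Rejecting is used here instead of encoding because a plain header value has no universally accepted encoding for CR and LF.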

@elarlang
Collaborator

We also should address CSS related issue with this: #1558

@elarlang
Collaborator

elarlang commented Mar 23, 2023

(updated 2023-03-29, 2023-03-30)

As Pandora's box has been opened...

We still need an abstract requirement to cover all syntaxes that are not listed:

Verify that output encoding is relevant for the interpreter and context required.

Separate requirements:

  • HTML values, HTML attributes
    • can we add XML values and XML attributes here, and split up XML encoding and XPath sanitization from 5.3.10 need more beef #1556?
    • the problem here is that "HTML attributes" should be "HTML attribute values". For HTML attribute names, encoding is not helpful; validation against an allow-list and/or sanitization needs to be used.
  • JavaScript - both escaping and encoding are available for JavaScript, and both should be covered
    • Maybe we can merge it to current 5.3.3 (need to rethink DOM XSS part from there)
  • URL encoding
  • HTTP headers
  • SMTP - is there such a thing as "SMTP encoding", or should we actually name it Content-Transfer-Encoding?

I did not cover this part of 5.3.1 - do we need to mention it?

especially from untrusted inputs (e.g. names with Unicode or apostrophes, such as ねこ or O'Hara).

CSS - (for some reason it is missing from the issue description but actually exists in the bleeding edge version) - As there is no such thing as CSS encoding, we need to use input validation and/or sanitization when using user input in CSS. Should be covered via #1558

@elarlang elarlang added the V5 Temporary label for grouping input validation, sanitization, encoding, escaping related requirements label Apr 5, 2023
@jmanico
Member

jmanico commented Apr 5, 2023

Note: CSS encoding is a thing, it's a native JS function. CSS.escape for example.

@elarlang elarlang added the next meeting Filter for leaders label Apr 29, 2023
@tghosth tghosth added the 4b Major-rework These issues need to be part of a full chapter rework label May 23, 2023
@elarlang elarlang removed the next meeting Filter for leaders label Nov 29, 2023
@tghosth
Collaborator Author

tghosth commented Feb 15, 2024

I agree that we should split out this requirement as @elarlang has set out above. The treatment should be one or more of the following:

  • Create new requirement if the particular context does not already have a requirement
  • Reference this issue and what is needed if there is already a related open issue (such as with HTML values, HTML attributes)
  • Make sure that each requirement (existing or new) correctly captures the fix that is needed (encoding, escaping, sanitizing, sandboxing, etc)
  • (Don't move existing requirements around yet)
  • Make sure that an individual requirement is not mixing different treatments or contexts, e.g. we may want to split out SMTP and IMAP.

@elarlang
Collaborator

I have started to write those requirements many times.. but those still feel like duplicates. So, there is a turn-around in my proposal, and it is also more aligned with the goals to make fewer requirements to the ASVS and provide "requirement per principle", not "requirement per principle per syntax/technology". We don't have separate requirements for "use authorization correctly in framework x" and "use authorization correctly in framework y", so we also should not have separate requirements for "use encoding correctly for HTML" and "use encoding correctly for URL".

The more I look at the current requirement, the more I think it's OK. It just needs to be shortened a bit.

Verify that output encoding is relevant for the interpreter and context required. For example, use encoders specifically for HTML values, HTML attributes, JavaScript, URL parameters, HTTP headers, and SMTP., and others as the context requires, especially from untrusted inputs (e.g. names with Unicode or apostrophes, such as ねこ or O'Hara). (C4)

The list of examples can be fine-tuned. Splitting the requirement per syntax makes it more like a checklist, and this is a testing guide area.

@elarlang elarlang added the next meeting Filter for leaders label Apr 17, 2024
@elarlang
Collaborator

For brainstorming - to cover entire chapters 5.2 and 5.3, it's enough to have a requirement:

Verify that the application builds commands and documents in such a way that the data used cannot change the command or document structure, by using the relevant defense for the syntax and context required, such as parameterization, encoding, escaping, or sanitization.

Where can we draw the line between "too detailed" and "too abstract"?

@jmanico
Member

jmanico commented Apr 17, 2024

CSS - (for some reason it is missing from the issue description but actually exists in the bleeding edge version) - As there is no such thing as CSS encoding, we need to use input validation and/or sanitization when using user input in CSS. Should be covered via #1558

CSS encoding does exist in native javascript and other languages meant to escape CSS Variables. See: https://developer.mozilla.org/en-US/docs/Web/API/CSS/escape_static

@elarlang
Collaborator

Jim, you already said that (#1589 (comment)). Just please, (re)read and learn the issues before commenting :)

@jmanico
Member

jmanico commented Apr 18, 2024

Jim, you already said that (#1589 (comment)). Just please, (re)read and learn the issues before commenting :)

Understood and will do.

@elarlang
Collaborator

My slightly updated proposal:

Verify that output encoding is relevant for the interpreter and context required, such as encoding value for HTML value, HTML attribute, JavaScript, URL parameter, HTTP header, or SMTP.

@elarlang elarlang added 4) proposal for review Issue contains clear proposal for add/change something and removed next meeting Filter for leaders labels Apr 19, 2024
@jmanico
Member

jmanico commented Apr 20, 2024

I like it Elar.

@tghosth
Collaborator Author

tghosth commented Apr 21, 2024

So I am not sure I completely understood your draft, does my rephrasing make sense:

Verify that output encoding is relevant for the interpreter and context required, such as encoding the relevant characters for HTML elements, HTML attribute, JavaScript, URL parameter, HTTP header, or SMTP.

@elarlang
Collaborator

With quick recheck I would say, that the first comment on CSP is from you.

And you still have not realized that your understanding of URL encoding/decoding is/was wrong? Instead of re-analyzing, testing and re-testing whether your statements are technically correct, you say that I'm the one who cannot understand the topic? And after that, you are surprised to get certain feedback?

@jmanico
Member

jmanico commented May 14, 2024 via email

@elarlang
Collaborator

Just wow.

I even showed you a demo where URL encoding in a path leads to path reversal in certain aspects of the web pipeline.

Are you just ignoring the "path traversal" part from my longer comment? I provided a description there of why your demo is NOT correct.

Let me ask it again then, if in an HTML document, there is:

<img src="https://domain/one/..%2ftwo%2fthree.png">

What URL is the browser going to request, and what file is the server going to serve?

@jmanico
Member

jmanico commented May 15, 2024

This is just one of many examples. Many server-side URL parsers and processors still allow path traversal when you put untrusted data in a path with URL encoding. It's just an unsafe practice overall.

My server is not Apache, and I agree the problem is not universal but it is widespread. This is why the JWT standard went for base64url encoding - for universal defense.

So yes I read your argument but I do not agree with it just because you cannot get my demo to work on Apache. It does work elsewhere like I have shown you.

@elarlang
Collaborator

elarlang commented May 15, 2024

I do not agree with it just because you cannot get my demo to work on Apache

This is epic facepalm, Jim. You just prove again and again, that you can not understand the need for URI encoding.

If your application unnecessarily decodes the URL and causes path traversal issues because of that, the problem is in your application. The path part was correctly URL-encoded to avoid path traversal.
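The distinction being argued here can be sketched in Python. This only shows how the standard library's `urllib.parse` and `posixpath` treat the example URL; it is an assumption-laden illustration and says nothing about what any particular browser or web server does.

```python
import posixpath
from urllib.parse import unquote, urlsplit

url = "https://domain/one/..%2ftwo%2fthree.png"

# The path as it appears in the URL: %2f is data, not a path separator.
raw_path = urlsplit(url).path

# Normalizing the still-encoded path does not collapse anything,
# because "..%2ftwo%2fthree.png" is a single segment, not "..".
normalized_raw = posixpath.normpath(raw_path)

# Decoding first and normalizing afterwards turns the data into structure.
decoded_then_normalized = posixpath.normpath(unquote(raw_path))
```

In this sketch, the traversal only appears in the last step, where a consumer decodes the path before interpreting it.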

..., and I agree the problem is not universal but it is widespread.

Do you realize, that if the problem could be with the URL-encoding part, it could be universal? But it is not... Just connect the dots.

This is just one of many examples.

Can you give me 3 more examples, please.

Many server-side URL parsers and processors still allow path traversal when you put untrusted data in a path with URL encoding. It's just an unsafe practice overall.

If the server-side component causes the problem, it must be fixed there. From the point of view of this requirement, the URL encoding part is done correctly. For that reason I recommended a change in #1590 (comment)

... and you did not answer my question, what will be the URL? :)

@elarlang
Collaborator

elarlang commented May 16, 2024

If you can not back up your sentences with proof-of-concepts and fail to admit your mistakes, you should stay away from it.

I ask clear questions and would like to see clear answers, not some bubbles.

Question 1

Question from: #1589 (comment)

I asked a clear question, but you did not respond with a clear answer.

If there is a URL:

<img src="https://domain/one/..%2ftwo%2fthree.png">

You claim (#1589 (comment)) to certainly know the URL encoding, it should be simple for you. Let the World see your knowledge.

I’m very certain that my understanding of URL handling is correct.

Answer options:

  • Option A: The application should serve the file (like you said): https://domain/two/three.png
  • Option B: The application should serve the file (like I said): https://domain/one/..%2ftwo%2fthree.png

Question 2

Quote from: #1589 (comment)

If you are placing untrusted data in the path of a URL REST-style you need to base64url encode to avoid path traversal.

Do you (now, finally) agree that saying "URL-encoding path data is not enough against path traversal in the URL" is technically incorrect?

@elarlang
Collaborator

Explanation for others, why I "keep going": In a place like an argument-based discussion for implementing security requirements, there is no place for bluff or demagogy. It causes direct quality issues to the standard and, additionally, kills motivation for other volunteers to waste time with this behavior. So, sorry for the noise, but this is the reason, why I just don't leave it like that - if a person comes and throws sh*t towards me/others, there is a response for that.

@jmanico
Member

jmanico commented May 16, 2024 via email

@jmanico
Member

jmanico commented May 16, 2024 via email

@tghosth tghosth mentioned this issue May 16, 2024
@tghosth
Collaborator Author

tghosth commented May 16, 2024

Gents, I would really like to get this discussion back on track.

Having read through, it seems like we are all in agreement that encoding for a document (such as an HTML response, HTTP response headers, or other document generation) needs to be done correctly based on context, e.g. whatever goes into an HTML attribute has to be encoded correctly for an HTML attribute, be it a URL or whatever.

Separately, there is a complicated discussion going on regarding the safe creation of URLs. I would like to separate out this discussion into #1961 as I believe it is complicated enough to merit its own requirement. Feel free to copy relevant points from here into there.

@jmanico on that basis, can we proceed with the following wording for this requirement plus the outcome of #1961.

Verify that output encoding is relevant for the interpreter and context required, such as encoding the relevant characters for HTML elements, HTML attribute, JavaScript, URL parameter or HTTP header, to avoid changing the document structure or functionality.

@jmanico
Member

jmanico commented May 16, 2024 via email

@elarlang
Collaborator

elarlang commented May 17, 2024

The issue has been analyzed comment by comment by 2 co-leaders, and the proposal is the result of that.

I provide the summary and argumentation for the discussion, although all points are already covered in the comments.

Verify that output encoding is relevant for the interpreter and context required, such as encoding the relevant characters for HTML elements, HTML attribute, JavaScript, URL parameter or HTTP header, to avoid changing the document structure or functionality.

It is important to understand the meaning of the requirement - it is "to avoid changing the document structure or functionality". Anything that does not concern this goal is not the responsibility of this requirement and should stay out of this discussion (for example, how interpreters validate and use the URL).

The requirement is also valid for URL parameters or URL paths.

https://datatracker.ietf.org/doc/html/rfc1738

All unsafe characters must always be encoded within a URL. For
example, the character "#" must be encoded within URLs even in
systems that do not normally deal with fragment or anchor
identifiers, so that if the URL is copied into another system that
does use them, it will not be necessary to change the URL encoding.

If we take a URL:

<scheme>://<host>:<port>/<path>?<searchpart>

When building a URL, every part of it must be encoded correctly for its context to keep the structure, so that special characters (e.g. /, #, ?) cannot change the meaning of the URL.

It is off-topic from this requirement's point of view, but as it seems to cause confusion, I will explain that as well - if an interpreter (web server, API) unnecessarily URL-decodes the value and thereby causes security risks like path traversal, then there can be a proposal to base64 encode the value and send this instead.

It is a logical value, but the base64 encoded value must be URL encoded when used in any part of the URL, because it may contain characters that require URL encoding, such as /, +, and =. For that exact reason, base64url was invented.
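The difference between the two alphabets can be checked with a short Python sketch. The input bytes are arbitrary and chosen only so that the standard alphabet produces `+` and `/`.

```python
import base64
from urllib.parse import quote

raw = b"\xfb\xff"  # arbitrary bytes that map to '+' and '/' in standard base64

standard = base64.b64encode(raw).decode("ascii")          # "+/8="
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-_8="

# Both variants keep '=' padding here, so percent-encoding still has work to do.
standard_in_url = quote(standard, safe="")   # "%2B%2F8%3D"
url_safe_in_url = quote(url_safe, safe="")   # "-_8%3D"
```

Even the URL-safe alphabet leaves the `=` padding in place in Python's implementation, which is one reason to apply URL encoding regardless of how the logical value was produced.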

But in all given examples, the proposed requirement stays valid - whatever is the logical value to use as part of the URL, it must be correctly URL-encoded. If there is an extra logical security risk to cover for interpreters (which is again another separate topic), it does not make this requirement incorrect.

Also worth pointing out - this requirement has been there pretty much as it is, we only shortened it a bit and clarified the meaning.

I'm confident to say, that the requirement is technically correct and valid, without taking away the possibility that it will be modified if there are new valid arguments to do so.

@jmanico
Member

jmanico commented May 17, 2024 via email

@elarlang
Collaborator

There are two forms of base 64 encoding. The first is base64 which is not file/url path safe due to the / and + encoding characters. I am suggesting base64url encoding which is indeed one of the only filepath/url safe standards there is. Base64url encoding is safe because it replaces the / and + with filesafe characters and is specially meant for use on a url path for binary or other data. I mean no disrespect here. I’m only commenting because I saw a base64 encoding mentioned which is not path safe. Base64URL | Base64 Standards | Base64 base64.guru

It was all covered with

It is a logical value, but the base64 encoded value must be URL encoded when used in any part of the URL, because it may contain characters that require URL encoding, such as /, +, and =. For that exact reason, base64url was invented.

@jmanico
Member

jmanico commented May 17, 2024 via email

@tghosth
Collaborator Author

tghosth commented May 19, 2024

The continued discussion above reinforces for me that we need to move discussion of URLs to a separate requirement as per #1961.

I am undecided on URL parameter and I might leave this to our next meeting to discuss.

@tghosth tghosth removed 4) proposal for review Issue contains clear proposal for add/change something Community wanted We would like feedback from the community to guide our decision otherwise we will progress 4b Major-rework These issues need to be part of a full chapter rework labels May 19, 2024
@tghosth
Collaborator Author

tghosth commented May 23, 2024

The agreed wording so far is as follows but we are waiting for the outcome of #1961:

Verify that output encoding is relevant for the interpreter and context required, such as encoding the relevant characters for HTML elements, HTML attribute, JavaScript, or HTTP header, to avoid changing the document structure.

@tghosth tghosth added 4a) Waiting for another This issue is waiting for another issue to be resolved and removed 1) Discussion ongoing Issue is opened and assigned but no clear proposal yet next meeting Filter for leaders labels May 23, 2024
@elarlang
Collaborator

To be clear about that - I did not agree that URL parameters should be removed from the requirement. The conditional "waiting for #1961" means that #1961 would provide a duplicate of this requirement that covers URL encoding only.

This comment is an add-on to my previous comment: #1589 (comment)

Encoding - it is needed against injection attacks when a document is built dynamically and user input is placed between delimiters; its purpose is to keep the document structure.

If someone ever reads this issue, here are my arguments. I put numbers to points, then it is easier for me to say, which point was ignored (again) when (not) reading my arguments.

Point 1 - Injection vs Execution.

Injection - when an attacker can change the dynamically built document structure

  • program-code: <a href=" + userinput + ">link</a>
  • userinput:
    • "><script>evil-code</script> - HTML injection
    • " onclick="evil-code" style="over-the-screen - HTML attribute injection

Execution - when an attacker can execute malicious code, without manipulating the document structure during building it - so encoding does not help in this situation

  • program-code: <a href=" + userinput + ">link</a>
  • userinput:
    • javascript:evil-code - JavaScript execution

Encoding the user input here is not enough, as it is a logical problem - untrusted input reached an HTML attribute, and the problem is that it was not validated.

Again, this example is an input validation problem, and you cannot fix it with any encoding. To say, that "URL-encoding is not enough because of javascript:" is just filtered. The responsibility of encoding is to keep the document structure. The responsibility of validation is to keep untrusted data from reaching content execution.
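The split of responsibilities described in Point 1 can be sketched as follows. This is a simplified illustration: the helper `render_link` and the scheme allow-list are assumptions made for the example, and a real application would likely also need to handle relative URLs.

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # hypothetical allow-list for this sketch


def render_link(user_url: str) -> str:
    # Validation: keep untrusted data such as "javascript:..." away from execution.
    if urlsplit(user_url).scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError("URL scheme not allowed")
    # Encoding: keep the value inside the quoted attribute (document structure).
    return '<a href="' + html.escape(user_url, quote=True) + '">link</a>'


link = render_link('https://example.com/?q="><script>evil()</script>')
```

Encoding neutralizes the `">` breakout but would pass `javascript:evil-code` through unchanged, which is why the validation step is separate.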

For JavaScript code, execution attacks are often caused by eval, by assigning untrusted data to innerHTML, by document.write(), etc. Although (double) encoding may help there, in principle it is an incorrect defense.

Content execution is also the situation when HTML and JavaScript get executed because of an incorrect Content-Type header, let's say text/html instead of application/json. Instead of encoding the data, you need to just set the correct Content-Type.

Point 2 - It is not limited to HTML syntax or document

The requirement is a principle: it is valid for every syntax and every context where user input is used in dynamically built documents and is placed between delimiters.

JavaScript:

var x = 'USERINPUT';

You can encode to Unicode escapes, e.g. ' becomes \u0027 (there are other alternatives). Escaping is another option here, with the same goal - to keep the document structure.

JSON (== JavaScript), no one should build it manually:

{"x":"USERINPUT"}

The same as described for JavaScript.
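A minimal sketch of why JSON should not be built by hand, using Python's `json` module. The payload string is a made-up example.

```python
import json

user_input = 'x","admin":true,"y":"'

# Unsafe: concatenation lets the input close the string and add new keys.
concatenated = '{"x":"' + user_input + '"}'

# Safer: the serializer escapes the delimiters, so the structure is preserved.
serialized = json.dumps({"x": user_input})

parsed_concatenated = json.loads(concatenated)
parsed_serialized = json.loads(serialized)
```

In this sketch, `parsed_concatenated` ends up with three keys, including `admin`, while `parsed_serialized` still has only the key `x` with the original string as its value.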

URL-encoding when building a URL:

scheme://hostname/USERINPUT1/?x=USERINPUT2#USERINPUT3

In any part of the URL, you need to use URL encoding to be sure it does not change the structure of the URL.

If the value is something that is "URL-safe" (for example, a base64url encoded value) and does not contain any characters that require URL encoding, then URL encoding does not make any changes to the value, but the URL encoding should be there anyway.

Without URL-encoding this URL-safe part, your defense is relying on the expectation, that this part is always URL-safe, but it is a business logic rule. What if there is a need to send something else to that place that is not URL-safe anymore?
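The URL template above can be sketched with Python's `urllib.parse`. The function `build_url` and its parameter names are hypothetical; the sketch only shows each untrusted part being encoded for its own position.

```python
from urllib.parse import quote, urlencode


def build_url(host: str, segment: str, query_value: str, fragment: str) -> str:
    path = quote(segment, safe="")         # one path segment: "/" and "?" become data
    query = urlencode({"x": query_value})  # query parameter: "&" and "=" become data
    frag = quote(fragment, safe="")        # fragment: "#" becomes data
    return f"https://{host}/{path}/?{query}#{frag}"


url = build_url("hostname", "a/b?c", "1&y=2", "top#x")
unchanged = quote("-_8", safe="")  # an already URL-safe value passes through as-is
```

Applying `quote` to a value that is already URL-safe is a no-op, so keeping the encoding step costs nothing and does not depend on the business rule that the value stays URL-safe.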

HTTP header, classical mistake:

Content-Disposition: attachment; filename="USERINPUT"

Note that the expected defense in this case is to sanitize the input for the filename parameter and encode it correctly for the filename* parameter. The point is that encoding is not an HTML-document-specific problem.
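A minimal sketch of that combination in Python. The helper name `content_disposition` and the conservative character set used for the fallback are assumptions for illustration, not a complete implementation of the relevant RFCs.

```python
import re
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    # Sanitize for the quoted `filename` parameter: keep a conservative ASCII set.
    fallback = re.sub(r"[^A-Za-z0-9._-]", "_", filename)
    # Percent-encode the UTF-8 bytes for the extended `filename*` parameter.
    extended = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{extended}"


header = content_disposition("ねこ.txt")
hostile = content_disposition('a"\r\nSet-Cookie: x=1')
```

With this approach, quotes and line breaks in the input cannot terminate the quoted string or the header line, because they are replaced in one parameter and percent-encoded in the other.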

You can add here CSS, XML, "any plain-text syntax", ...

Point 3 - It is not limited to XSS

As I have written many times, the goal of the requirement is not to avoid all possible XSS vectors; it is just to keep the document structure in place. That is an important part of the defense against XSS when certain technologies or syntaxes are involved.

Point 4 - Limiters!

Often encoding helps only when user input goes between some limiter marks, and the point for the encoding is - you should stay between those limiter marks.

  • In HTML document context - > and <
  • In HTML attribute value context - ' or "
    • Note that if user input is placed without delimiter marks, encoding does not help: <input name=x value=USERINPUT>
  • JavaScript - ', ", or the template literal marker `

Examples, when encoding does not help:

  • HTML element attribute name: <input name=x USERINPUT>
  • JavaScript: var x = USERINPUT;
  • JSON {USERINPUT:"value"}

In those cases, user input that has not been validated (and, if needed, sanitized) must never reach this functionality.
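For the attribute-name case, a sketch of validation against an allow-list, with encoding still applied where delimiters exist. The helper `render_input` and the set of permitted names are hypothetical.

```python
import html

ALLOWED_ATTRIBUTES = {"placeholder", "title", "maxlength"}  # illustrative allow-list


def render_input(attr_name: str, attr_value: str) -> str:
    # The attribute name has no delimiters around it, so encoding cannot
    # contain it: validate it against an allow-list instead.
    if attr_name not in ALLOWED_ATTRIBUTES:
        raise ValueError("attribute name not allowed")
    # The attribute value sits between quotes, so encoding applies here.
    return f'<input name="x" {attr_name}="{html.escape(attr_value, quote=True)}">'


field = render_input("title", 'a"b')
```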

So encoding is just one tool with a really precise goal - to keep the structure for dynamically built documents. Whatever is outside of this goal, is out of scope for this requirement AND does not make this requirement incorrect, including the "URL parameters".

@jmanico
Member

jmanico commented May 23, 2024 via email

@elarlang
Collaborator

You make a very solid argument. You were outvoted by two other ASVS leads (for now) because:

In principle, if "voting" overrides technically argued points, it's a dead end.

  1. comment is out of scope - the scope is to keep the structure.
  2. comment is out of scope - web server handles URL as interpreter, it is another topic
  3. comment is out of scope - the comment also conflicts with the point being made
  4. each defense method has its own place in the flow, encoding does not replace validation, and validation does not replace encoding. Do not mix them.

In conclusion, it does not matter at all, what arguments I provide, you still come "against" the points with things, that I just described to be out of scope.

@tghosth tghosth added the next meeting Filter for leaders label Jun 2, 2024
@tghosth
Collaborator Author

tghosth commented Jun 2, 2024

From my perspective, the removal of "URL parameter" relates to the complexity of handling URLs which I believe should be included elsewhere.

I think there is still a question here related to document structure which we should discuss further.

@jmanico
Member

jmanico commented Jun 2, 2024 via email

3 participants