Percent-decoding entire URL components is not valid #180

karwa · 2022-11-06T03:35:38Z

These operations are not valid:

Lines 178 to 183 in 3c7235e

    // Decode URI octets
    if (urlObject.pathname) {
        try {
            urlObject.pathname = decodeURI(urlObject.pathname);
        } catch {}
    }

normalize-url/index.js

Lines 243 to 245 in 3c7235e

    try {
        urlObject.search = decodeURIComponent(urlObject.search);
    } catch {}

Since URLs are a textual format, certain characters have semantic meaning. Percent-encoding can be used to escape those characters. For example, if we want a single path component named "AC/DC", we'll have a problem, because "/" can mean a path separator:

/music/bands/AC/DC
             ^^^^^ - 2 components! ❌

So instead, we have to escape the "/" within the name "AC/DC":

/music/bands/AC%2FDC
             ^^^^^^^ - 1 component ✅

If we percent-decode the entire path string, we irreversibly lose the information that "AC/DC" was supposed to be a single path component.
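
To make the loss concrete, here is a small JavaScript illustration. It uses decodeURIComponent as the "decode everything" step, and `segments` is a hypothetical helper written only for this example:

```javascript
// Build a path whose last segment is the band name "AC/DC".
const band = "AC/DC";
const path = "/music/bands/" + encodeURIComponent(band); // "/music/bands/AC%2FDC"

// Hypothetical helper: split a path string into its non-empty segments.
function segments(p) {
  return p.split("/").filter((s) => s !== "");
}

// Still escaped: three segments, the last one is "AC%2FDC".
console.log(segments(path)); // [ 'music', 'bands', 'AC%2FDC' ]

// Decoded as a whole: "/" reappears and "AC/DC" becomes two segments.
console.log(segments(decodeURIComponent(path))); // [ 'music', 'bands', 'AC', 'DC' ]
```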

Instead, the correct way to do this is to split the component (still escaped) into its constituent parts, decode each part, escape any characters with semantic meaning, and join them up again. For the path, that means breaking it up into path components and ensuring "/" and "\" characters are escaped again in each component before you rebuild the path string. For the query, it means doing the same for each key and value (not each key-value pair: they need to be broken down into their smallest subcomponents).
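
A minimal sketch of the per-segment approach for the path, in JavaScript. The helper names (`normalizePathname`, `escapePathChars`) are hypothetical and not part of normalize-url, and the exact set of characters re-escaped ("/", "\", "?", "#", "%") is an assumption about what carries meaning inside a single segment:

```javascript
// Assumed set of characters that must stay escaped inside one path segment:
//   "/"        separates segments,
//   "\"        is also treated as a separator for special schemes like http(s),
//   "?" / "#"  would start the query / fragment,
//   "%"        starts an escape, so a decoded "%41" must not later read as "A".
const PATH_SENSITIVE = /[\/\\?#%]/g;

// Percent-encode the sensitive characters (all ASCII, so one byte each).
function escapePathChars(text) {
  return text.replace(PATH_SENSITIVE, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Split on "/" while still escaped, decode each segment on its own,
// re-escape what matters, and join the segments again.
function normalizePathname(pathname) {
  return pathname
    .split("/")
    .map((segment) => {
      try {
        return escapePathChars(decodeURIComponent(segment));
      } catch {
        // Malformed escape (decodeURIComponent throws URIError): keep as-is.
        return segment;
      }
    })
    .join("/");
}
```

For example, normalizePathname("/music/bands/AC%2FDC") returns the path unchanged, while normalizePathname("/caf%C3%A9/menu") returns "/café/menu". Re-escaping "%" matters: without it, "/100%2541" would become "/100%41", which a later decode would read as "/100A".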


sindresorhus · 2022-11-06T03:46:28Z

// @oswaldosalazar @ludofischer

karwa · 2022-11-06T04:19:20Z

Ah, my bad: the path example is okay because decodeURI won't unescape some reserved characters, like "/", so it keeps the path component as "AC%2FDC". But for the query, decodeURIComponent really does unescape everything:

var url = new URL("http://example/?show=Tom%26Jerry&episode=3");

url.href;
// 'http://example/?show=Tom%26Jerry&episode=3'
//                          ^^^ - ❗️
url.searchParams.get("show");
// 'Tom&Jerry'

url.search = decodeURIComponent(url.search);

url.href;
// 'http://example/?show=Tom&Jerry&episode=3'
//                          ^ - ❗️
url.searchParams.get("show");
// 'Tom'
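
For the query, the same idea applied to each key and value might look like the sketch below. It is an illustration under assumptions, not normalize-url's code: `normalizeSearch` and its helpers are hypothetical names, raw "+" signs are treated as subcomponent boundaries (since "+" and "%2B" mean different things in form-encoded queries), and "&", "=", "+", "#" and "%" are the characters assumed to need re-escaping after decoding:

```javascript
// Assumed set of characters with meaning in a query: "&" separates pairs,
// "=" separates key from value, "+" means space in form encoding,
// "#" starts the fragment, "%" starts an escape.
const QUERY_SENSITIVE = /[%&=+#]/g;

function escapeQueryChars(text) {
  return text.replace(QUERY_SENSITIVE, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

// Decode one key or value. Raw "+" signs are split off first and put back
// unchanged, so "+" and "%2B" keep their different meanings.
function normalizeQueryPart(part) {
  return part
    .split("+")
    .map((piece) => {
      try {
        return escapeQueryChars(decodeURIComponent(piece));
      } catch {
        return piece; // malformed escape: leave untouched
      }
    })
    .join("+");
}

// Break the search string into pairs, then each pair into key and value,
// and normalize those smallest pieces independently.
function normalizeSearch(search) {
  const body = search.startsWith("?") ? search.slice(1) : search;
  if (body === "") return search;
  const pairs = body.split("&").map((pair) => {
    const eq = pair.indexOf("=");
    if (eq === -1) return normalizeQueryPart(pair);
    return (
      normalizeQueryPart(pair.slice(0, eq)) +
      "=" +
      normalizeQueryPart(pair.slice(eq + 1))
    );
  });
  return "?" + pairs.join("&");
}
```

On the example above, normalizeSearch("?show=Tom%26Jerry&episode=3") leaves the string unchanged, so searchParams.get("show") still returns 'Tom&Jerry', while an escaped slash such as "?path=a%2Fb" comes back as "?path=a/b".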

ludofischer · 2022-11-10T17:50:09Z

As far as I remember, decodeURIComponent was introduced because some people complained that normalize-url would break their URLs when the search parameters contained a forward slash (it would encode the forward slash as a side effect of sorting). As I see it, the library should not change the encoding unless the user asks for it. Since it is sorting via searchParams.sort() that changes the encoding, could the solution be to sort the parameters by hand? Or is this more complicated than it looks?
Interesting: it looks like even server-side frameworks that do file-based routing run into issues with decodeURIComponent. sveltejs/kit#7577
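
Sorting by hand could be sketched as below: split the raw search string on "&", order the pairs by their decoded names, and join them again without re-encoding anything. The function names are hypothetical, and mimicking URLSearchParams.sort() (a stable sort on names, comparing UTF-16 code units) is an assumption about the ordering normalize-url would want:

```javascript
// Decoded name of one "key=value" pair, used only as the sort key.
// "+" is read as a space, as in application/x-www-form-urlencoded.
function pairName(pair) {
  const eq = pair.indexOf("=");
  const rawName = eq === -1 ? pair : pair.slice(0, eq);
  try {
    return decodeURIComponent(rawName.replace(/\+/g, " "));
  } catch {
    return rawName; // malformed escape: fall back to the raw text
  }
}

// Sort the pairs of a search string by name, leaving each pair's
// original encoding untouched.
function sortSearchPreservingEncoding(search) {
  const body = search.startsWith("?") ? search.slice(1) : search;
  if (body === "") return search;
  const pairs = body.split("&");
  // Array.prototype.sort is stable, so pairs with the same name keep their
  // original relative order; "<" compares strings by UTF-16 code units.
  pairs.sort((a, b) => {
    const na = pairName(a);
    const nb = pairName(b);
    return na < nb ? -1 : na > nb ? 1 : 0;
  });
  return "?" + pairs.join("&");
}
```

With this approach "?path=a/b&c=1" becomes "?c=1&path=a/b", keeping the unescaped "/" instead of turning it into "%2F".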

ludofischer · 2022-11-10T18:05:04Z

The problem is that URL.searchParams gives you the parameters already decoded, so we can't rely on it to preserve the original encoding. We can't use decodeURI instead of decodeURIComponent because decodeURI does not bring back the forward slash (it keeps it encoded as %2F).
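
The difference is easy to see on a made-up query containing both an escaped "/" and an escaped "&":

```javascript
// A query with an escaped "/" (%2F) and an escaped "&" (%26).
const query = "?path=a%2Fb&show=Tom%26Jerry";

// decodeURI skips escapes for reserved characters such as "/" and "&",
// so this string comes back unchanged.
console.log(decodeURI(query)); // ?path=a%2Fb&show=Tom%26Jerry

// decodeURIComponent decodes every escape, including %26.
console.log(decodeURIComponent(query)); // ?path=a/b&show=Tom&Jerry
```

So neither whole-string decoder fits: decodeURI keeps %2F, and decodeURIComponent also decodes %26, which changes how the parameters are split.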

ludofischer mentioned this issue Mar 19, 2023

feat!(postcss-normalize-url): inline third-party dep and remove options cssnano/cssnano#1480

Merged
