SSR: Disable cache when query params are not whitelisted #22593
33850f2
to
0259ca9
Compare
afb2b35
to
73d4558
Compare
the cache key isn't being sorted. We'll need to upgrade (multiple major versions 😬) or find another solution.
#22587 becomes unnecessary with this PR, so we can revert it... in this PR?
Thanks for mentioning, I was wondering how best to handle that. I'm happy to include that here, although it shouldn't conflict with this change so could be reverted separately. I'm going to try to sort out a solution for stable cache keys, then I'll add a revert commit for #22587.
*/
import { getCacheKey } from '..';

jest.mock( 'redux-form/es/reducer', () => require( 'lodash' ).identity );
This module has ES6 imports, which were breaking the tests. The mock allows the tests to run, although the breakage is likely a symptom of some missing configuration.
Improving configs is outside the scope of this PR.
server/isomorphic-routing/index.js
Outdated
Object.keys( query ).length === cacheQueryKeys.length &&
every( cacheQueryKeys, key => has( query, key ) )
) {
return pathname + '?' + deterministicStringify( query );
This doesn't look like a URL any more, but all we need is something deterministic that we can serialize for use as a key.
You can see in the tests we'll now get strings like /my/path?{"and_me":"2","cache_me":"1","me_too":"3"}.
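As a rough illustration of the key format described above, here is a minimal sketch. It is not the actual Calypso implementation: `stableStringify` is a hypothetical stand-in for `deterministicStringify`, and returning the bare pathname when there is no query at all is an assumption.

```javascript
// Hypothetical stand-in for deterministicStringify: serialize an object
// with its keys in sorted order so equal objects yield equal strings.
function stableStringify( obj ) {
	const sorted = {};
	for ( const key of Object.keys( obj ).sort() ) {
		sorted[ key ] = obj[ key ];
	}
	return JSON.stringify( sorted );
}

// Sketch of the cache-key logic under discussion (assumed shape of the input).
function getCacheKey( { cacheQueryKeys = [], pathname, query = {} } ) {
	const queryKeys = Object.keys( query );

	// Assumption: a request without query args is keyed by its pathname.
	if ( queryKeys.length === 0 ) {
		return pathname;
	}

	// Cacheable only if the query holds exactly the whitelisted keys.
	const allWhitelisted =
		queryKeys.length === cacheQueryKeys.length &&
		cacheQueryKeys.every( key => Object.prototype.hasOwnProperty.call( query, key ) );

	// null means "do not cache this request".
	return allWhitelisted ? pathname + '?' + stableStringify( query ) : null;
}

const key = getCacheKey( {
	pathname: '/my/path',
	cacheQueryKeys: [ 'cache_me', 'me_too', 'and_me' ],
	query: { cache_me: '1', me_too: '3', and_me: '2' },
} );
console.log( key ); // /my/path?{"and_me":"2","cache_me":"1","me_too":"3"}
```

The result is not a valid URL, but it only has to be deterministic and serializable to work as a cache key.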
What's the benefit of not keeping it a URL? I mean, why change it to that structure?
qs is not stable, so requests to /?a=1&b=2 and /?b=2&a=1 could be cached differently. The sort option that was used, which is intended to provide stable URLs from qs.stringify, is not present in the version of qs we depend on, so it had no impact.
This is a simple way to get a stable string from an object.
You can see the failing stability test here: https://circleci.com/gh/Automattic/wp-calypso/85034#tests/containers/1
It was run on 73d4558, before this implementation was changed.
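The ordering problem can be reproduced without qs. The sketch below uses the built-in `URLSearchParams` as a stand-in for an order-preserving serializer (an assumption for illustration, not the code from this PR) and shows that sorting the entries first yields one string for both requests.

```javascript
// Stand-in serializer: output follows the insertion order of the keys.
const serialize = query => new URLSearchParams( query ).toString();

const first = { a: '1', b: '2' };
const second = { b: '2', a: '1' };

console.log( serialize( first ) ); // a=1&b=2
console.log( serialize( second ) ); // b=2&a=1

// Sorting the entries by key before serializing gives a stable result.
const serializeSorted = query =>
	new URLSearchParams(
		Object.entries( query ).sort( ( [ a ], [ b ] ) => ( a < b ? -1 : a > b ? 1 : 0 ) )
	).toString();

console.log( serializeSorted( first ) === serializeSorted( second ) ); // true
```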
> sort, which is intended to provide stable urls from qs.stringify that was used is not present in the version of qs we depend on, so had no impact.
I thought the reason might've perhaps been the notorious localCompare typo? a.localCompare( b ) should be a.localeCompare( b ).
This was pointed out and fixed by @Tug in #19036, which had to be reverted, and we forgot to carry over the typo fix...
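For reference, a small sketch of what the typo does when the comparator is actually invoked. The key names are borrowed from the tests quoted above; everything else is illustrative.

```javascript
const keys = [ 'me_too', 'cache_me', 'and_me' ];

// Correct method name: String.prototype.localeCompare.
const sorted = [ ...keys ].sort( ( a, b ) => a.localeCompare( b ) );

// Typo: strings have no localCompare method, so calling it throws.
let typoError = null;
try {
	[ ...keys ].sort( ( a, b ) => a.localCompare( b ) );
} catch ( error ) {
	typoError = error;
}

console.log( sorted[ 0 ] ); // and_me
console.log( typoError instanceof TypeError ); // true
```

If the library ignores the sort option entirely, the comparator is never called and the typo stays silent, which would be consistent with the patched test still failing.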
😬 Could be, I just noticed that the test failed, and compared our version with the changelog:
5.2.0
- #64 Add option to sort object keys in the query string
https://github.com/ljharb/qs/blob/master/CHANGELOG.md#520
Line 123 in eb1ed86
"qs": "4.0.0",
It looks like that string was wrong, but I did a quick verification by applying the below patch to 73d4558. Test still fails.
diff --git a/server/isomorphic-routing/index.js b/server/isomorphic-routing/index.js
index 68f309abfa..753e4487e4 100644
--- a/server/isomorphic-routing/index.js
+++ b/server/isomorphic-routing/index.js
@@ -115,7 +115,7 @@ export function getCacheKey( { cacheQueryKeys, pathname, query } ) {
) {
// Make a stable string representation
// @TODO: qs too old for sort param (added 5.2.0, fixed 6.1.0)
- return pathname + '?' + qs.stringify( query, { sort: ( a, b ) => a.localCompare( b ) } );
+ return pathname + '?' + qs.stringify( query, { sort: ( a, b ) => a.localeCompare( b ) } );
}
return null;
}
Let's go with deterministicStringify for now. I'll file a PR to bump the qs version so we can revisit and go back to normalized routes as cache keys later.
> Let's go with deterministicStringify for now. I'll file a PR to bump the qs version
Sounds great, I'd be happy to get back to a url-like key when our qs module can handle that 👍
> I'll file a PR to bump the qs version
Merged, FWIW 🙂
@@ -122,8 +122,7 @@ export function serverRender( req, res ) {
context.layout &&
! context.user &&
cacheKey &&
isDefaultLocale( context.lang ) &&
! context.query.email_address // Don't do SSR when PIIs are present at the request
Reverting #22587
@@ -59,7 +59,7 @@ describe( 'getCacheKey', () => {
expect( getCacheKey( context ) ).toEqual( getCacheKey( keysSwapped ) );
} );

- test( 'should return null if unknown and cahceable query params are mixed', () => {
+ test( 'should return null if unknown and cacheable query params are mixed', () => {
🔤 ✅
words 🤦♂️ so tough…
😆
I get an SSR error for http://calypso.localhost:3000/log-in?client_id=someId. Should wp-calypso/client/login/controller.js Line 27 in 97c7a6c
I've given this some testing on the We actually see this broken in production as well: This PR fixes this by not caching any themes url with a On
Thanks for the docs ❤️ 👍
docs/server-side-rendering.md
Outdated
##### Data Cache
Both caches use the same key, which is the pathname of the URL. URLs with query args are not cached unless the arg name is present in `context.queryCacheKeys`, in which case the argument or arguments are appended to the key.
How about being a bit more specific:
[…] unless all of the query arguments are in queryCacheKeys […]
If query arguments not in queryCacheKeys are detected, server-side rendering is disabled.
It's likely that See p1519133919000135-slack-amber-dev
We can't disable SSR with I think we'll have to do it the hard way and make sure it's cached "safely".
docs/server-side-rendering.md
Outdated
At render time, the Redux state is [serialized and cached](../server/render/index.js), using the current path as the cache key, unless there is a query string, in which case we don't cache.

This means that all data that was fetched to render a given page is available the next time the corresponding route is hit. A section controller thus only needs to check if the required data is available (using selectors), and dispatch the corresponding fetching action if it isn't; see the [themes controller](../client/my-sites/themes/controller.jsx) for an example.
Sure we want to drop this? It's meant to be instructive as to how to actually prime Redux state on the server side, which seems relevant.
Hm yeah I've been overzealous with the trimming here, I'll give this a less drastic edit...
c107da1
to
108ae47
Compare
There's no requirement for the cache keys to look like URLs. Just use stable stringification.
108ae47
to
06e00a2
Compare
This is a major improvement over what we currently have 👍 Thank you for working on it!
5fd5aad
to
653dbf9
Compare
Store has DOM dependency, so it needs to be mocked in the tests
@Tug I have some concerns about this approach and would like to fully understand the implications.
- The proposal modifies context.query on the server. My first intuition is that this seems very invasive. context.query may be modified on the server, which may lead to differences between client and server rendering. Doesn't this lead to different render results and reconciliation problems, offsetting the benefits of SSR?
- The query may not only be read via context.query, but also via redux state (see selectors below). Will this produce inconsistencies?
Redux state query selectors:
export const getCurrentQueryArguments = state => get( state, 'ui.route.query.current', null );
export const getInitialQueryArguments = state => get( state, 'ui.route.query.initial', null );
Here's an example (view this logged-out):
https://wordpress.com/log-in?email_address=hello+world
https://calypso.live/log-in?email_address=hello+world&branch=udpate/ssr/get-cache-key
The login block relies on the initial query args, which appear to be server-supplied in this case and therefore do not reflect the actual request:
wp-calypso/client/blocks/login/login-form.jsx
Line 441 in def342e
userEmail: getInitialQueryArguments( state ).email_address,
There also appear to be issues with oauth. Try logging into https://woocommerce.com using this branch. Notice that the "Create an Account" link does not include oauth2_redirect like it does in master (the query param is present on the link, but empty).
error.status = 401;
return next( error );
}
Is it OK to render this without redirect_to now?
@sirreal Yes it's an important change on SSR, query parameters being completely ignored by default with this patch.
This is a choice we should take imo. Currently, there are not other routes than Note that query params that are not read during SSRing are still available on the server (for instance https://github.com/Automattic/wp-calypso/blob/master/server/pages/index.js#L315)
Regarding the issues you mentioned, I think they should be easily fixed and we should address them here before merging 👍 The initial query args should be refreshed on client load which does not appear to be the case at the moment.
I'd like to second @sirreal's concerns. Modifying This is increasing the impact surface of changes performed in order to fix SSR. Let's take a step back and find a solution that's as contained as possible rather than introducing possibly unexpected side-effects at a global application level.
This sounds in particular like we'd be removing qargs on the server, and adding them back on the client. As a general principle, I'd much prefer to avoid this kind of do-undo operations, since they're quite certainly unexpected by most people; we should allow devs to take things "at face value" as much as possible.
Right, although I agree with the principles, I don't think I have an alternate solution to suggest. I think query params should be treated as "sensitive" by default and thus removed when given to an SSR engine, especially one that caches requests.
Upgraded per #23259 🙂
I've given this some more thought. Let's try an approach guided by first principles, and logical deduction from there.
The single biggest issue that I see is in using a cache key that discards information: If our route contains various valid query args (that thus end up somehow in the generated markup and Redux state), our cache key needs to include them all. We should never create 'incomplete' cache keys, as this will lead to cross-cache slot pollution, which was the underlying issue with PII spilt across requests. By consequence, if we have routes with query args some of which hold PII, we have to decide to
We might find that we need a more granular mechanism to define allowed sets of values for a given query arg (still on a per-route basis, i.e. set in middleware), which I think is feasible. Stripping out query args on the server side and adding them back on the client IMO doesn't tackle this underlying conceptual issue but rather both obfuscates it, and adds unneeded complexity that will likely make other things more fragile.
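To make the "incomplete cache key" point concrete, here is a toy simulation. All names (`render`, `serveWithCache`, `incompleteKey`, `safeKey`) are hypothetical and the renderer is deliberately trivial; it is a sketch of the failure mode, not of Calypso's rendering pipeline.

```javascript
// Hypothetical renderer: the markup depends on a query argument.
const render = ( { query } ) => `<input value="${ query.email_address || '' }">`;

// Serve from the cache when a key exists; a null key means "never cache".
function serveWithCache( cache, keyFor, request ) {
	const key = keyFor( request );
	if ( key === null ) {
		return render( request );
	}
	if ( ! cache.has( key ) ) {
		cache.set( key, render( request ) );
	}
	return cache.get( key );
}

// Incomplete key: discards the query even though it affects the markup.
const incompleteKey = ( { pathname } ) => pathname;
// Conservative key: any query argument makes the request uncacheable.
const safeKey = ( { pathname, query } ) =>
	Object.keys( query ).length === 0 ? pathname : null;

const withEmail = { pathname: '/log-in', query: { email_address: 'alice@example.com' } };
const anonymous = { pathname: '/log-in', query: {} };

const leakyCache = new Map();
serveWithCache( leakyCache, incompleteKey, withEmail );
const leaked = serveWithCache( leakyCache, incompleteKey, anonymous );
console.log( leaked.includes( 'alice@example.com' ) ); // true: PII served to another visitor

const safeCache = new Map();
serveWithCache( safeCache, safeKey, withEmail );
const clean = serveWithCache( safeCache, safeKey, anonymous );
console.log( clean.includes( 'alice@example.com' ) ); // false
```

With the incomplete key, the first visitor's email address ends up in the cache slot that later serves every request for the same path.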
Disable SSR cache by returning null from getCacheKey when the query parameters do not match the whitelisted context.cacheQueryParams.
Inspired by problems detected in p3btAN-SU-p2
Testing
Run
curl "http://calypso.localhost:3000/log-in?client_id=50916&redirect_to=https%3A%2F%2Fpublic-api.wordpress.com%2Foauth2%2Fauthorize%3Fresponse_type%3Dcode%26client_id%3D50916%26state%3D3007b07829b1bf38eb89f6c0f8aab624%26redirect_uri%3Dhttps%253A%252F%252Fwoocommerce.com%252Fwc-api%252Fwpcom-signin%253Fnext%253Dmy-dashboard%26blog_id%3D0%26wpcom_connect%3D1" | grep redirect_to
to test this. The cache key here should be /log-in?{"client_id":"50916"} and the redirect_to value should not leak anywhere into the page.