Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SSR: Disable cache when query params are not whitelisted #22593

Closed
wants to merge 10 commits into from

Conversation

sirreal
Copy link
Member

@sirreal sirreal commented Feb 19, 2018

Disable SSR cache by returning null from getCacheKey when the query parameters do not match the whitelisted context.cacheQueryParams.

Inspired by problems detected in p3btAN-SU-p2

Testing

@matticbot
Copy link
Contributor

@sirreal sirreal force-pushed the udpate/ssr/get-cache-key branch 2 times, most recently from 33850f2 to 0259ca9 Compare February 19, 2018 17:35
@sirreal sirreal requested a review from seear February 19, 2018 17:39
@sirreal
Copy link
Member Author

sirreal commented Feb 19, 2018

the cache key isn't being sorted. qs is too old. Exposed by tests.

We'll need to upgrade (multiple major versions 😬) or find another solution.

@seear
Copy link
Contributor

seear commented Feb 20, 2018

#22587 becomes unnecessary with this PR, so we can revert it... in this PR?

@sirreal
Copy link
Member Author

sirreal commented Feb 20, 2018

#22587 becomes unnecessary with this PR, so we can revert it... in this PR?

Thanks for mentioning, I was wondering how best to handle that. I'm happy to include that here, although it shouldn't conflict with this change so could be reverted separately.

I'm going to try to sort out a solution for stable cache keys, then I'll add a revert commit for #22587.

*/
import { getCacheKey } from '..';

jest.mock( 'redux-form/es/reducer', () => require( 'lodash' ).identity );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This module has es6 imports which was breaking tests. This allows the tests to run although it's likely a symptom of some lacking configuration.

Improving configs is outside the scope of this PR.

Object.keys( query ).length === cacheQueryKeys.length &&
every( cacheQueryKeys, key => has( query, key ) )
) {
return pathname + '?' + deterministicStringify( query );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look like a URL any more, but all we need is something deterministic that we can serialize for use as a key.

You can see in the tests we'll now get strings like /my/path?{"and_me":"2","cache_me":"1","me_too":"3"}.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of not keeping it a URL? I mean, why change it to that structure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#22593 (comment)

qs is not stable, so requests to /?a=1&b=2 and /?b=2&a=1 could be cached differently. sort, which is intended to provide stable urls from qs.stringify that was used is not present in the version of qs we depend on, so had no impact.

This is a simple way to get a stable string from an object.

You can see the failing stability test here: https://circleci.com/gh/Automattic/wp-calypso/85034#tests/containers/1

It was run on 73d4558, before this implementation was changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort, which is intended to provide stable urls from qs.stringify that was used is not present in the version of qs we depend on, so had no impact.

I thought the reason might've perhaps been the notorious localCompare typo?

a.localCompare( b ) should be a.localeCompare( b )

This was pointed out and fixed by @Tug in #19036, which had to be reverted, and we forgot to carry over the typo fix...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬 Could be, I just noticed that the test failed, and compared our version with the changelog:

5.2.0

  • #64 Add option to sort object keys in the query string

https://github.com/ljharb/qs/blob/master/CHANGELOG.md#520

"qs": "4.0.0",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like that string was wrong, but I did a quick verification by applying the below patch to 73d4558. Test still fails.

diff --git a/server/isomorphic-routing/index.js b/server/isomorphic-routing/index.js
index 68f309abfa..753e4487e4 100644
--- a/server/isomorphic-routing/index.js
+++ b/server/isomorphic-routing/index.js
@@ -115,7 +115,7 @@ export function getCacheKey( { cacheQueryKeys, pathname, query } ) {
 		) {
 			// Make a stable string representation
 			// @TODO: qs too old for sort param (added 5.2.0, fixed 6.1.0)
-			return pathname + '?' + qs.stringify( query, { sort: ( a, b ) => a.localCompare( b ) } );
+			return pathname + '?' + qs.stringify( query, { sort: ( a, b ) => a.localeCompare( b ) } );
 		}
 		return null;
 	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with deterministicStringify for now. I'll file a PR to bump the qs version so we can revisit and go back to normalized routes as cache keys later.

Copy link
Member Author

@sirreal sirreal Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with deterministicStringify for now. I'll file a PR to bump the qs version

Sounds great, I'd be happy to get back to a url-like key when our qs module can handle that 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file a PR to bump the qs version

#23259

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll file a PR to bump the qs version

#23259

Merged, FWIW 🙂

@@ -122,8 +122,7 @@ export function serverRender( req, res ) {
context.layout &&
! context.user &&
cacheKey &&
isDefaultLocale( context.lang ) &&
! context.query.email_address // Don't do SSR when PIIs are present at the request
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverting #22587

@sirreal sirreal added [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. [Status] Needs e2e Testing (BETA) and removed [Status] In Progress labels Feb 20, 2018
@sirreal sirreal changed the title Disable cache when query params are not whitelisted SSR: Disable cache when query params are not whitelisted Feb 20, 2018
@@ -59,7 +59,7 @@ describe( 'getCacheKey', () => {
expect( getCacheKey( context ) ).toEqual( getCacheKey( keysSwapped ) );
} );

test( 'should return null if unknown and cahceable query params are mixed', () => {
test( 'should return null if unknown and cacheable query params are mixed', () => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔤 ✅
words 🤦‍♂️ so tough…
😆

@sirreal
Copy link
Member Author

sirreal commented Feb 20, 2018

I get an SSR error for http://calypso.localhost:3000/log-in?client_id=someId. Should redirect_to be a cache key alongside client_id?

context.cacheQueryKeys = [ 'client_id' ];

{ Error: The `redirect_to` query parameter is missing.
    at login (/Users/jonsurrell/a8c/calypso/build/webpack:/client/login/controller.js:52:18)
    […]

@seear
Copy link
Contributor

seear commented Feb 20, 2018

I've given this some testing on the themes routes. On master a url like https://wordpress.com/themes?s=wat is broken, because it is being keyed as /themes:

screen shot 2018-02-20 at 12 11 44

We actually see this broken in production as well:

screen shot 2018-02-20 at 12 38 23

This PR fixes this by not caching any themes url with a ?s param, which is the original intention, because we don't want to clog the cache with searches:

screen shot 2018-02-20 at 12 38 47

On /themes and /theme URLs without query params, caching as working as intended on this branch.

Copy link
Member Author

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the docs ❤️ 👍


##### Data Cache
Both caches use the same key, which is the pathname of the URL. URLs with query args are not cached unless the arg name is present in `context.queryCacheKeys`, in which case the argument or arguments are appended to the key.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about being a bit more specific:


[…] unless all of the query arguments are in queryCacheKeys […]

If query arguments not in queryCacheKeys are detected, server-side rendering is disabled.

@sirreal
Copy link
Member Author

sirreal commented Feb 20, 2018

I get an SSR error for http://calypso.localhost:3000/log-in?client_id=someId. Should redirect_to be a cache key alongside client_id?

It's likely that client_id should not be a cache key: #22614

See p1519133919000135-slack-amber-dev

@Tug
Copy link
Contributor

Tug commented Mar 12, 2018

We can't disable SSR with client_id though as it has been explicitly required that woo signup/login should not be changed.

I think we'll have to do it the hard way and make sure it's cached "safely".


At render time, the Redux state is [serialized and cached](../server/render/index.js), using the current path as the cache key, unless there is a query string, in which case we don't cache.

This means that all data that was fetched to render a given page is available the next time the corresponding route is hit. A section controller thus only needs to check if the required data is available (using selectors), and dispatch the corresponding fetching action if it isn't; see the [themes controller](../client/my-sites/themes/controller.jsx) for an example.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we want to drop this? It's meant to be instructive as to how to actually prime Redux state on the server side, which seems relevant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah I've been overzealous with the trimming here, I'll give this a less drastic edit...

@Tug Tug force-pushed the udpate/ssr/get-cache-key branch 2 times, most recently from c107da1 to 108ae47 Compare March 19, 2018 17:14
@sirreal sirreal requested a review from a team March 19, 2018 17:29
sirreal and others added 8 commits March 19, 2018 18:36
There's no requirement for the cache keys to look like URLs. Just use
stable stringification.
Reverts changes from pr #22587
Reverts commit 1ab75ae

These changes were a workaround for what should be handled correctly
by other changes in this PR.
@Tug Tug force-pushed the udpate/ssr/get-cache-key branch from 108ae47 to 06e00a2 Compare March 19, 2018 17:56
@Tug
Copy link
Contributor

Tug commented Mar 19, 2018

Rebased.

@sirreal @ockham would one of you have some time to review the last changes? I think it's on a good trajectory :)

@yurynix
Copy link
Contributor

yurynix commented Mar 20, 2018

This is a major improvement over what we currently have 👍 Thank you for working on it!
I've tested it with some log-in flows and themes.

@Tug Tug force-pushed the udpate/ssr/get-cache-key branch from 5fd5aad to 653dbf9 Compare March 20, 2018 10:15
Tug and others added 2 commits March 20, 2018 11:15
Store has DOM dependency, so it needs to be mocked
in the tests
Copy link
Member Author

@sirreal sirreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tug I have some concerns about this approach and would like to fully understand the implications.

  • The proposal modifies context.query on the server. My first intuition is that this seems very invasive.
  • context.query may be modified on the server, which may lead to differences between client and server rendering. Doesn't this lead to different render results and reconciliation problems, offsetting the benefits of SSR?
  • The query may not only be read via context.query, but also via redux state (see selectors below). Will this produce inconsistencies?

Redux state query selectors:

export const getCurrentQueryArguments = state => get( state, 'ui.route.query.current', null );

export const getInitialQueryArguments = state => get( state, 'ui.route.query.initial', null );

Here's an example (view this logged-out):

https://wordpress.com/log-in?email_address=hello+world
https://calypso.live/log-in?email_address=hello+world&branch=udpate/ssr/get-cache-key

The login block relies on initial query args, which appears to be server-supplied in this case, which does not reflect reality:

userEmail: getInitialQueryArguments( state ).email_address,

There also appear to be issues with oauth. Try logging into https://woocommerce.com using this branch. Notice that the "Create an Account" link does not include oauth2_redirect like it does in master (the query param is present on the link, but empty).

error.status = 401;
return next( error );
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it OK to render this without redirect_to now?

@Tug
Copy link
Contributor

Tug commented Mar 20, 2018

@sirreal Yes it's an important change on SSR, query parameters being completely ignored by default with this patch.
We can now add exceptions in the sections config by passing a cacheQueryKeys attribute with a list of whitelisted query params.

context.query may be modified on the server, which may lead to differences between client and server rendering. Doesn't this lead to different render results and reconciliation problems, offsetting the benefits of SSR?

This is a choice we should take imo. Currently, there are not other routes than /log-in with query parameters that are accessible offline. For /themes for instance, the s query parameter is currently ignored.
The difference should not cause reconciliation problems, components will be rendered again as expected if there is a change when the client loads.

Note that query params that are not read during SSRing are still available on the server (for instance https://github.com/Automattic/wp-calypso/blob/master/server/pages/index.js#L315)

@Tug
Copy link
Contributor

Tug commented Mar 20, 2018

Regarding the issues you mentioned, I think they should be easily fixed and we should address them here before merging 👍

The initial query args should be refreshed on client load which does not appear to be the case at the moment

@ockham
Copy link
Contributor

ockham commented Mar 20, 2018

I'd like to second @sirreal's concerns. Modifying context.query seems like a very low-level modification and thus very invasive, with possibly far-reaching impact.

This is increasing the impact surface of changes performed in order to fix SSR. Let's take a step back and find a solution that's as contained as possible rather than introducing possibly unexpected side-effects at a global application level.

@ockham
Copy link
Contributor

ockham commented Mar 20, 2018

The initial query args should be refreshed on client load which does not appear to be the case at the moment

This sounds in particular like we'd be removing qargs on the server, and adding them back on the client. As a general principle, I'd much prefer to avoid this kind of do-undo operations, since they're quite certainly unexpected by most people; we should allow devs to take things "at face value" as much as possible.

@Tug
Copy link
Contributor

Tug commented Mar 23, 2018

Right, although I agree with the principles, I don't think I have an alternate solution to suggest.

I think query params should be treated as "sensitive" by default and thus removed when given to an SSR engine, especially one that caches requests.

@ockham
Copy link
Contributor

ockham commented Mar 29, 2018

the cache key isn't being sorted. qs is too old. Exposed by tests.

We'll need to upgrade (multiple major versions 😬) or find another solution.

Upgraded per #23259 🙂

@ockham
Copy link
Contributor

ockham commented Apr 16, 2018

I've given this some more thought. Let's try an approach guided by first principles, and logical deduction from there.

  1. For the routes that we do render on the server, we want the DOM that results from the server-side rendered markup to be identical with the initial client-side generated one. Otherwise, React will give us reconciliation errors, and we'll lose the performance benefit that SSR gives us, since React has to re-render (parts of the) DOM on the client side.
  2. If a route has valid query args, they will all be reflected in the generated markup/DOM somehow. Thus, removing some of them (e.g. the ones that contain Personally Identifiable Information (PII)) would break the generated markup, so we shouldn't do that.
  3. Our cache keys should uniquely map to the generated markup, which is fully determined by the route, including valid query args in a normalized order.

The single biggest issue that I see is in using a cache key that discards information: If our route contains various valid query args (that thus end up somehow in the generated markup and Redux state), our cache key needs to include them all. We should never create 'incomplete' cache keys, as this will lead to cross-cache slot pollution, which was the underlying issue with PII spilt across requests.

By consequence, if we have routes with query args some of which hold PII, we have to decide to

  • either disable caching as soon as such a query arg is part of the route
  • or whitelist them for caching, to make sure we have distinct cache slots for different routes. If we want to go with this approach, we need to make sure there's only a limited set of possible query arg values so we don't fill up the cache quickly.

We might find that we need a more granular mechanism to define allowed sets of values for a given query arg (still on a per-route basis, i.e. set in middleware), which I think is feasible.

Stripping out query args on the server side and adding them back on the client IMO doesn't tackle this underlying conceptual issue but rather both obfuscates it, and adds unneeded complexity that will likely make other things more fragile.

@sirreal sirreal closed this Apr 25, 2018
@sirreal sirreal deleted the udpate/ssr/get-cache-key branch April 25, 2018 15:45
@matticbot matticbot removed the [Status] Needs Review The PR is ready for review. This also triggers e2e canary tests and wp-desktop tests automatically. label Apr 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants