Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the rewrite_response function signature to take a RewritableResponse object #301

Merged
merged 14 commits into from
Nov 23, 2021

Conversation

maresb
Copy link
Contributor

@maresb maresb commented Nov 6, 2021

Building on #300, illustrating my idea of implementing a RewritableResponse class. I tried to use Traitlets to make my data class, since it's used throughout the Jupyter ecosystem, but I don't know it very well.

I was originally thinking of having RewritableResponse be the return type as well as the input type, but I found this awkward. Now I have rewrite_response mutating the RewritableResponse object. But this makes the variable names a bit awkward... in order for the variable names to make sense, I now define original_response run rewrite_response(original_response), define rewritten_response as original_response and then delete original_response. There must be a better way which I'm not seeing at the moment.

@maresb
Copy link
Contributor Author

maresb commented Nov 7, 2021

We could also support rewrite_response as string to be exec-ed so that as a shortcut one could say

rewrite_response="response.body = response.body.replace(b'cat', b'dog')"

but then there are potential security issues which I don't know enough how to address. Is there any potential privilege escalation here? (If you can modify the config file, can you already execute arbitrary code as the user running the server?)

@yuvipanda
Copy link
Contributor

Thanks for working on this, @maresb! See my comment in #300 (comment) about a path forward.

@yuvipanda
Copy link
Contributor

Ah, I see - as the objects passed in are mutated, you can just pass in the same objects to a bunch of different functions, and they will be able to do their modifications as a chain. This lets you compose these changes - is that right?

@maresb
Copy link
Contributor Author

maresb commented Nov 7, 2021

Exactly @yuvipanda.

I can do

rewrite_response=[rewrite_function_a, rewrite_function_b]

and then they run one after the other.

@ryanlovett
Copy link
Collaborator

Thanks @maresb ! In jupyter-rsession-proxy, I need access to the request's URI and not the proxied path, e.g. /user/so-and-so/rstudio on a hub, in order to rewrite a Location header. How could I do so if the RewritableRespone object only has attributes for host, port, and path ? Could there be a generic way to pass in information?

@maresb
Copy link
Contributor Author

maresb commented Nov 8, 2021

Hi @ryanlovett, please give raw_request.url a try. I haven't tested it, but I think it should work in theory.

I still need to document things fully.

@yuvipanda
Copy link
Contributor

yuvipanda commented Nov 8, 2021

Paging @minrk to the API design discussion as well. I love the idea of chaining, but modifying arguments passed in implicitly makes me feel bad. I don't know how to balance these feelings! Help, @minrk!

Two things to think about:

  1. If we modify passed in objects, do we still need a separate specialized class? I'm hoping we won't, and can instead just pass in the args the hook asks for. That's what we do for everyone else, right?
  2. Is there a way to do chaining without having to modify the passed in arguments?

My intuition is that I feel stronger about (1) than (2).

Thank you for being awesome and pushing this forward, @maresb and @ryanlovett!

@ryanlovett
Copy link
Collaborator

Ah, sorry I missed that @maresb . I'll give that a try.

@minrk
Copy link
Member

minrk commented Nov 9, 2021

This looks great! Like @yuvipanda, my thought is that if it's mutating the object, it can be (a copy of) the original object, rather than a custom class?

modifying arguments passed in implicitly makes me feel bad.

Modifying inputs for rewriting a response doesn't feel bad to me at all, so I have no problem with that. I don't think it's implicit when it's the sole purpose of the function.

If @yuvipanda feels strongly about immutability, then I think a custom class is appropriate, and each step in the chain would get a new instance of the class. I'm not sure what problem that solves, though.

@maresb
Copy link
Contributor Author

maresb commented Nov 9, 2021

@minrk and @yuvipanda, thanks so much for the feedback.

I actually came to the same conclusion (that I could mutate a copy of the class). It took a little extra boilerplate (no copy() method in Traitlets!?) but my latest commit implements this.

@minrk
Copy link
Member

minrk commented Nov 9, 2021

(no copy() method in Traitlets!?)

There's no copy method because copy is in the standard library and works for most objects: obj_copy = copy.copy(obj), but your method works, too!

Copy link
Contributor

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One suggestion about passing in the request object too, and I think we're good to go after that!

Thanks a lot, @maresb, @minrk and @ryanlovett!

jupyter_server_proxy/config.py Outdated Show resolved Hide resolved
@maresb
Copy link
Contributor Author

maresb commented Nov 9, 2021

I'm happy that you're convinced by my PR. Now that I know it has support, I think I should add a few quick tests before merging. (For example, to verify that chained rewrites work properly, and that the request object is accessible.)

lambda host, port, path, response: response.body,
non_service_rewrite_response = Union(
default_value=tuple(),
trait_types=[List(), Tuple(), Callable()],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we should support both Lists and Tuples?

@maresb
Copy link
Contributor Author

maresb commented Nov 14, 2021

Hi @yuvipanda, @minrk and @ryanlovett,

My PR seems to have stalled out. I'm waiting for feedback on whether or not to add extra parameters (besides RewritableResponse) to the rewrite_response() function. Then I would like to write tests and hopefully get this merged.

I'll try and summarize the status here:

@yuvipanda wants the rewrite_response signature to be rewrite_response(orig_request: tornado.httpclient.HTTPRequest, response: RewriteableResponse). I argue that this isn't necessary since as with tornado.httpclient.HTTPResponse, the request is provided as an attribute of RewritableResponse. I think it's more elegant to keep the signature short so that the rewrite_response() author only has to deal with the arguments they need (same goes for the extra convenience attributes .host, .port, .path which we provide). If you're unconvinced by my argument, I'm happy to change the signature as you suggest to move things forward.

Then there's @minrk's comment, for which I see two different and interesting interpretations, both of which I have already tried to address.

my thought is that if it's mutating the object, it can be (a copy of) the original object

I originally interpreted "original" to mean the original where I interpret "original" to mean my original RewritableResponse object which I was awkwardly mutating. By implementing an .apply_to_copy(rewrite_function) method, I was able to eliminate my homegrown .copy() method and clean up my awkward code. ✔️

Now I realize that @minrk likely indended that I could eliminate my RewritableResponse class entirely in favor of copying the tornado.httpclient.HTTPResponse object. This has some challenges which I don't yet see how to overcome. One annoyance is that many attributes are classes, so I don't know how "deep" I should make my copy. The more difficult challenge is that Tornado implements .body as a read-only property. I'd like to override this in the copied instance, but the descriptor lives in Tornado's HTTPResponse class, which I don't want to monkey patch. I could subclass Tornado's HTTPResponse and add a setter, but I don't know how to cast the copy to my subclass. (I actually tried by modifying .__class__, but that led to some very bizarre errors.)

Any suggestions on if / how to move this PR forward? Thanks!

@maresb
Copy link
Contributor Author

maresb commented Nov 14, 2021

Sorry, I lost track of @yuvipanda's comment in the previous PR regarding call_with_asked_args. Now his request makes a lot more sense...

@maresb
Copy link
Contributor Author

maresb commented Nov 14, 2021

Ok, if I get the green light from @yuvipanda regarding the implementation, I'll write some tests. Thanks for your patience!

@maresb maresb force-pushed the class_based_rewrite_response branch from f89c299 to 095d95f Compare November 14, 2021 17:45
# convert it to a function of only ``response`` by plugging in the
# known values for all the other parameters. (This is called partial
# evaluation.)
def rewrite_pe(rewritable_response: RewritableResponse):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use functools.partial here?

Copy link
Contributor

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some style suggestions and a note about reducing the API surface further, but I am pretty happy with this! Thank you for working on this and incorporating my feedback, @maresb!!!!

@yuvipanda
Copy link
Contributor

Sorry, I lost track of @yuvipanda's comment in the previous PR regarding call_with_asked_args. Now his request makes a lot more sense...

I was very confused until i read this haha :D Am glad you found it. Thank you for coming back and pushing this PR to completion.

Copy link
Contributor

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryanlovett does this look enough to support our rstudio workflow?

@maresb this looks great to me! A couple tests to test setting status and headers, and am ready to merge this. THANK YOU SO MUCH!

@maresb
Copy link
Contributor Author

maresb commented Nov 22, 2021

Great! I didn't have as much time this weekend as I expected. I'll try to find some time within the next day or two to bring this over the finish line.

@ryanlovett
Copy link
Collaborator

@yuvipanda Yes, I have a branch of jupyter-rstudio-proxy that works with @maresb latest on a local hub and on mybinder.

@maresb I'm also very thankful for this PR. Many downstream folks will be very happy!

@ryanlovett
Copy link
Collaborator

Created maresb#1 which adds tests for status and headers to this PR.

@maresb
Copy link
Contributor Author

maresb commented Nov 23, 2021

Sorry for being the bottleneck on the release here. I think this is finally ready to go.

Thanks everyone for all the support! ❤️

@ryanlovett ryanlovett merged commit 17f70a1 into jupyterhub:master Nov 23, 2021
@yuvipanda
Copy link
Contributor

@maresb thank you for this wonderfully designed PR!

@maresb maresb deleted the class_based_rewrite_response branch November 23, 2021 21:04
@consideRatio consideRatio changed the title Class-based rewrite response Change the rewrite_response function signature to take a RewritableResponse object Nov 28, 2021
@consideRatio
Copy link
Member

I marked this as maintenance (reworks the function signature of the rewrite_response hook). Since the rewrite_response functionality hasn't been released, I don't mark this as breaking which it otherwise would have been I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants