Filtering request using OpenAPI spec #645

fcollonval · 2021-12-22T15:46:47Z

This is a PoC to add allowed and blocked request at runtime.

fcollonval · 2021-12-22T15:53:12Z

@bollwyvl I created this PoC to filter request at runtime using OpenAPI specification for allowed and blocked routes.

For allowed routes, this sounds great. But for blocked routes (blocking only request matching that subset of rules), it may be too complicated. In particular you can not blocked easily any sub path by simply blocking a common root. The other issue I see is that OpenAPI does not support path parameter with slashes.

So... should I switch to something more trivial to define routes like Regex and list of methods?

jupyter_server/tests/test_firewall.py

setup.cfg

bollwyvl · 2021-12-22T16:10:08Z

jupyter_server/extension/application.py

@@ -96,7 +98,9 @@ def _prepare_templates(self):
        self.initialize_templates()
        # Add templates to web app settings if extension has templates.
        if len(self.template_paths) > 0:


could just be if self.template_paths

I won't change it as it is not modified by this PR.

jupyter_server/extension/application.py

jupyter_server/firewall.py

bollwyvl · 2021-12-22T16:19:37Z

you can not blocked easily any sub path by simply blocking a common root.

Right: it's only convention that sub-routes are even related. Regexen would be a way to do it, but could even be applied as a class of patch. The key thing is still to end up with an actual, accurate OpenAPI at the end, whether created by JSON Patch, JSON-e, python, or otherwise.

fcollonval · 2021-12-23T10:25:27Z

Thanks a lot for reviewing this one @bollwyvl

you can not blocked easily any sub path by simply blocking a common root.

Right: it's only convention that sub-routes are even related. Regexen would be a way to do it, but could even be applied as a class of patch. The key thing is still to end up with an actual, accurate OpenAPI at the end, whether created by JSON Patch, JSON-e, python, or otherwise.

Ok then let's keep OpenAPI as expected input format and let interested admin/dev design their tool to generate it. But that does not address the issue with path parameter with slashes that we are using a lot (like for the content API). The way I see it, is to allow to register path transformer that will escape slashes as needed to fulfill OpenAPI expectation. Something like:

/api/contents/path/to/file.ext ➡️ /api/contents/path%2Fto%2Ffile.ext

and 🤞 that openapi library will pick correctly the parameter

codecov-commenter · 2021-12-28T15:51:23Z

Codecov Report

Merging #645 (ccac364) into main (db46446) will increase coverage by 0.19%.
The diff coverage is 90.22%.

@@            Coverage Diff             @@
##             main     #645      +/-   ##
==========================================
+ Coverage   77.88%   78.08%   +0.19%     
==========================================
  Files         110      113       +3     
  Lines       10405    10630     +225     
  Branches     1400     1429      +29     
==========================================
+ Hits         8104     8300     +196     
- Misses       1911     1937      +26     
- Partials      390      393       +3

Impacted Files	Coverage Δ
jupyter_server/serverapp.py	`65.62% <70.83%> (+0.11%)`	⬆️
jupyter_server/specvalidator.py	`86.07% <86.07%> (ø)`
jupyter_server/tests/extension/test_launch.py	`94.11% <89.47%> (-1.34%)`	⬇️
jupyter_server/tests/test_spec_validation.py	`96.61% <96.61%> (ø)`
jupyter_server/extension/application.py	`75.22% <100.00%> (+0.90%)`	⬆️
...pyter_server/tests/extension/mockextensions/app.py	`94.44% <100.00%> (+0.50%)`	⬆️
jupyter_server/tests/test_specvalidator.py	`100.00% <100.00%> (ø)`
jupyter_server/services/kernels/kernelmanager.py	`80.63% <0.00%> (-1.91%)`	⬇️
jupyter_server/base/handlers.py	`65.31% <0.00%> (-0.21%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db46446...ccac364. Read the comment docs.

fcollonval · 2021-12-28T16:00:09Z

I gladly add some documentation about this feature. Could somebody advice me on the best place to do so?

blink1073 · 2021-12-28T18:46:00Z

Maybe these docs?

fcollonval · 2021-12-29T10:54:26Z

I added to the developers section as it is not configurable. So it probably makes more sense there.

blink1073 · 2021-12-29T19:02:42Z

docs/source/developers/extensions.rst

+
+    class MyExtensionApp(ExtensionApp):
+
+        _allowed_spec = {


It feels like these should be public attributes if they are meant to be part of the API

Following the idea of jupyterlab/jupyterlab_server#167, the scenario imagined to design this PR is the following:

James wants to create a SmithApp that inherits from ExtensionApp. That app can be executed either as an Jupyter server extension or as a standalone application. But in the case of the standalone application, he only wants the new handlers of his extension and the get contents endpoint to be available.

To achieve this, James will overrides the protected class attribute _allowed_spec in the child class SmithApp. Users of the SmithApp will never be able to change the endpoints available in standalone mode*.

*of course because this is Python, this is only a convention that protected attribute should not be modified externally.

pinging @bollwyvl to get his opinion

Zsailer · 2021-12-31T15:53:59Z

@fcollonval and @bollwyvl , great stuff here!

Do you mind waiting to merge this until after we have our next Jupyter Server meeting (January 6th)? I'm hoping to get more folks to weigh-in on this feature since we've discussed this in the past.

For context, we added a way to add/remove whole "services" in Jupyter Server by overriding default_services attribute in the ServerApp:

jupyter_server/jupyter_server/serverapp.py

Lines 741 to 757 in 3e78339

    
           # A list of services whose handlers will be exposed. 
        
           # Subclasses can override this list to 
        
           # expose a subset of these handlers. 
        
           default_services = ( 
        
               "api", 
        
               "auth", 
        
               "config", 
        
               "contents", 
        
               "files", 
        
               "kernels", 
        
               "kernelspecs", 
        
               "nbconvert", 
        
               "security", 
        
               "sessions", 
        
               "shutdown", 
        
               "view", 
        
           )

This PR goes a few steps further and allows developers to cherry-pick individual routes (i.e. within individual services), which is really slick.

fcollonval · 2022-01-07T08:09:25Z

Following the discussion at yesterday meeting,

The authorization layer will definitely have overlap with this proposal as it can be seen as adding authorization policy to all users.
At the functionality level, I see two main differences (compared to #165):

This PR enforces the rules on any request - it does not rely of having the handler decorated by tornado.web.authenticated. This is a technical detail but it could be important if you don't control the server extensions loaded within the application.
This is not a RBAC approach (in the opposite of Add authorization layer to server request handlers #165). Authorization policy can take lots of forms (RBAC, ABAC, Open Policy Agent, XACML, Amazon Web Service IAM,...). They have their own logic on what should be defined to create a policy. And they allow different granularity of policy - it is one of the reason external services like the Open Policy Agent will thrive in large organizations but are likely too complex for smaller players.
In particular expressing the rules for this approach is more demanding than what is required for Add authorization layer to server request handlers #165 - admin sys will certainly needs to create their own scripts to generate the OpenAPI specs.

That said I can't formulate the best articulation between this PR and #165. It could make sense to have both simultaneously - although as pointed by @echarles it will likely be confusing.

echarles · 2022-01-07T09:43:18Z

This PR enforces the rules on any request - it does not rely of having the handler decorated by tornado.web.authenticated. This is a technical detail but it could be important if you don't control the server extensions loaded within the application.

There have been discussions around the default behavior in case of absence of @authenticated decorator. The same discussion could happen for @authorized. If a developer does not decorate, should the authorization layer reject the request? I like hearing "security by default", so I would be in favor to reject if the developer does not decorate, but this is a discussion to have in #165

This is not a RBAC approach (in the opposite of Add authorization layer to server request handlers #165). Authorization policy can take lots of forms (RBAC, ABAC, Open Policy Agent, XACML, Amazon Web Service IAM,...).

True. This being said, I don't think #165 enforces RBAC approach. I have e.g. given a demo a few months ago during our developer meeting of a Open Policy Agent (OPA) implementation which nicely integrates with the proposed authorization layer.

fcollonval · 2022-01-07T14:29:41Z

I like hearing "security by default", so I would be in favor to reject if the developer does not decorate, but this is a discussion to have in #165

👍

This is not a RBAC approach (in the opposite of Add authorization layer to server request handlers #165). Authorization policy can take lots of forms (RBAC, ABAC, Open Policy Agent, XACML, Amazon Web Service IAM,...).

True. This being said, I don't think #165 enforces RBAC approach. I have e.g. given a demo a few months ago during our developer meeting of a Open Policy Agent (OPA) implementation which nicely integrates with the proposed authorization layer.

Awesome that the current proposal is so flexible.

So does that means we could transform this PR as an concrete Authorizer implementation (in the sense of the authorization layer) that uses only the request to provide the allow/deny answer?

echarles · 2022-01-07T14:39:09Z

So does that means we could transform this PR as an concrete Authorizer implementation (in the sense of the authorization layer) that uses only the request to provide the allow/deny answer?

This would be a nice convergence. Maybe have a a look at #165 where the Authorizer is a class that can be extended, or that you can inherit from something that will implement this PR features.

Not sure if the the deny by default is implemented in that PR, but it should be possible to make it work like that.

PoC filtering using OpenAPI spec

9bd423c