Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support redirection through webbrowser for OAuth authentication #162

Merged
merged 2 commits into from
Apr 20, 2022

Conversation

mdesmet
Copy link
Contributor

@mdesmet mdesmet commented Feb 27, 2022

Fixes #161

@cla-bot cla-bot bot added the cla-signed label Feb 27, 2022
@mdesmet
Copy link
Contributor Author

mdesmet commented Feb 27, 2022

Adding the keyring feature instead of the threadlocal made the test fail. IMHO the threadlocal seems not so useful, this works different from the Trino CLI and JDBC driver. However maybe from security perspective it's a good default to not cache the token over multiple threads.

In trino CLI:
See https://trino.io/docs/current/installation/cli.html#external-authentication-sso

Further queries in the CLI session do not require additional logins while the authentication token remains valid. Token expiration depends on the external authentication type configuration.

Expired tokens force you to log in again.

in trino JDBC:
See https://trino.io/docs/current/installation/jdbc.html#parameter-reference
externalAuthenticationTokenCache | Allows the sharing of external authentication tokens between different connections for the same authenticated user until the cache is invalidated, such as when a client is restarted or when the classloader reloads the JDBC driver. This is disabled by default, with a value of NONE. To enable, set the value to MEMORY. If the JDBC driver is used in a shared mode by different users, the first registered token is stored and authenticates all users.

@lukasz-walkiewicz : what do you think?

@mdesmet
Copy link
Contributor Author

mdesmet commented Feb 27, 2022

Open items:

  • Thread safety impact: writing and reading token is not thread safe. Might need locking to ensure only one token is obtained simultaneously.
  • Hardcoded name of token, should be based on url and username.

trino/auth.py Outdated
@@ -251,13 +262,34 @@ def _get_token(self, token_server, response, **kwargs):

raise exceptions.TrinoAuthError("Exceeded max attempts while getting the token")

def _get_token_from_cache(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be thread locked, the same for storing a token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i see what you mean, but thread locking just reads and writes will not prevent us from retrieving the token multiple times in parallel threads when we get 40X errors. Ideally we will only launch the webbrowser flow only once, even in a multithread case. Still need to test this scenario more in depth.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thread locking has been implemented, covering reads/writes to cache and launching the OIDC flow only once per process and per Connection object. I think this is how it should work. The default is now class instance based caching, if keyring is installed within venv, use keyring caching per username and host. If webbrowser is not installed, for example in a headless environment (CLI), the url is printed in the console as before.

I think i can add another test to proof that the caching of the token is per connection and not per cursor as before.

trino/auth.py Outdated
self._thread_local.token = self._get_token(token_server, response, **kwargs)
return self._retry_request(response, **kwargs)
token = self._get_token(token_server, response, **kwargs)
self._store_token_to_cache(token)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By default, let's not cache a token like in JDBC driver to not introduce a potential breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current behaviour is printing the OIDC url in the console? Could you elaborate on how the current OIDC is used in production?

I actually agree that caching the token in the keychain shouldn't be the default from a security perspective as it will persist even outside of the process context. However within the same process, given the same connection (host and username), I think it can be cached. Woud you agree?

Actually the unit test test_multithreaded_oauth2_authentication_flow makes you think that it only retrieves the token once per thread. However that's not actually happening because the _OAuth2TokenBearer is recreated for each http_session (every time we create a cursor a separate TrinoRequest is being created). Which makes the threadlocal storage only work per cursor. That's why i opted to make the _OAuth2TokenBearer an instance property instead of being recreated for every http_session, IMHO it should be linked to the lifetime of the connection instead of a cursor.


def set_http_session(self, http_session):
        http_session.auth = _OAuth2TokenBearer(http_session, self._redirect_auth_url)
        return http_session

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, are you suggesting caching a token by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default before was already caching the token per cursor and per thread, which to me seems illogical. The newly default introduced in this PR is a thread-safe per connection cache, giving the user the choice to either create a connection once and share it over multiple threads or create a connection per thread. If you look at the test (test_multithreaded_oauth2_authentication_flow) you can see that before this change the token is already cached: the token is only retrieved once per thread, while there are many more requests. However the test didn't take into account that the Bearer class is instantiated for each cursor, so from a user perspective it seemed like no caching is performed.

I think for most users the keyring will provide the most seamless experience. they can either do pip install keyring or pip install 'trino-python-client[local-secure-storage]' to even get less browser popups as they will solely depend on the lifetime of the token.

README.md Outdated
@@ -169,7 +169,7 @@ the [`JWT` authentication type](https://trino.io/docs/current/security/jwt.html)

- `OAuth2Authentication` class can be used to connect to a Trino cluster configured with
the [OAuth2 authentication type](https://trino.io/docs/current/security/oauth2.html).
- A callback to handle the redirect url can be provided via param `redirect_auth_url_handler`, by default it just outputs the redirect url to stdout.
- A callback to handle the redirect url can be provided via param `redirect_auth_url_handler`, by default it will try to launch a web browser to go through the authentication flow, if that fails it just outputs the redirect url to stdout.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add info about caching to README.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added more details about the caching in the README.

trino/auth.py Outdated
self._thread_local.token = self._get_token(token_server, response, **kwargs)
return self._retry_request(response, **kwargs)
token = self._get_token(token_server, response, **kwargs)
self._store_token_to_cache(token)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, are you suggesting caching a token by default?

Copy link
Member

@hovaesco hovaesco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i can add another test to proof that the caching of the token is per connection and not per cursor as before.

That would be great.

@lukasz-walkiewicz
Copy link
Member

lukasz-walkiewicz commented Mar 3, 2022

First of all, thanks for the contribution @mdesmet.
I've tried running this locally in a new virtual environment however it didn't work:

(keyring)  ✘  trino-python-client   feature/webbrowser  python main.py                                                                                      ~/dev/prestosql/trino-python-client
error could be tests
Exception in thread 1: :
Traceback (most recent call last):
  File "/nix/store/lrw7ifwnpz7rskfkphq9mj9fwm5i9ldc-python3-3.9.6/lib/python3.9/threading.py", line 973, in _bootstrap_inner
Exception in thread 2: :
Traceback (most recent call last):
  File "/nix/store/lrw7ifwnpz7rskfkphq9mj9fwm5i9ldc-python3-3.9.6/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/Users/lukasz/dev/prestosql/trino-python-client/main.py", line 30, in run
    self.run()
  File "/Users/lukasz/dev/prestosql/trino-python-client/main.py", line 30, in run
    cur.execute('SELECT * FROM system.runtime.nodes')
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/dbapi.py", line 441, in execute
    cur.execute('SELECT * FROM system.runtime.nodes')
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/dbapi.py", line 441, in execute
    result = self._query.execute()
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/client.py", line 561, in execute
    result = self._query.execute()
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/client.py", line 561, in execute
    response = self._request.post(self._sql, additional_http_headers)
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/client.py", line 361, in post
    response = self._request.post(self._sql, additional_http_headers)
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/client.py", line 361, in post
    http_response = self._post(
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/exceptions.py", line 136, in decorated
    http_response = self._post(
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/exceptions.py", line 136, in decorated
    raise error
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/exceptions.py", line 123, in decorated
    raise error
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/exceptions.py", line 123, in decorated
    result = func(*args, **kwargs)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/sessions.py", line 577, in post
    result = func(*args, **kwargs)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/sessions.py", line 577, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/sessions.py", line 515, in request
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/sessions.py", line 515, in request
    prep = self.prepare_request(req)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/sessions.py", line 443, in prepare_request
    prep = self.prepare_request(req)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/sessions.py", line 443, in prepare_request
    p.prepare(
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/models.py", line 322, in prepare
    p.prepare(
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/models.py", line 322, in prepare
    self.prepare_auth(auth, url)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/models.py", line 558, in prepare_auth
    self.prepare_auth(auth, url)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/requests/models.py", line 558, in prepare_auth
    r = auth(self)
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/auth.py", line 192, in __call__
    r = auth(self)
    token = self._get_token_from_cache(*user_host)
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/auth.py", line 287, in _get_token_from_cache
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/auth.py", line 192, in __call__
    return self._keyring.get_password(
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/keyring/core.py", line 55, in get_password
    token = self._get_token_from_cache(*user_host)
  File "/Users/lukasz/dev/prestosql/trino-python-client/trino/auth.py", line 287, in _get_token_from_cache
    return get_keyring().get_password(service_name, username)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/keyring/backends/fail.py", line 25, in get_password
    return self._keyring.get_password(
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/keyring/core.py", line 55, in get_password
    raise NoKeyringError(msg)
keyring.errors.NoKeyringError: No recommended backend was available. Install a recommended 3rd party backend package; or, install the keyrings.alt package if you want to use the non-recommended backends. See https://pypi.org/project/keyring for details.
    return get_keyring().get_password(service_name, username)
  File "/Users/lukasz/venvs/keyring/lib/python3.9/site-packages/keyring/backends/fail.py", line 25, in get_password
    raise NoKeyringError(msg)
keyring.errors.NoKeyringError: No recommended backend was available. Install a recommended 3rd party backend package; or, install the keyrings.alt package if you want to use the non-recommended backends. See https://pypi.org/project/keyring for details.

Environment:

(keyring)  trino-python-client   feature/webbrowser  python                                                                                                  ~/dev/prestosql/trino-python-client
Python 3.9.6 (default, Oct 31 2021, 15:37:00)
[Clang 7.1.0 (tags/RELEASE_710/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform
>>> platform.platform()
'macOS-10.16-x86_64-i386-64bit'

Which platform you tested it on?

@mdesmet
Copy link
Contributor Author

mdesmet commented Mar 3, 2022

Tested on following platform:

>>> platform.platform()
'macOS-12.2.1-arm64-arm-64bit'

Checking on https://pypi.org/project/keyring/

Compatibility - macOS
macOS keychain support macOS 11 (Big Sur) and later requires Python 3.8.7 or later with the “universal2” binary. See #525 for details.

@mdesmet
Copy link
Contributor Author

mdesmet commented Mar 4, 2022

First of all, thanks for the contribution @mdesmet. I've tried running this locally in a new virtual environment however it didn't work:

So it's possible that even if you have installed keyring, it doesn't work as there is no suitable backend. I adapted the code that it will fallback to just store the token as if keyring is not installed.

@mdesmet
Copy link
Contributor Author

mdesmet commented Mar 4, 2022

I think i can add another test to proof that the caching of the token is per connection and not per cursor as before.

That would be great.

I added the test: test_token_only_retrieved_once_per_connection

@mdesmet
Copy link
Contributor Author

mdesmet commented Mar 21, 2022

@lukasz-walkiewicz, @hovaesco : Any other remarks or concerns?

FYI launching the webbrowser and storing credentials in keyring is also what Snowflake is doing in their python connector, I think it will greatly improve the user experience of dbt in combination with oauth authentication. Am I missing any usage patterns that this behavior for OAuth authentication won't be compatible with?

Copy link
Member

@hovaesco hovaesco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. One question, if keyring is installed then a token is cached by default? Basically, having keyring installed is like an enabler to cache a token?

trino/auth.py Outdated

@staticmethod
def _determine_user_host(url, headers) -> Tuple[str, str]:
user = headers.get("X-Trino-User")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a default value for user if there is no X-Trino-User (just in case).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a point here: the user is not really required for using OAuth. Maybe we should just get rid of trying to determine the user and only use the host.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it used to create a key for saving a password by keyring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's it. Now given that the lifetime of a token isn't generally that long, I would think it's probably OK to get rid of the user. Also the keyring is not installed by default, so it means the user opts in on this behavior.

Copy link
Member

@hovaesco hovaesco Mar 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we use a different user between connections? The token in keyring will be overwritten by the newest one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you don't use keyring, the token is stored as instance variable within the Authentication class. So it's not reused unless you would reuse that same instance over multiple connections.

With keyring it will just try to reuse the existing token, only after it expires it will prompt again. Now the earlier code was dependent on the user set in the trino.dbapi.Connection object, this user is straight from user input. It is not required to set it up for OAuth connections.

The whole point about using keyring is mostly about providing a default user-friendly experience for user ran CLI applications like dbt, I think it would be quite uncommon if they would have multiple users for the same host and also using OAuth. The workaround would be then to not use keyring in this case. Maybe we need to describe this better in the docs?

I also don't think a lot of server applications will use this webbrowser and keyring feature. They can support redirection by adding their own handle_redirect_auth_url.

I also would never install keyring by default, it's the users them self that opt in by installing keyring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to describe this better in the docs?

We did something similar for JDBC driver: https://trino.io/docs/current/installation/jdbc.html -> externalAuthenticationTokenCache

README.md Outdated
* DBAPI
A callback to handle the redirect url can be provided via param `redirect_auth_url_handler`, by default it will try to launch a web browser to go through the authentication flow, if that fails it just outputs the redirect url to stdout.

The OAuth2 token will be cached either per `trino.auth.OAuth2Authentication` instance or, when keyring is installed, it will be cached within a secure backend (macos keychain, windows credential locker, ...) under a key including host and user of the Trino connection. Keyring can be installed using `pip install trino-python-client['local-secure-storage']`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pip install 'trino[secure-local-storage]' or pip install -e '.[secure-local-storage]'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what three dots mean in this sentence, maybe it's better to include here all storages across all OS or just add etc

(macos keychain, windows credential locker, ...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pip install 'trino[secure-local-storage]' or pip install -e '.[secure-local-storage]'

I would only add the first one as the second one is only for local development (might confuse normal users).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's ok.

@hovaesco hovaesco requested review from hashhar and ebyhr March 22, 2022 09:45
@mdesmet
Copy link
Contributor Author

mdesmet commented Mar 22, 2022

Looks good. One question, if keyring is installed then a token is cached by default? Basically, having keyring installed is like an enabler to cache a token?

Even without keyring installed the token is always cached per connection (actually per instance of OAuth2Authentication), compared to the old code where the token was cached per cursor. When you install keyring, the token will be reused even when you restart or rerun your application (eg multiple dbt runs won't prompt authentication again if your token is still valid).

trino/auth.py Outdated
@@ -157,8 +161,12 @@ def __eq__(self, other):


def handle_redirect_auth_url(auth_url):
print("Open the following URL in browser for the external authentication:")
print(auth_url)
print("Opening your browser for the external authentication:")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there 2 unrelated changes:

  • opening web browser on authentication,
  • keyring to store the obtained token.
    Please use seperate commits for them.


@httprettified
def test_token_only_retrieved_once_per_connection(sample_post_response_data):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pylint didn't pick this up?



@httprettified
def test_token_only_retrieved_once_per_connection(sample_post_response_data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it once per connection or once per OAuth2Authentication instance? Does having a test in which multiple connections reuse the same OAuth2Authentication instance make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's like you said, we can add a test to proof this is actually working.

trino/auth.py Outdated

@staticmethod
def _determine_host(url) -> Tuple[str, str]:
host = urlparse(url).hostname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return urlparse(url).hostname

self._keyring = None
print("Warning: keyring module not found. OAuth2 token will not be stored in keyring.")
self._token = None
self._token_lock = threading.Lock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there are 2 locks needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token lock is for reads and writes to the token storage.

The token attempt lock is to only have one oauth attempt at a time and block the other responses to be retried. As part of the oauth attempt the token is written, hence we need two locks.

To be honest I wasn't entirely sure about this but apparently my intuition was right as following code shows

import threading

test = threading.Lock()

with test:
    print("first lock acquired")
    print(test.locked())
    with test:
        # not getting here as above line is waiting until the lock is released
        print("second lock acquired")
        print(test.locked())
    print("second lock released")
    print(test.locked())
    
print(test.locked())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. So there are two critical sections:

  • get/set token from the cache,
  • obtain and replace the token.
    For the latter _inside_oauth_attempt_lock does guarantee that only one thread tries to get a new token at a given time but I believe other threads will still try to get a new token after obtaining this lock even though the token was just refreshed and is still valid:
with self._inside_oauth_attempt_lock:
    if not self._inside_oauth_attempt:
        self._inside_oauth_attempt = True
        self._attempt_oauth(response, **kwargs)
        self._inside_oauth_attempt = False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're indeed right, it's not correct. I will make the change and add a test for this behaviour.

trino/auth.py Outdated
self._keyring = importlib.import_module("keyring")
except ImportError:
self._keyring = None
print("Warning: keyring module not found. OAuth2 token will not be stored in keyring.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be printed for all clients which are not using keyring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as the _OAuth2TokenBearer is instantiated as part of the init of the OAuth2Authentication class, it will be shown only once. I thought it's a good way to remind the user that they can reduce the amount of browser oauth popups by installing keyring. It will indeed be shown if you don't install keyring and use the OAuth2Authentication class.

Maybe we can remove this and just write good docs. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the motiviation but it should use a logger and maybe not on warning. It's a hint, not an indication of a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will use a logger. Not sure if anyone runs this with logging set to INFO, but I agree to your point that it's not really a warning from a logging perspective. Anyway it's explained in the docs already.

trino/auth.py Outdated
except self._keyring.errors.NoKeyringError:
pass

# keyring is not installed, so we fall back to a per connection cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per instance of OAuth2Authentication right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's indeed correct, also mentioned in earlier reviews. I will update the comment.

trino/auth.py Outdated
print(auth_url)
print("Opening your browser for the external authentication:")
result = webbrowser.open_new(auth_url)
if not result:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about having console and browser handlers seperately? This would allow users to use either

connect(
  "coordinator",
  auth=OAuth2Authentication(redirect_auth_url_handler=WebBrowserHandler())

or

connect(
  "coordinator",
  auth=OAuth2Authentication(redirect_auth_url_handler=ConsoleHandler())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objection against having both handlers, but what's the use of the ConsoleHandler? If needed, a user can provide their own handler and override the default behaviour of launching the webbrowser. They would probably do something with the url instead of just printing it to the console?

The behaviour is actually pretty similar to how the JDBC driver works. See https://trino.io/docs/current/installation/jdbc.html?highlight=externalAuthentication

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jdbc driver actually defines an interface: https://github.com/trinodb/trino/blob/master/client/trino-client/src/main/java/io/trino/client/auth/external/RedirectHandler.java with multiple implementations:

DESKTOP_OPEN(new DesktopBrowserRedirectHandler()),
SYSTEM_OPEN(new SystemOpenRedirectHandler()),
PRINT(new SystemOutPrintRedirectHandler()),
OPEN(new CompositeRedirectHandler(ImmutableList.of(SYSTEM_OPEN, DESKTOP_OPEN))),
ALL(new CompositeRedirectHandler(ImmutableList.of(OPEN, PRINT)))

and even though the default is ALL users can also use PRINT which does not try to open up the browser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will adapt the implementation to provide a ConsoleHandler and WebBrowserHandler.

@mdesmet mdesmet force-pushed the feature/webbrowser branch 4 times, most recently from c92323e to 6c24cdd Compare March 30, 2022 12:27
@hovaesco
Copy link
Member

hovaesco commented Apr 1, 2022

Awesome job 🎉 One general question. Have you fully tested it with dbt-trino?

@mdesmet
Copy link
Contributor Author

mdesmet commented Apr 2, 2022

Awesome job 🎉 One general question. Have you fully tested it with dbt-trino?

Haven't tried that. Will try it during next week. On first sight we might have to ensure that only one instance of trino.auth.OAuth2Authentication() is created.

Copy link
Member

@lukasz-walkiewicz lukasz-walkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks much much better now 👍
One last comment, could you please change the name of the PR to something more descriptive?

trino/auth.py Outdated
principal: Optional[str] = None,
delegate: bool = False,
ca_bundle: Optional[str] = None,
self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo or move to a different commit

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick skim, still need to look at the _OAuth2TokenBearer changes for correctness.

setup.py Outdated Show resolved Hide resolved
tests/unit/oauthtest.py Outdated Show resolved Hide resolved
trino/auth.py Outdated Show resolved Hide resolved
@mdesmet mdesmet changed the title Feature/webbrowser Support redirection through webbrowser for OAuth authentication Apr 4, 2022
trino/auth.py Outdated
print(auth_url)
class RedirectHandler(metaclass=abc.ABCMeta):
"""
Abstract class for Oauth redirect handlers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: improve docstrings here and for classes below, like OAuth, remove extra spaces, keep dots etc.

@mdesmet mdesmet force-pushed the feature/webbrowser branch 2 times, most recently from 1f551ea to ba27735 Compare April 7, 2022 18:37
@mdesmet
Copy link
Contributor Author

mdesmet commented Apr 7, 2022

Awesome job 🎉 One general question. Have you fully tested it with dbt-trino?

I have succesfully tested the feature with dbt and also added the changes in the dbt PR. We still will need to update the trino-python-client version there, after release.

If you want to try it yourself, you can run following commands

pip install git+https://github.com/mdesmet/dbt-trino.git@feature/oidc 
pip install git+https://github.com/mdesmet/trino-python-client.git@feature/webbrowser

@hashhar hashhar self-requested a review April 7, 2022 19:08
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impl looks good to me. I'd love to be able to integration test these instead of doing manually but that's separate effort.

Some comments on README.

README.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
@mdesmet
Copy link
Contributor Author

mdesmet commented Apr 8, 2022

Impl looks good to me. I'd love to be able to integration test these instead of doing manually but that's separate effort.

There was still some issue with the locking that i fixed yesterday. The integration tests actually lock on each query, which means that every request including retry is queued instead of executed in parallel. The locking is required to make httpretty threadsafe. So the existing tests are not completely covering the real nature of a multiple requests blocking until one finishes the oauth authentication process.

with RunningThread.lock:
response = request.post("select 1")

@hashhar
Copy link
Member

hashhar commented Apr 8, 2022

I don't think cursor is expected to be thread-safe. What does DB-API say about it?

Or do you mean we lock per-instance of connection?

@mdesmet
Copy link
Contributor Author

mdesmet commented Apr 8, 2022

I don't think cursor is expected to be thread-safe. What does DB-API say about it?

Or do you mean we lock per-instance of connection?

It's not that we want to share a cursor over multiple threads. It's about the requests that happen within the cursor execution. The cursor is the first place where we actually talk to the coordinator, as the connection is just a cursor factory. So within a mulithreaded execution (each thread with their own cursor), multiple requests receive a 401 error and should wait until the first lock goes through the OAuth authentication flow. When that one thread completes and stores the token, the other threads should wake up and retry the request.

Within the integration tests, as we put a lock on every cursor execution, this effectively lets only one thread go through the request, 401, retrieve token and retry request flow. while all other threads are waiting and then execute with the token provided by the first thread. This doesn't match the real behaviour where actually multiple threads fail on 401 (as mentioned above).

I tried without the lock but then the httpretty request counts not match up anymore. So the issue lies within the test framework as mentioned in the comment in the code.

# apparently HTTPretty in the current version is not thread-safe
# https://github.com/gabrielfalcao/HTTPretty/issues/209
with RunningThread.lock:
response = request.post("select 1")
self.token = response.request.headers["Authorization"].replace("Bearer ", "")

Copy link
Member

@hovaesco hovaesco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it with dbt-trino and it works flawlessly, thank you for great work!

@mdesmet
Copy link
Contributor Author

mdesmet commented Apr 8, 2022

keyring.errors.NoKeyringError: No recommended backend was available. Install a recommended 3rd party backend package; or, install the keyrings.alt package if you want to use the non-recommended backends. See https://pypi.org/project/keyring for details.

We can't add add localstorage_require to all_require as within the tests there is no keyring backend. I will revert that change. I also don't think we should install keyring without explicit approval from the user.

@hashhar
Copy link
Member

hashhar commented Apr 8, 2022

Thanks, please add that as a comment to localstorage_require about why it's missing from all_require.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % better name if anyone has any

setup.py Outdated Show resolved Hide resolved
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your patience @mdesmet and contributing this useful feature.

@hashhar hashhar merged commit 939cab0 into trinodb:master Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Improve Oauth authentication flow to launch webbrowser and cache token
5 participants