Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

authentication error in /meta/handshake sends the client into a 0 interval retry loop #542

Open
bughit opened this issue Aug 4, 2023 · 12 comments

Comments

@bughit
Copy link

bughit commented Aug 4, 2023

/meta/handshake is the appropriate time to do authentication (as opposed authorization) as that's the first request the client makes and if it can't authenticate it should not be allowed to remain connected.

The following server and client code is responsible:

server

faye/src/protocol/server.js

Lines 150 to 173 in 60141e8

_advize: function(response, connectionType) {
if (array.indexOf([Channel.HANDSHAKE, Channel.CONNECT], response.channel) < 0)
return;
var interval, timeout;
if (connectionType === 'eventsource') {
interval = Math.floor(this._engine.timeout * 1000);
timeout = 0;
} else {
interval = Math.floor(this._engine.interval * 1000);
timeout = Math.floor(this._engine.timeout * 1000);
}
response.advice = response.advice || {};
if (response.error) {
assign(response.advice, { reconnect: 'handshake' }, false);
} else {
assign(response.advice, {
reconnect: 'retry',
interval: interval,
timeout: timeout
}, false);
}
},

client

faye/src/protocol/client.js

Lines 353 to 362 in 60141e8

_handleAdvice: function(advice) {
assign(this._advice, advice);
this._dispatcher.timeout = this._advice.timeout / 1000;
if (this._advice.reconnect === this.HANDSHAKE && this._state !== this.DISCONNECTED) {
this._state = this.UNCONNECTED;
this._dispatcher.clientId = null;
this._cycleConnection();
}
},

If there's an error in /meta/handshake the server sends:
assign(response.advice, { reconnect: 'handshake' }, false);
note that it doesn't even pass the interval to the client causing 0 interval retries

And the client does this._cycleConnection(); on response.

This doesn't seem like good default behavior. At least optionally, an explicit /meta/handshake error should bubble up, ending retries and letting the user of the client decide what to do about it.

@bughit
Copy link
Author

bughit commented Aug 4, 2023

an explicit /meta/handshake error should bubble up

Perhaps some kind of event/callback which returns something telling the client how to proceed (abort, retry_interval).

@jcoglan
Copy link
Collaborator

jcoglan commented Aug 4, 2023

How are you trying to implement authentication?

@bughit
Copy link
Author

bughit commented Aug 4, 2023

For the purpose of reproducing this, only the authentication failure case matters:

bayeux.addExtension({
  incoming(message, request, callback) {
    if (message.channel === '/meta/handshake')
      message.error = 'authentication failed';
    callback(message);
  }
}

@jcoglan
Copy link
Collaborator

jcoglan commented Aug 4, 2023

I need to think about this. The intention with the design of extensions was that authn/authz should be applied to the things you do with the connection, i.e. publish and subscribe, rather than to the connection itself. Most applications won't want to grant blanket access to channels just b/c a client connected successfully.

Still, it's possible that if an extension sets error on a handshake or connect message, then the server should return reconnect: 'none' in the response to stop the client retrying. Really depends what kind of error has happened though.

It would help if you could explain a more detailed use case where you want to apply authn/authz on handshakes, and then how you restrict publish and subscribe after that.

@jcoglan
Copy link
Collaborator

jcoglan commented Aug 4, 2023

It's also better for the client to handle errors if they're applied to publish/subscribe actions, because the client actually calls publish() and subscribe() explicitly and can have those methods return an error that it can react to. Applications do not call handshake() or connect() explicitly, usually, so this makes it unergonomic to route application errors (rather than protocol errors) through those code paths.

@bughit
Copy link
Author

bughit commented Aug 4, 2023

use case where you want to apply authn/authz on handshakes

Not authorization but definitely authentication. If you only do a combined authentication/authorization in /meta/subscribe, then a client that fails authentication will remain connected indefinitely, consuming resources (websocket connection and /meta/connect and ping traffic). There is no good reason for this. Such a client is invalid and should not be able to proceed past the initial /meta/handshake request.

how you restrict publish and subscribe after that

If further authorization is needed for subscription it can be done as in your example in the docs. Can it be assumed in /meta/subscribe that the client has passed authentication in /meta/handshake? I am assuming yes, but it just occurred to me that you have to fix/record its identity somehow at the point of /meta/handshake authentication, so it can't alter in a /meta/subscribe. Otherwise you'd need to repeat the authentication there.

@bughit
Copy link
Author

bughit commented Aug 4, 2023

Applications do not call handshake() or connect() explicitly, usually, so this makes it unergonomic to route application errors (rather than protocol errors) through those code paths.

Well, raising an event is not unreasonable. It comes down to this, can you justify anonymous or auth failed clients connecting without restriction, opening websockets and staying connected and ping/ponging indefinitely? I can't, that's unreasonably open.

To me its self evident that authentication needs to happen as early as possible (/meta/handshake) and failure should result in disconnection and without immediate automatic retries.

And as I realized, the identity used for auth in /meta/handshake needs to be cached somehow (associated with the clientid or connection) so it can be used later in /meta/subscribe for additional authorization. What are your thoughts on the sensibility/feasibility of that?

@bughit
Copy link
Author

bughit commented Aug 9, 2023

It would help if you could explain a more detailed use case where you want to apply authn/authz on handshakes, and then how you restrict publish and subscribe after that.

After some more thought and testing, here's a more concrete plan:

  • main goals
    • perform authn as early as possible so that anon and authn failed clients don't remain connected indefinitely
    • avoid automatic retries (regardless of interval) on failed authn
    • remember the application level identity of the client used for successful authn so it can be used later for authz
    • authorize subscriptions using the stored client identity
  • details
    • authn in an extension incoming /meta/handshake
    • associate and store the authenticated client identity in an extension outgoing /meta/handshake
      • this is when the clientId is first made available
      • but unfortunately the authenticated client identity is lost between incoming and outgoing
        • I can either patch _makeResponse to copy it to outgoing
        • or encode it in the already copied message.id and revert it in outgoing
        • if you know a better way, please share
      • having both I can make a map clientId => client_authn_id
      • remove these entries via the "disconnected" event
    • authz in an extension incoming /meta/subscribe
      • using client_authn_id from the (clientId => client_authn_id) map I'm maintaining
    • client will need to be modified, per this issue, to not auto retry on /meta/handshake auth failure.

Do you see any problems with this?

There are several things you could improve to make the above easier:

  • maintain a per clientId store where extensions can keep whatever
    • e.g. you'd pass this store to incoming /meta/handshake, and even though the clientId hasn't been generated yet, I could save the client_authn_id in it and you'd save this store later, keyed by clientId
    • this would obviate the need for the next one
  • make possible the copying of extra fields between incoming and outgoing messages
    • this could be internal only. You set an "internal" prop in incoming and its available in outgoing and stripped before sending
  • the client changes per this issue

@jcoglan
Copy link
Collaborator

jcoglan commented Aug 9, 2023

I've been pondering this for a few days and trying out some examples and your last message covers many of the reasons I'm reluctant to introduce these changes. In short, it introduces a lot of new coupling and complexity and I think there are simpler and more ergonomic solutions to this problem.

It's important to note that the Faye client/server do not only talk to each other. They're implementations of the Bayeux protocol and people use the client with different server implementations and vice versa. Therefore the client can't build in assumptions about Faye-specific behaviour without breaking compatibility.

Extensions are designed with the same mentality: they act like a proxy layer that can only see messages going in and out and doesn't make any non-portable assumptions about the implementation. The only exception we make to this is that the error field carries through from the "request" message to the "response", because it facilitates access control without a lot of other custom interfaces being needed.

This surfaces the first problem: the client can't have special behaviour for authentication failures, because the Bayeux spec does not formalise those in any way. The job of the client is to maintain a consistent connection by sending /meta/handshake and /meta/connect as necessary. It's not aware of any specific failure type that should cause it to stop doing that. The standard way to tell a client to stop connecting is to send it a message with advice.reconnect = 'none'. This can be accomplished using a server-side extension:

server.addExtension({
  incoming (message, callback) {
    if (message.channel === '/meta/handshake') {
      if (message.ext?.password !== 'secret') {
        message.error = '801::invalid credential'
      }
    }
    callback(message)
  },

  outgoing (message, callback) {
    if (message.error?.startsWith('801::')) {
      message.advice = message.advice || {}
      message.advice.reconnect = 'none'
    }
    callback(message)
  }
})

That will stop a well behaved client from reconnecting, but clients that are not under your control are still free to open WebSocket connections to your server and send whatever messages they like.

Then there's the question of managing state. If you put the credentials for subscribing anywhere other than inside the /meta/subscribe message, then you need to use clientId (or some ext field) as a credential, and look it up in a session store of some kind. Faye is not going to introduce a session store of its own other than what's needed to run the messaging service, that is fully out of scope. If you want to do this, you will need to invent a store that works for your system. (One reason we're not going to get into solving this problem in general is there are many different storage technologies people may want to use for this, especially when running Faye as a cluster.)

It is also not advisable in general to use clientId as a credential in this way; the spec is unclear about its use. Faye will not forward one client's ID to other clients, because that would let you poll for someone else's subscriptions, but I would have to check what other implementations do.

The other element of state management you've identified is that server-side extensions do not pass state from incoming() to outgoing() hooks for the same message. This is partly for reasons of decoupling/layering mentioned above, but also because it doesn't make sense for all message types, especially published messages that will be incoming once but outgoing many times. It can also be true for /meta/* messages if they fail to send and need to be retried.

I think making the server pass all extra fields through from incoming to outgoing hooks adds too much complexity, and potential safety problems and deviations from the standard. We opted to only do this for the error field for usability reasons as mentioned above.

If you really must pass state through in this way, it is technically possible to use the request object, but I would consider this a hack and strongly advise against it.

server.addExtension({
  incoming (message, request, callback) {
    // auth logic
    request.__state__ = { sessionId: '...' }
    callback(message)
  },

  outgoing (message, request, callback) {
    message.ext = { sessionId: request.__state__.sessionId }
    callback(message)
  }
})

Faye tries to hide the network/HTTP layer from the application and the above interface is really only intended for accessing request headers, so I would consider this an abuse of existing functionality. There's no guaranteed one-to-one relationship between HTTP requests and Bayeux client sessions, so this caries some risk of going wrong in surprising ways, especially if request is a long-lived WebSocket object.

You also propose overloading the id field to represent client identity/auth state. I would strongly recommend against this as message IDs should not be considered private, and it runs the risk of causing a client's message IDs to become non-unique, which would break a lot of the client's logic around delivery and failure handling.

Then there's the issue of how to handle this failure on the client side. Applications don't call handshake() or connect() explicitly, they call subscribe(), and if you make a server side extension block a subscription attempt, rather than a handshake, this method throws an error:

try {
  await client.subscribe('/foo', () => {})
} catch (error) {
  // error = { code: 801, params: [ '' ], message: 'invalid credential' }
}

If the server blocks handshakes, this just silently fails, because clients generally assume that handshake/connect will keep retrying as needed to maintain the connection and will eventually succeed when the client is online. Detecting this failure requires writing a client-side extension to detect the receipt of advice.reconnect = 'none' and handling this nowhere near the specific interaction that failed, requiring some global state. It's much nicer at an application level to handle errors in the form of exceptions from the subscribe() and publish() methods.

We could arrange things so that if the client goes into a reconnect = 'none' state, then all pending subscriptions and publications resolve to an error, but this is complexity I'm again not keen on adding for a niche use case that's not the recommended/common way of doing things.

To sum up, I would recommend performing access control at the subscribe/publish level, as it will be less complex and more reliable. You can prevent clients retrying on certain kinds of failure by setting advice.reconnect = 'none' in a server-side extension.

@bughit
Copy link
Author

bughit commented Aug 9, 2023

I would recommend performing access control at the subscribe/publish level

I can't justify persistent anonymous and authn failed connections. They need to be stopped at /meta/handshake. Going to try to follow my plan, with your tips (request, advice), thanks.

Wanted to understand client behavior on /meta/handshake failure with advice.reconnect = 'none'. First thing I noticed that the subscribe promise doesn't settle, which should be considered a bug, a promise api should never hang. In this case the subscribe obviously did not succeed so it should reject.

Are there other scenarios where a /meta/handshake would be re-attempted at some future point, long after the initial successful subscription?

@bughit
Copy link
Author

bughit commented Aug 9, 2023

Are there other scenarios where a /meta/handshake would be re-attempted at some future point, long after the initial successful subscription?

I guess when the server disconnects the client when it doesn't hear from it. Then the client would do the full reconnection with the handshake? Haven't tested it yet.

What would be the expected client behavior in that case, on /meta/handshake failure with advice.reconnect = 'none'? Will the client stop completely? Or will the retry opt kick in?

@bughit
Copy link
Author

bughit commented Aug 14, 2023

What would be the expected client behavior in that case, on /meta/handshake failure with advice.reconnect = 'none'? Will the client stop completely? Or will the retry opt kick in?

The client does stop, albeit after one final attempt to reschedule:

      } else {
        this.info('Handshake unsuccessful');
        global.setTimeout(function() { self.handshake(callback, context) }, this._dispatcher.retry * 1000);
        this._state = this.UNCONNECTED;
      }

and a client extension can detect this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants