Rewrite agent websocket client #2560

Merged
merged 7 commits into from Jul 26, 2017

Contributor

@SpComb SpComb commented Jul 5, 2017

Fixes #2500 by replacing the EventMachine-based faye-websocket client with the new kontena-websocket-client running as a Celluloid actor, which has full support for SSL certificate verification.

  • Drop eventmachine and faye-websocket dependencies

    Now with 100% less EventMachine.

  • Add kontena-websocket-client dependency

    Supports SSL certificate verification.

  • Agent WebsocketClient is now a celluloid actor

    The new Kontena::Websocket::Client should implement open/ping/close timeouts with keepalive pings at least as robustly as the old EM-based client, probably even more robustly.

  • Fix agent RPC server/client websocket message sends to log a warning and abort the celluloid call on websocket send errors

Testing

Default behavior with KONTENA_URI=wss:// + invalid cert

Intended to be backwards-compatible, but show a warning with the validation error + cert details.

I, [2017-07-13T12:31:42.360771 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://kontena.test:9293
W, [2017-07-13T13:48:19.964864 #1]  WARN -- Kontena::WebsocketClient: insecure connection established with SSL errors: certificate verify failed: self signed certificate: /CN=kontena.test (issuer /CN=kontena.test)

Default behavior with KONTENA_URI=wss:// + valid cert

Backwards compatible, but suggesting to configure KONTENA_SSL_VERIFY=true to protect from MITM attacks:

I, [2017-07-14T05:31:49.094554 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://aws-master-1.kontena.io/
W, [2017-07-14T05:31:54.470200 #1]  WARN -- Kontena::WebsocketClient: secure connection established without KONTENA_SSL_VERIFY=true: /CN=aws-master-1.kontena.io (issuer /C=US/O=Let's Encrypt/CN=Let's Encrypt Authority X3)

Error case with KONTENA_SSL_VERIFY=true (not signed by default CAs)

I, [2017-07-13T12:30:35.566300 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://kontena.test:9293
E, [2017-07-13T12:30:35.619952 #1] ERROR -- Kontena::WebsocketClient: unable to connect to SSL server with KONTENA_SSL_VERIFY=true: certificate verify failed: self signed certificate

Error case with verify + CA, but wrong host

I, [2017-07-13T12:34:41.603775 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://192.168.66.1:9293
E, [2017-07-13T12:34:41.714384 #1] ERROR -- Kontena::WebsocketClient: unable to connect to SSL server with KONTENA_SSL_VERIFY=true: Server certificate did not match hostname 192.168.66.1: /CN=kontena.test

Error case with valid public cert, but invalid server (not a kontena master)

I, [2017-07-13T13:00:33.856768 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://socket.kontena.io
E, [2017-07-13T13:00:39.522231 #1] ERROR -- Kontena::WebsocketClient: unexpected response from server, check url: Error during WebSocket handshake: Unexpected response code: 404

Error case with DNS errors

I, [2017-07-13T12:46:14.146281 #1]  INFO -- Kontena::WebsocketClient: connecting to master at ws://api.kontena.io
E, [2017-07-13T12:46:19.189549 #1] ERROR -- Kontena::WebsocketClient: unable to connect to server: getaddrinfo: Name does not resolve

Success behavior with valid custom cert (using SSL_CERT_FILE=/etc/kontena-agent/ca.pem)

I, [2017-07-13T12:32:50.385409 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://kontena.test:9293
I, [2017-07-13T12:32:50.527173 #1]  INFO -- Kontena::WebsocketClient: secure connection established with SSL: /CN=kontena.test (issuer /CN=kontena.test)

Success with a public cert (using a hosted master)

I, [2017-07-13T13:15:31.962935 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://XXX.platforms.us-east-1.kontena.cloud/
I, [2017-07-13T13:15:37.702015 #1]  INFO -- Kontena::WebsocketClient: secure connection established with SSL: /CN=*.platforms.us-east-1.kontena.cloud (issuer /C=US/O=Amazon/OU=Server CA 1B/CN=Amazon)
E, [2017-07-13T13:15:37.709481 #1] ERROR -- Kontena::WebsocketClient: master does not accept our token, shutting down ...

Success with a custom KONTENA_SSL_HOSTNAME=kontena.test

I, [2017-07-14T08:20:59.766136 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://192.168.66.1:9293
I, [2017-07-14T08:20:59.855784 #1]  INFO -- Kontena::WebsocketClient: secure connection established with KONTENA_SSL_VERIFY: /CN=kontena.test (issuer /CN=kontena.test)

Error case with DNS MITM + invalid self-signed cert

I, [2017-07-13T13:18:16.266705 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://XXX.platforms.us-east-1.kontena.cloud/
E, [2017-07-13T13:18:16.364581 #1] ERROR -- Kontena::WebsocketClient: unable to connect to SSL server with KONTENA_SSL_VERIFY=true: certificate verify failed: self signed certificate

Error case with DNS MITM + valid cert, but wrong subject

I, [2017-07-13T13:21:07.320320 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://XXX.platforms.us-east-1.kontena.cloud/
E, [2017-07-13T13:21:07.600087 #1] ERROR -- Kontena::WebsocketClient: unable to connect to SSL server with KONTENA_SSL_VERIFY=true: Server certificate did not match hostname XXX.platforms.us-east-1.kontena.cloud: /OU=Domain Control Validated/OU=PositiveSSL Wildcard/CN=*.kontena.io


case msg
when String
@ws.send(msg)
Contributor Author

@SpComb SpComb Jul 5, 2017

TODO: consider having a separate defer thread or something to send these... these calls can block the actor, but they are also not thread-safe, because each @ws.send -> multiple socket write calls may not be atomic with large messages?

Contributor

Defer does not make sense here because it messes up the order of messages.

Contributor Author

I mean a single defer thread that each send call goes through, to preserve the order and atomicity of messages.

Alternatively, instead of assuming that the Websocket::Driver class is thread-safe for concurrent parse, text/binary, ping, close etc calls (which I doubt...), I think I'll just throw in a mutex in the Kontena::Websocket::Driver to serialize all of those calls.
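
The mutex idea can be sketched roughly as follows. `FakeDriver` and `LockedDriver` are hypothetical names used only for illustration, not the actual Kontena::Websocket or Websocket::Driver classes; the sketch assumes a driver whose `text` call turns one message into several separate writes.

```ruby
# Hypothetical stand-in for a websocket driver whose #text call turns one
# message into several socket writes (not the real Websocket::Driver API).
class FakeDriver
  attr_reader :writes

  def initialize
    @writes = []
  end

  def text(msg)
    msg.each_char do |chunk|
      @writes << chunk
      Thread.pass # encourage interleaving between threads
    end
  end
end

# Wraps a driver so that every call is serialized through a single Mutex.
class LockedDriver
  def initialize(driver)
    @driver = driver
    @mutex = Mutex.new
  end

  def text(msg)
    @mutex.synchronize { @driver.text(msg) }
  end
end

driver = FakeDriver.new
locked = LockedDriver.new(driver)

threads = %w[aaaa bbbb cccc].map do |msg|
  Thread.new { locked.text(msg) }
end
threads.each(&:join)

# Each message's chunks stay contiguous, whatever order the threads ran in.
puts driver.writes.join
```

Calling `driver.text` directly from the threads instead of `locked.text` would allow the chunks of different messages to interleave.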


# Valid after on :open
#
# @return [Websocket::Driver::Headers]
def heaers
Contributor

Typo?

Contributor Author

SpComb commented Jul 6, 2017

I think the Kontena::Websocket::Client API is now slowly taking a different kind of shape... it does away with the traditional on :open, :close, :error etc API, and instead uses a more synchronous API:

    def websocket_connect
      # connecting
      ws = Kontena::Websocket::Client.new(url, ...)
      ws.listen do |message|
        actor.on_message(message)
      end
      ws.run do
        actor.on_open
      end
      # server closed connection cleanly with code 1000
    rescue Kontena::Websocket::CloseError => error
      actor.on_close
    rescue Kontena::Websocket::Error => error
      actor.on_error
    ensure
      # disconnected
    end

Whereby the ws.send etc methods are now threadsafe: the underlying Websocket::Driver object is hidden away and protected using a Mutex, so that things like ws.send can't race with other ws.send or ws.parse etc calls.

EDIT: the api has now been slightly changed:

    def websocket_connect
      # connecting
      @ws = Kontena::Websocket::Client.new(url, ...)
      @ws.on_message do |message|
        actor.on_message(message)
      end
      @ws.run do
        actor.on_open
      end
      actor.on_close(@ws.close_code, @ws.close_reason)
    rescue Kontena::Websocket::Error => exc
      actor.on_error(exc)
    ensure
      @ws = nil
    end

Contributor Author

SpComb commented Jul 6, 2017

Started writing some pseudo-e2e specs, where the Kontena::Websocket::Client connects to some random localhost port, testing simple error cases where the server returns:

  • TCP EOF
  • HTTP 404
  • SSL with an invalid cert

Still needs a bit more work to test the actual websocket messages :)

EDIT: done

  • SSL with a custom ca cert
  • SSL with the wrong hostname
  • websocket echo server

-api_token: api_token
+api_token: api_token,
+ssl_verify: ENV['KONTENA_SSL_VERIFY'],
+ssl_ca: ENV['KONTENA_SSL_CA'],
Contributor Author

Don't necessarily need the KONTENA_SSL_CA, because openssl itself understands SSL_CERT_DIR= and SSL_CERT_FILE= when using the default cert store?

Contributor

If we don't need this then we probably should not add it (hard to remove later).

Contributor Author

Hmm... somehow just replacing KONTENA_SSL_CA= with SSL_CERT_FILE= didn't work... keeping this for now.

Contributor Author

NB: For the CLI SSL_CERT_FILE=, excon actually handles it: https://github.com/excon/excon/blob/v0.57.1/lib/excon/ssl_socket.rb#L34

I find the behavior of the libssl SSL_CERT_FILE= env in the kontena/agent container odd... it doesn't seem to be respected by the default cert store:

irb(main):015:0> ssl_cert = OpenSSL::X509::Certificate.new File.read(ENV['SSL_CERT_FILE'])
=> #<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name:0x0055eafa90aef8>, issuer=#<OpenSSL::X509::Name:0x0055eafa90af20>, serial=#<OpenSSL::BN:0x0055eafa90af48>, not_before=2017-07-06 08:07:56 UTC, not_after=2018-07-06 08:07:56 UTC>
irb(main):013:0> ssl_store = OpenSSL::X509::Store.new
=> #<OpenSSL::X509::Store:0x0055eafa9b6280 @verify_callback=nil, @error=nil, @error_string=nil, @chain=nil, @time=nil>
irb(main):014:0> ssl_store.set_default_paths
=> nil
irb(main):016:0> ssl_store.verify(ssl_cert)
=> false

Have to explicitly set the ENV['SSL_CERT_FILE'] as a CA path for it to work:

irb(main):017:0> ssl_store = OpenSSL::X509::Store.new
=> #<OpenSSL::X509::Store:0x0055eafa8ecf70 @verify_callback=nil, @error=nil, @error_string=nil, @chain=nil, @time=nil>
irb(main):018:0> ssl_store.add_file ENV['SSL_CERT_FILE']
=> #<OpenSSL::X509::Store:0x0055eafa8ecf70 @verify_callback=nil, @error=nil, @error_string=nil, @chain=nil, @time=nil>
irb(main):019:0> ssl_store.verify(ssl_cert)
=> true

Same thing with ssl_context.set_params vs ssl_context.set_params(ca_file: ENV['SSL_CERT_FILE']).

Contributor Author

Compare this to ruby 2.3.1 on Ubuntu 16.04:

irb(main):001:0> require 'openssl'
=> true
irb(main):002:0> ssl_cert = OpenSSL::X509::Certificate.new File.read(ENV['SSL_CERT_FILE'])
=> #<OpenSSL::X509::Certificate: subject=#<OpenSSL::X509::Name:0x000000026ad130>, issuer=#<OpenSSL::X509::Name:0x000000026ad158>, serial=#<OpenSSL::BN:0x000000026ad180>, not_before=2017-07-06 08:07:56 UTC, not_after=2018-07-06 08:07:56 UTC>
irb(main):003:0> ssl_store = OpenSSL::X509::Store.new
=> #<OpenSSL::X509::Store:0x000000026b49f8 @verify_callback=nil, @error=nil, @error_string=nil, @chain=nil, @time=nil>
irb(main):004:0> ssl_store.set_default_paths
=> nil
irb(main):005:0> ssl_store.verify(ssl_cert)
=> true

Contributor Author

I don't know why the libssl envs are broken on the agent alpine image, but I implemented env handling in the websocket client class, similarly to what excon does.
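
A minimal sketch of what explicit env handling could look like, based on the irb sessions above. `build_cert_store` is a hypothetical helper name, and this is not the actual code in the websocket client class or in excon; it only uses `OpenSSL::X509::Store#set_default_paths`, `#add_file` and `#add_path`.

```ruby
require 'openssl'

# Builds a cert store that starts from the system defaults and then
# explicitly adds the file/dir named in the given env hash, instead of
# relying on libssl to read those variables itself.
# (Hypothetical helper, for illustration only.)
def build_cert_store(env = ENV)
  store = OpenSSL::X509::Store.new
  store.set_default_paths

  cert_file = env['SSL_CERT_FILE']
  cert_dir = env['SSL_CERT_DIR']

  store.add_file(cert_file) if cert_file && File.file?(cert_file)
  store.add_path(cert_dir) if cert_dir && File.directory?(cert_dir)
  store
end

store = build_cert_store
puts store.class
```

The resulting store could then be passed along as the `cert_store` of an `OpenSSL::SSL::SSLContext`, so the same CA file works whether or not the platform's libssl honors the env variables.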

Contributor Author

SpComb commented Jul 7, 2017

Agent specs are now ✅, and the websocket client specs are now much more comprehensive than with the EM-based websocket.

The timeouts still need work.

@SpComb SpComb added this to the 1.4.0 milestone Jul 10, 2017
@SpComb SpComb force-pushed the feature/agent-websocket-client branch from 8da6b47 to a2a80dc Compare July 10, 2017 12:46
@SpComb SpComb changed the base branch from refactor/kontena-lib to master July 10, 2017 12:47
Contributor Author

SpComb commented Jul 10, 2017

Untangled from #2559 by moving the Kontena::Websocket::Client directly to agent/lib/kontena/websocket

start_em
@client.ensure_connect
# XXX: does not re-start after crash
Celluloid::Actor[:websocket_client].async.start
Contributor Author

TODO

Contributor Author

SpComb commented Jul 11, 2017

Timeouts are now implemented, including automatic keepalive pings:

D, [2017-07-11T14:07:45.690019 #1] DEBUG -- Kontena::RpcClient: waited 0.1s of 30.0s until: request /containers/health has response wth id=817598346 yielded Array
D, [2017-07-11T14:07:46.390248 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=338: size=338
D, [2017-07-11T14:07:46.507903 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=296: size=296
D, [2017-07-11T14:07:46.516422 #1] DEBUG -- Kontena::Websocket::Client::Connection: read size=4096: #buf=11
D, [2017-07-11T14:07:46.517303 #1] DEBUG -- Kontena::Websocket::Client::Connection: wait read: timeout=20.667216587
D, [2017-07-11T14:07:46.521203 #1] DEBUG -- Kontena::RpcClient: waited 0.0s of 30.0s until: request /containers/cleanup has response wth id=1871178194 yielded Array
D, [2017-07-11T14:07:52.117173 #1] DEBUG -- Kontena::Websocket::Client::Connection: read size=4096: #buf=2
D, [2017-07-11T14:07:52.118438 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=6: size=6
D, [2017-07-11T14:07:52.119013 #1] DEBUG -- Kontena::Websocket::Client::Connection: wait read: timeout=15.065585636
D, [2017-07-11T14:07:55.089342 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=263: size=263
D, [2017-07-11T14:07:58.158210 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=338: size=338
D, [2017-07-11T14:08:06.082537 #1] DEBUG -- Kontena::Workers::NodeInfoWorker: publishing node information
D, [2017-07-11T14:08:06.344516 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=788: size=788
D, [2017-07-11T14:08:07.200609 #1] DEBUG -- Kontena::Websocket::Client: pinging with id=2
D, [2017-07-11T14:08:07.205136 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=7: size=7
D, [2017-07-11T14:08:07.207211 #1] DEBUG -- Kontena::Websocket::Client::Connection: read size=4096: #buf=3
D, [2017-07-11T14:08:07.207757 #1] DEBUG -- Kontena::Websocket::Client: pong with id=2
D, [2017-07-11T14:08:07.208078 #1] DEBUG -- Kontena::Websocket::Client: ping-pong with id=2 in 0.002140989s
D, [2017-07-11T14:08:07.209029 #1] DEBUG -- Kontena::WebsocketClient: server ping 0.00s of 5.00s timeout
D, [2017-07-11T14:08:07.209723 #1] DEBUG -- Kontena::Websocket::Client::Connection: wait read: timeout=29.996302699
D, [2017-07-11T14:08:07.366557 #1] DEBUG -- Kontena::Workers::StatsWorker: starting collection
D, [2017-07-11T14:08:07.447259 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=1223: size=1223
D, [2017-07-11T14:08:07.449855 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=1192: size=1192
D, [2017-07-11T14:08:07.450948 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=1206: size=1206
D, [2017-07-11T14:10:19.489928 #1] DEBUG -- Kontena::Websocket::Client: pinging with id=2
D, [2017-07-11T14:10:19.495811 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=7: size=7
D, [2017-07-11T14:10:19.497385 #1] DEBUG -- Kontena::Websocket::Client::Connection: wait read: timeout=4.999922161
D, [2017-07-11T14:10:19.889806 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=94: size=94
D, [2017-07-11T14:10:20.392642 #1] DEBUG -- Kontena::RpcClient: waiting 0.5s of 30.0s until: request /node_volumes/list has response wth id=1013724617
D, [2017-07-11T14:10:20.777132 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=99: size=99
D, [2017-07-11T14:10:20.896924 #1] DEBUG -- Kontena::RpcClient: waiting 1.0s of 30.0s until: request /node_volumes/list has response wth id=1013724617
D, [2017-07-11T14:10:21.287443 #1] DEBUG -- Kontena::RpcClient: waiting 0.5s of 30.0s until: request /node_service_pods/list has response wth id=128613361
D, [2017-07-11T14:10:21.302875 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=1645: size=1645
D, [2017-07-11T14:10:21.779819 #1] DEBUG -- Kontena::RpcClient: waiting 1.0s of 30.0s until: request /node_service_pods/list has response wth id=128613361
D, [2017-07-11T14:10:21.813062 #1] DEBUG -- Kontena::RpcClient: waiting 0.5s of 30.0s until: request /nodes/update has response wth id=932864748
D, [2017-07-11T14:10:21.895282 #1] DEBUG -- Kontena::RpcClient: waiting 2.0s of 30.0s until: request /node_volumes/list has response wth id=1013724617
D, [2017-07-11T14:10:22.307906 #1] DEBUG -- Kontena::RpcClient: waiting 1.0s of 30.0s until: request /nodes/update has response wth id=932864748
D, [2017-07-11T14:10:22.785409 #1] DEBUG -- Kontena::RpcClient: waiting 2.0s of 30.0s until: request /node_service_pods/list has response wth id=128613361
D, [2017-07-11T14:10:23.310446 #1] DEBUG -- Kontena::RpcClient: waiting 2.0s of 30.0s until: request /nodes/update has response wth id=932864748
D, [2017-07-11T14:10:23.633105 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=145: size=145
D, [2017-07-11T14:10:23.648941 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=3943: size=3943
D, [2017-07-11T14:10:23.897370 #1] DEBUG -- Kontena::RpcClient: waiting 4.0s of 30.0s until: request /node_volumes/list has response wth id=1013724617
D, [2017-07-11T14:10:24.137698 #1] DEBUG -- Kontena::RpcClient: waiting 0.5s of 30.0s until: request /containers/event has response wth id=1706983049
D, [2017-07-11T14:10:24.158743 #1] DEBUG -- Kontena::RpcClient: waiting 0.5s of 30.0s until: request /containers/save has response wth id=868539399
D, [2017-07-11T14:10:24.498258 #1] DEBUG -- Kontena::Websocket::Client: disconnect
E, [2017-07-11T14:10:24.506895 #1] ERROR -- Kontena::WebsocketClient: websocket error: read timeout after 4.999922161s while waiting 5.0s for pong

@SpComb SpComb changed the title [WiP] Agent websocket client Agent websocket client Jul 11, 2017
@SpComb SpComb changed the title Agent websocket client Rewrite agent websocket client Jul 11, 2017
@connection = nil

# TODO: errors and timeout? SSLSocket.close in particular is bidirectional?
@socket.close if @socket
Contributor Author

@SpComb SpComb Jul 11, 2017

I suspect it's still possible for this to hang without a timeout, particularly looking at OpenSSL::SSL::SSLSocket#close -> ossl_ssl_shutdown, which makes multiple calls to SSL_shutdown, which waits for the server to reply to the close alert... not sure how to deal with that yet?

A plain TCPSocket#close could also block, depending on SO_LINGER, but this is murky stuff...

Contributor Author

Quick testing with ip route add blackhole shows that this close does not block with an SSL connection:

D, [2017-07-11T14:44:46.857275 #1] DEBUG -- Kontena::Websocket::Client: ping on read timeout after 29.995182353s
D, [2017-07-11T14:44:46.857926 #1] DEBUG -- Kontena::Websocket::Client: pinging with id=20
D, [2017-07-11T14:44:46.859518 #1] DEBUG -- Kontena::Websocket::Client::Connection: write #buf=8: size=8
D, [2017-07-11T14:44:46.860144 #1] DEBUG -- Kontena::Websocket::Client::Connection: wait read: timeout=4.99993131
D, [2017-07-11T14:44:47.169036 #1] DEBUG -- Kontena::RpcClient: waiting 16.0s of 30.0s until: request /nodes/update has response wth id=1343584851
D, [2017-07-11T14:44:48.881232 #1] DEBUG -- Kontena::RpcClient: waiting 16.0s of 30.0s until: request /node_service_pods/list has response wth id=857451443
D, [2017-07-11T14:44:51.865077 #1] DEBUG -- Kontena::Websocket::Client: disconnect
E, [2017-07-11T14:44:51.865846 #1] ERROR -- Kontena::WebsocketClient: websocket error: read timeout after 4.99993131s while waiting 5.0s for pong
I, [2017-07-11T14:44:51.906012 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://kontena.test:9293

Which is a little surprising, I might have expected it to...

@socket.write_nonblock(buf)
end
debug "write #buf=#{buf.size}: size=#{size}"
buf = buf[size..-1]
Contributor Author

See reidmorrison/net_tcp_client#10 for why this is important.
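
For context, a partial-write loop of this kind can be sketched as below. `write_all` is a hypothetical name and this is not the PR's actual method; it assumes a plain (non-SSL) IO, where `write_nonblock(buf, exception: false)` returns either the number of bytes written or `:wait_writable`. An SSL socket can additionally ask to wait for readability, which this sketch does not handle.

```ruby
# Writes all of buf to the IO, looping over partial writes and waiting for
# writability with a timeout (in seconds). Raises IOError on timeout.
# (Hypothetical helper; plain sockets/pipes only.)
def write_all(io, buf, timeout: 5.0)
  until buf.empty?
    result = io.write_nonblock(buf, exception: false)

    if result == :wait_writable
      # kernel buffer is full: wait until writable, or give up
      unless IO.select(nil, [io], nil, timeout)
        raise IOError, "write timeout after #{timeout}s"
      end
    else
      # result is the number of bytes actually written, possibly fewer
      # than buf.bytesize: keep only the unwritten tail
      buf = buf.byteslice(result..-1)
    end
  end
end

reader, writer = IO.pipe
data = 'x' * 1_000_000
received = Thread.new { reader.read }
write_all(writer, data)
writer.close
puts received.value.bytesize
```

Dropping the `buf = buf.byteslice(...)` step and assuming a single write call sends everything is the failure mode that matters here: with a message larger than the kernel buffer, the tail would be silently lost.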

@@ -58,7 +59,7 @@ def handle_request(ws_client, message)
# @param [WebsocketClient] ws_client
# @param [Array, Hash] msg
def send_message(ws_client, msg)
-ws_client.send_message(MessagePack.dump(msg).bytes)
+ws_client.async.send_message(MessagePack.dump(msg).bytes)
Contributor

Why async ?

Contributor Author

The ws_client.send_message can block or raise errors... no point calling it sync, because the rpc server can't really do anything about that. They will get logged as warnings by the websocket client.

Contributor Author

SpComb commented Jul 13, 2017

TODO: agent probably also needs some KONTENA_SSL_CN=test to verify and accept the generic CN=test cert used by plugin-provisioned masters

ssl_verify_context = OpenSSL::X509::StoreContext.new(ssl_cert_store)
ssl_verify_context.error = verify_result

Kontena::Websocket::SSLVerifyError.new(ssl_verify_context.error, "certificate verify failed: #{ssl_verify_context.error_string}")
Contributor Author

@SpComb SpComb Jul 13, 2017

This is a bit of a hack with the OpenSSL::X509::StoreContext#error=, but it was the only way I could figure out how to call the libssl X509_verify_cert_error_string on the OpenSSL::SSL::SSLSocket#verify_result and get nice verify error messages like certificate verify failed: self signed certificate.

end

def handle_invalid_token
error 'master does not accept our token, shutting down ...'
-EM.next_tick { abort('Shutting down ...') }
+Kernel.abort('Shutting down ...')
Contributor Author

@SpComb SpComb Jul 13, 2017

TODO: This doesn't work in a celluloid actor:

E, [2017-07-13T13:08:35.672300 #1] ERROR -- Kontena::WebsocketClient: master does not accept our token, shutting down ...
Shutting down ...
E, [2017-07-13T13:08:35.672802 #1] ERROR -- : Actor crashed!
SystemExit: Shutting down ...
	/app/lib/kontena/websocket_client.rb:261:in `abort'
	/app/lib/kontena/websocket_client.rb:261:in `handle_invalid_token'
	/app/lib/kontena/websocket_client.rb:247:in `on_close'
	/app/lib/kontena/websocket_client.rb:133:in `run_websocket'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
E, [2017-07-13T13:08:35.679187 #1] ERROR -- : thread crashed
SystemExit: Shutting down ...
	/app/lib/kontena/websocket_client.rb:261:in `abort'
	/app/lib/kontena/websocket_client.rb:261:in `handle_invalid_token'
	/app/lib/kontena/websocket_client.rb:247:in `on_close'
	/app/lib/kontena/websocket_client.rb:133:in `run_websocket'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `public_send'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/calls.rb:28:in `dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/call/async.rb:7:in `dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:50:in `block in dispatch'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/cell.rb:76:in `block in task'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/actor.rb:339:in `block in task'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task.rb:44:in `block in initialize'
	/usr/lib/ruby/gems/2.3.0/gems/celluloid-0.17.3/lib/celluloid/task/fibered.rb:14:in `block in create'
I, [2017-07-13T13:08:35.684933 #1]  INFO -- Kontena::WebsocketClient: initialized with token tBsoc...
I, [2017-07-13T13:08:36.687613 #1]  INFO -- Kontena::WebsocketClient: connecting to master at wss://...

# SSL_connect returned=1 errno=0 state=error: certificate verify failed
if exc.message.end_with? 'certificate verify failed'
# ssl_socket.peer_cert is not set on errors :(
raise ssl_verify_error(ssl_socket.verify_result)
Contributor Author

The lack of ssl_socket.peer_cert on verify errors means that the resulting agent websocket client errors with KONTENA_SSL_VERIFY=true are less informative:

E, [2017-07-13T14:09:22.617530 #1] ERROR -- Kontena::WebsocketClient: unable to connect to SSL server with KONTENA_SSL_VERIFY=true: certificate verify failed: self signed certificate

Compared to the KONTENA_SSL_VERIFY= warning, when using ssl_cert! -> ssl_verify_cert!:

W, [2017-07-13T14:10:03.892937 #1]  WARN -- Kontena::WebsocketClient: insecure connection established with SSL errors: certificate verify failed: self signed certificate: /CN=kontena.test (issuer /CN=kontena.test)

It might be possible to include the cert information in these verify errors by replacing the built-in verify_mode: OpenSSL::SSL::VERIFY_PEER with the equivalent (?) verify checks that already happen in ssl_verify_cert!, but I'm not 100% sure yet that those are the same?

end
elsif ssl_cert
if !ssl_verify
warn "secure connection established without KONTENA_SSL_VERIFY=true: #{ssl_cert.subject} (issuer #{ssl_cert.issuer})"
Contributor Author

@SpComb SpComb Jul 14, 2017

Warning because while this specific connection is secure, the configuration is still vulnerable to MITM attacks, because the agent will still accept an invalid cert unless you update the configuration to use KONTENA_SSL_VERIFY=true.

def initialize(opts)
@@instance = self
Contributor

This sounds like a singleton to me?

Contributor Author

I assume you mean include Singleton?

Contributor Author

Means that it's now configure(opts), because it doesn't look like a Singleton can be initialized with parameters?
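
To illustrate the point about parameters: Ruby's `Singleton` module makes `.new` private and `.instance` takes no arguments, so options have to be passed in a separate step. The class name `ExampleClient` below is hypothetical and the sketch is not the agent's actual class.

```ruby
require 'singleton'

# Hypothetical class, for illustration only.
class ExampleClient
  include Singleton

  attr_reader :opts

  # Separate configuration step, since .instance cannot take arguments.
  def configure(opts)
    @opts = opts
    self
  end
end

client = ExampleClient.instance.configure(api_uri: 'wss://kontena.test:9293')
puts client.equal?(ExampleClient.instance) # same object on every call
```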

@close_timeout = close_timeout
@write_timeout = write_timeout

unless @uri.scheme == 'ws' || @uri.scheme == 'wss'
Contributor

This might break old installations that have http(s) scheme?

Contributor Author

The agent bin/kontena-agent normalizes the URI:

if api_uri.match(/^http.*/)
  api_uri = api_uri.sub('http', 'ws')
end
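
As a quick check of that substitution, here is the same logic wrapped in a hypothetical helper (`normalize_api_uri` is not a name from the agent code). `String#sub` replaces only the first occurrence, so `https` becomes `wss` and `http` becomes `ws`.

```ruby
# Hypothetical wrapper around the substitution quoted above.
def normalize_api_uri(api_uri)
  api_uri = api_uri.sub('http', 'ws') if api_uri.match(/^http.*/)
  api_uri
end

puts normalize_api_uri('https://kontena.test:9293') # wss://kontena.test:9293
puts normalize_api_uri('http://kontena.test')       # ws://kontena.test
puts normalize_api_uri('wss://kontena.test')        # left unchanged
```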

end
end

#protected XXX: called by specs
Contributor

wat?

Contributor Author

Yeah... the specs call the protected methods. And the send method breaks the first workaround that came to mind :)

Contributor Author

Renamed the send(message) method to send_message, and restored the protected methods... the specs use subject.send :method.

Contributor Author

Reverted the send_message vs protected send specs, this is now back to #protected XXX which is TODO on refactoring those protected methods called by specs out to separate classes... most of that is the TCP/SSL connection stuff, which is entirely independent of the websocket client.

end

# TODO: connect_deadline to impose a single deadline on the entire process
# XXX: specs
Contributor Author

The main remaining TODO is that ssl_connect can take longer than the connect_timeout, because the same timeout is used for each read/write step of the SSL handshake... needs to be fixed + spec'd to use a combined deadline.
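
One possible shape for such a combined deadline, as a sketch only: the `Deadline` class below is hypothetical and not part of this PR. It fixes one absolute point on the monotonic clock, and each handshake step asks for the time remaining instead of reusing the full timeout.

```ruby
# Tracks one absolute deadline so that several blocking steps share a
# single overall timeout. (Hypothetical class, for illustration only.)
class Deadline
  class Expired < StandardError; end

  def initialize(timeout)
    @timeout = timeout
    @at = now + timeout
  end

  # Seconds left before the deadline; raises once it has passed.
  def remaining
    left = @at - now
    raise Expired, "deadline of #{@timeout}s exceeded" if left <= 0
    left
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

deadline = Deadline.new(5.0)
first = deadline.remaining
sleep 0.01
second = deadline.remaining # smaller than first: the steps share one budget
puts first > second
```

Each read/write wait in the SSL handshake could then use something like `IO.select([socket], nil, nil, deadline.remaining)`, so the whole connect is bounded by a single timeout.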

Contributor Author

SpComb commented Jul 18, 2017

API design review:

High level

def websocket_client
  Kontena::Websocket::Client.connect(url, **options) do |client|
    # websocket handshake complete

    # other threads can start writing messages
    Thread.new { client.send('foo') }

    # this blocks until the websocket is closed
    client.read do |msg|
      if msg == 'go away'
        client.close
      end
    end
  end
rescue Kontena::Websocket::CloseError => error
  # server closed connection
rescue Kontena::Websocket::ConnectError => error
  # failed to open
rescue Kontena::Websocket::Error => error
  # ...
else
  # client closed connection, and server completed close handshake with close frame
end

Low level

ws_client = Kontena::Websocket::Client.new(url, **options)

Thread.new {
   sleep 1 until ws_client.open?
   ws_client.send('foo') # raises if not connected
   ws_client.close
}

ws_client.on_pong do |time, delay|

end

def connect_websocket(ws_client)
  ws_client.connect # blocks until open, raises Kontena::Websocket::ConnectError

  # alternatively: ws_client.read do |msg|
  # returns nil (EOF) once client-initiated close handshake complete (server responded with close)
  while msg = ws_client.read

  end
rescue Kontena::Websocket::Error
  # can raise from connect or read
ensure
  ws_client.disconnect
end

jakolehm previously approved these changes Jul 19, 2017
Contributor

@jakolehm jakolehm left a comment

LGTM

debug "ping-pong at #{ping_at} in #{ping_delay}s"

# XXX: defer call without mutex?
@on_pong.call(ping_delay) # TODO: also pass ping_at
Contributor Author

Replaced the @queue blocks with just @message_queue, so now the driver ping callback -> client on_pong callback gets called directly, with the driver mutex held... that means that the following fails with a ThreadError: deadlock; recursive locking ATM:

client.on_pong do
  client.close
end

Need to either accept that, or restore some kind of deferred-pong-callback-execution thing.
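
A standalone sketch of the two options, using a hypothetical `PongExample` class rather than the real client or driver. Ruby's `Mutex` is not reentrant, so a callback that runs with the lock held cannot call back into a method that takes the same lock. Deferring here just means reading the callback under the lock and invoking it after the lock is released:

```ruby
# Hypothetical stand-in for a client whose methods share one mutex.
class PongExample
  def initialize
    @mutex = Mutex.new
    @on_pong = nil
    @closed = false
  end

  def on_pong(&block)
    @on_pong = block
  end

  def close
    @mutex.synchronize { @closed = true }
  end

  def closed?
    @mutex.synchronize { @closed }
  end

  # Direct: the callback runs while the mutex is held, so a callback
  # that calls #close re-locks the same mutex and raises ThreadError.
  def pong_direct(delay)
    @mutex.synchronize { @on_pong.call(delay) }
  end

  # Deferred: only read the callback under the lock, call it afterwards.
  def pong_deferred(delay)
    callback = @mutex.synchronize { @on_pong }
    callback.call(delay) if callback
  end
end

client = PongExample.new
client.on_pong { client.close }

begin
  client.pong_direct(0.1)
rescue ThreadError => e
  puts "direct: #{e.message}"   # direct: deadlock; recursive locking
end

client.pong_deferred(0.1)
puts "deferred: closed=#{client.closed?}"  # deferred: closed=true
```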

@SpComb SpComb force-pushed the feature/agent-websocket-client branch from 99b8ed6 to 3d0a2bd Compare July 19, 2017 13:36
Contributor Author

SpComb commented Jul 19, 2017

Split the Kontena::Websocket::Client implementation into a new kontena-websocket-client gem, and rewrote this branch to strip out the history.

This branch installs it directly from github while the kontena-websocket-client gem is WIP; I'll fix it to be a normal versioned install once it's released.

@SpComb SpComb dismissed jakolehm’s stale review July 19, 2017 13:39

Needs re-approval once the final kontena-websocket-client is released

@@ -7,7 +7,7 @@ RUN apk update && apk --update add tzdata ruby ruby-irb ruby-bigdecimal \
 ADD Gemfile /app/
 ADD Gemfile.lock /app/

-RUN apk --update add --virtual build-dependencies ruby-dev build-base openssl-dev && \
+RUN apk --update add --virtual build-dependencies ruby-dev build-base openssl-dev git && \
Contributor Author

The git is required for bundle install when testing changes to kontena-websocket-client using gems from git.

@SpComb SpComb requested a review from jakolehm July 24, 2017 08:34
Contributor Author

SpComb commented Jul 26, 2017

Quick testing to verify that the example cases still work with the latest code.

@SpComb SpComb merged commit 9619fa6 into master Jul 26, 2017
@SpComb SpComb deleted the feature/agent-websocket-client branch July 26, 2017 12:16
jnummelin added a commit to kontena/docs that referenced this pull request Aug 31, 2017
jakolehm pushed a commit to kontena/docs that referenced this pull request Oct 16, 2017
Development

Successfully merging this pull request may close these issues.

Agent does not validate KONTENA_URI=wss:// SSL certs
2 participants