how to run a cluster in containers #86
Reading the logs, it's hard to say exactly what's going wrong.

Edit: Maybe you can paste the complete set of flags you use to start the processes?
Hi, you can see the startup commands in the docker-compose file here: https://github.com/timwebster9/rancher-catalog/blob/master/templates/oklog/0/docker-compose.yml

I'm not sure if an OK Log process can resolve those hostnames. Is there a command I can try with oklog to test out what you are asking for? BTW I have this running fine as a single node, but we typically run multiple hosts per environment, so it would be nice to have the cluster as an option in the future (i.e. not a deal-breaker or anything).
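One way I can at least check the hostname-resolution part myself (a sketch, not an oklog command - the network and service names here are assumptions based on my compose file):

```sh
# run a throwaway container on the same Docker network and resolve the
# oklog service name; swap in your actual network and service names
docker run --rm --network my_oklog_network busybox nslookup oklog
```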
OK. To dig into the first error you get, I built an oklog:dev container and then made this stripped-down docker-compose.yaml:

```yaml
version: "2"
services:
  oklog-1:
    image: oklog:dev
    command: '/oklog ingeststore -debug -store.segment-replication-factor 3 -cluster oklog-1 -peer oklog-1 -peer oklog-2 -peer oklog-3'
  oklog-2:
    image: oklog:dev
    command: '/oklog ingeststore -debug -store.segment-replication-factor 3 -cluster oklog-2 -peer oklog-1 -peer oklog-2 -peer oklog-3'
  oklog-3:
    image: oklog:dev
    command: '/oklog ingeststore -debug -store.segment-replication-factor 3 -cluster oklog-3 -peer oklog-1 -peer oklog-2 -peer oklog-3'
```

This reproduced the warning messages (bad), but also successfully formed a cluster (good). Can you try that ^^ and see if you get the same behavior?

I've also filed #88 to improve the logging situation, which I'll merge directly. Once that's in, you can try that rev instead, if you like.
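For completeness, assuming the file above is saved as docker-compose.yaml and the oklog:dev image has already been built and tagged locally, reproducing is just:

```sh
# bring up all three nodes with their logs interleaved in the foreground
docker-compose up
```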
Hi, OK, yeah that works now - although did you make a functional change, or was the logging just misleading? Sort of related - there isn't much information to go on in the docs/help about setting up a cluster. For example, do I really want a replication factor of 3?
Next I tried to run it in Rancher (I think this would be similar to a k8s setup), with just one container 'definition' but telling it to replicate on every host. I followed your advice in the readme and only used one `-peer` flag (because I won't know the hostnames or IPs of the other hosts). Both the TCP and UI endpoints are also now behind a load-balancer (HAProxy).

It seemed to start up OK (that is, the cluster formed and there were no horrible errors in the logs), and logspout seemed to be forwarding logs. However, I wasn't seeing any in the UI. Then I looked at the OK Log container logs, and one of them had this in it (see below).

So I don't know if this has anything to do with it or not, but the way the Rancher internal DNS works is that in this case the hostname (the service name) resolves to any of the service's containers.
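Roughly, the single-service setup looks like this (a hypothetical sketch, not my exact config - the service name, image tag, and scheduling label are assumptions):

```yaml
# one service definition; Rancher's global scheduling runs one container
# per host, and the service name doubles as the only peer seed
version: '2'
services:
  oklog:
    image: oklog:dev
    command: '/oklog ingeststore -debug -cluster oklog -peer oklog'
    labels:
      io.rancher.scheduler.global: 'true'
```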
No functional change, just logging (so far).
The replication factor is the number of times each log record will be duplicated within the cluster. (Note that records are always deduplicated when returned by a query.) It's a way of accommodating things like failed nodes, broken disks, etc. at a cost of increased disk usage. If you have a 3-node cluster, then a replication factor of 3 means each log record will be duplicated on each node. In my opinion it's a little excessive. A replication factor of 2 should be OK.
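For example, to drop to a replication factor of 2 in the compose file above, you'd just change the flag on each node (a sketch; everything else stays the same):

```yaml
# each log record now lands on 2 of the 3 nodes instead of all 3, so
# cluster-wide disk usage is roughly 2x the raw ingest volume instead of 3x
command: '/oklog ingeststore -debug -store.segment-replication-factor 2 -cluster oklog-1 -peer oklog-1 -peer oklog-2 -peer oklog-3'
```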
This probably isn't going to work, because...
No :) Each OK Log node needs to be uniquely addressable. In your Rancher example, you'll need to give each node a unique hostname or IP, like what happens in the docker-compose example. And then you either need to bind the OK Log cluster listener to that hostname/IP via `-cluster`, or explicitly advertise it via `-cluster.advertise-addr`.
Those logs tell us that each of the nodes is advertising itself as reachable on the same address, which breaks the uniqueness requirement above. Your next step to getting this working in Rancher is to give each node a unique hostname, like oklog-{1..N}, and to change the flags to use that unique hostname as the `-cluster` address.
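Concretely, per node it would look something like this (a sketch based on the compose example above - the hostnames are placeholders, and the flags mirror what worked there):

```sh
# node 1 of 3: the cluster listener uses this node's unique hostname;
# repeat on the other nodes, substituting oklog-2 / oklog-3 for -cluster
/oklog ingeststore -debug \
  -store.segment-replication-factor 2 \
  -cluster oklog-1 \
  -peer oklog-1 -peer oklog-2 -peer oklog-3
```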
Hi, thanks for your reply - yeah, it would work fine with explicitly defined services like in the docker-compose file. I guess to make this work with autoscaling (or manual scaling of the 'click a button' type) we would have to get into 'service discovery' territory?

I haven't taken the time to fully understand how the clustering works, but with the 'gossip' protocol you are using, I wouldn't think it would be a big stretch to make this work. For example, would a node be able to automatically determine its own IP address and advertise that, without it having to be explicitly configured somehow? As long as there was one node that was resolvable by an explicit IP/hostname (say, a 'bootstrapping node' that only needs to be available/explicitly resolvable when the cluster is started), then the others could be scaled up with that information?

I don't want to try and shoehorn my requirement into your project - just thinking out loud about how this could work in an autoscaling fashion in any orchestration environment. Autoscaling services kind of go hand-in-hand with container orchestration, although I realise that most of these types of services are usually stateless. But I feel OK Log is on the right track with the replication and gossip protocol you have in place...
As a general handwavey statement, it's not really possible to autoscale stateful clustered applications like OK Log — or, perhaps better said, it's possible to do, but very easy to do really badly, and end up worse off than if you hadn't tried in the first place. Service discovery is an important part of that, but so, too, is figuring out how to manage persistence volumes, spread ingest load, etc. etc.
In general it's not possible for a process to determine its own IP address. There are too many factors at play: multiple network interfaces on a box or virtual machine, layers of NAT (especially acute in our post-Docker containerized world!!), firewalls, etc. etc. The safest, easiest, and most reliable way of doing this is to declare which interface(s) the process should bind to, and (when necessary) declare which IP address the process should advertise itself as being reachable via. These correspond to the `-cluster` and `-cluster.advertise-addr` flags respectively.
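For instance, a node behind NAT might bind locally but advertise its externally reachable address (a hypothetical sketch - HOST_IP is a placeholder you'd supply, and the exact flag syntax is worth checking against `oklog ingeststore -help`):

```sh
# bind the cluster listener on all interfaces inside the container, but
# advertise the host's externally reachable address so peers can dial back in
/oklog ingeststore -debug \
  -cluster 0.0.0.0 \
  -cluster.advertise-addr "${HOST_IP}" \
  -peer oklog-1 -peer oklog-2 -peer oklog-3
```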
Understood. I've thought long and hard about these things :) and have been on the frontlines of deploying and operating clustered distributed systems for a long time. OK Log represents my current best-effort guess at how this should work. With that said, there may be interesting opportunities for making OK Log more elastic, if we're willing to sacrifice some performance. I'll think on this a bit.

I am quite curious to hear if you manage to get it going, though. Please let me know if you hit any other roadblocks, I'll be happy to help however I can.
Hi - great project :-)
I'm trying to get an OK Log cluster running as containers in a Rancher environment. I see all the command-line options for starting a cluster, but I think I'm running into similar issues to #51. It was closed, so I opened a new one.
After reading through that issue, I'm not sure what to put for the `-cluster` option, or if I should be using `-cluster.advertise-addr` somehow. Ideally I wouldn't have to use hard-coded IP addresses or FQDNs - with Rancher, each node is accessible by its service name (basically the name of the service in the docker-compose.yml file). I can ping these no problem from inside the running containers. Maybe I'm going about it the wrong way?

Here's the output of one of the nodes: