Ideas
The goal here is to have a central page where we can collect ideas that would be interesting to explore for Zeebe. Essentially, it is a collection of topics relevant to Zeebe and its development, ranging from feature ideas to development process proposals and tooling changes.
For each idea, there is a list of people interested in exploring it. This does not mean they have to work on it; it is simply a way to show interest. That way, if you want to start a proof of concept for one of these ideas, you can contact those people, as they may already have some preliminary thoughts to share with you.
- Distributed Tracing
- Structured Logging
- Micrometer
- Loki
- Garden
- Telepresence
- Data Integrity Checks
- Quarkus
- Micronaut
- Different VMs and GCs
- Chaostoolkit extension
- RocksDB as Log
- Google Cloud Profiler
Interested in: @npepinpe
Distributed applications are notoriously more difficult to observe than their more centralized counterparts. Tracing the lifetime of an entity (e.g. a request) across 1, 5, or even 100 hops increases the burden on the poor developer when it comes time to diagnose an issue: what caused this particular request to fail? To be slower? To be faster? To go through this service instead of that one?
By capturing context at various boundaries, we can propagate it across those same boundaries, allowing us to effectively capture the causal chain of events that make up the life of a single request through our distributed application.
While the main use case is to help developers diagnose issues (especially those related to outliers), we can leverage the same tools for distributed profiling, inter-service dependency analysis, capacity planning, etc.
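To make the idea of propagating context across boundaries more concrete, here is a minimal, library-free sketch in Java. It is not the OpenCensus API and not how Zeebe does (or would do) it; the class, the header names (`trace-id`, `parent-span-id`), and the client/gateway/broker hops are all made up for illustration. Each hop injects its context into the outgoing message headers, the next hop extracts it, and every recorded span carries the shared trace ID plus its parent's span ID, which is what lets a tracing backend rebuild the causal chain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of context propagation; all names are assumptions.
final class TracePropagationSketch {

  /** Minimal context: one trace ID per request, one span ID per hop. */
  static final class SpanContext {
    final String traceId;
    final int spanId;
    final int parentSpanId;

    SpanContext(String traceId, int spanId, int parentSpanId) {
      this.traceId = traceId;
      this.spanId = spanId;
      this.parentSpanId = parentSpanId;
    }
  }

  // Stand-in for a tracing backend: every hop reports its span here.
  static final List<String> RECORDED = new ArrayList<>();
  static int nextSpanId = 1;

  /** Starts and records a span; a null parent marks the root of a new trace. */
  static SpanContext startSpan(String name, SpanContext parent) {
    String traceId = parent == null ? "trace-1" : parent.traceId;
    int parentId = parent == null ? 0 : parent.spanId;
    SpanContext span = new SpanContext(traceId, nextSpanId++, parentId);
    RECORDED.add(name + " trace=" + traceId + " span=" + span.spanId + " parent=" + parentId);
    return span;
  }

  /** Sender side of a boundary: serialize the context into message headers. */
  static Map<String, String> inject(SpanContext ctx) {
    Map<String, String> headers = new HashMap<>();
    headers.put("trace-id", ctx.traceId);
    headers.put("parent-span-id", Integer.toString(ctx.spanId));
    return headers;
  }

  /** Receiver side of a boundary: rebuild the caller's context from the headers. */
  static SpanContext extract(Map<String, String> headers) {
    int callerSpanId = Integer.parseInt(headers.get("parent-span-id"));
    return new SpanContext(headers.get("trace-id"), callerSpanId, 0);
  }

  /** Simulates one request crossing two boundaries: client -> gateway -> broker. */
  static List<String> simulateRequest() {
    RECORDED.clear();
    nextSpanId = 1;
    SpanContext client = startSpan("client", null);
    Map<String, String> toGateway = inject(client);
    SpanContext gateway = startSpan("gateway", extract(toGateway));
    Map<String, String> toBroker = inject(gateway);
    startSpan("broker", extract(toBroker));
    return new ArrayList<>(RECORDED);
  }

  public static void main(String[] args) {
    for (String span : simulateRequest()) {
      System.out.println(span);
    }
  }
}
```

In a real setup the tracing library would handle the injection and extraction, typically through interceptors at the transport layer, so application code would rarely touch the headers directly.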
Proof of concept: https://github.com/zeebe-io/zeebe/pull/3341
OpenCensus, having originated at Google, has much tighter integration with Stackdriver (and many Google projects, e.g. gRPC), which is a major plus when it comes to our development workflow, i.e. there is no need for us to operate an external tracer such as Jaeger. As the next official standard, OpenTelemetry, will provide bridge adapters for both OpenCensus and OpenTracing, this integration is a strong point in favor of picking OpenCensus.
A common complaint about OpenTracing is the poor UI/UX of its implementations (at least its FOSS implementations). This is where Stackdriver can help us with its Stackdriver Trace offering.
- Monitoring in the Time of Cloud Native
- Distributed Tracing Microservices
- Distributed Tracing: we've been doing it wrong
- OpenCensus
- OpenTracing
- OpenTelemetry
Interested in: @npepinpe
Interested in: @npepinpe
Interested in: @npepinpe
Interested in: @npepinpe
Interested in: @npepinpe
Zeebe currently performs only very basic integrity checks when starting up: it scans the log and compares the stored checksum of each entry against the checksum computed from the entry's contents. This detects data corruption, but it does not detect bugs such as inconsistent log ordering. It would be interesting to add more integrity checks, ideally without adding runtime cost (an increase in start-up cost is probably okay-ish).
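As a rough illustration of what such a start-up scan could look like, the following Java sketch combines the existing kind of check (checksum comparison) with one possible additional check (log ordering). The entry layout, the use of CRC32, and the rule that indexes must increase by exactly one are assumptions made for this example; they do not describe Zeebe's actual journal format.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical sketch; the entry layout and all names are assumptions.
final class LogIntegritySketch {

  /** Simplified log entry: an index, a payload, and the checksum stored with it. */
  static final class Entry {
    final long index;
    final byte[] payload;
    final long storedChecksum;

    Entry(long index, byte[] payload, long storedChecksum) {
      this.index = index;
      this.payload = payload;
      this.storedChecksum = storedChecksum;
    }
  }

  /** Computes the CRC32 checksum of a payload. */
  static long checksum(byte[] payload) {
    CRC32 crc = new CRC32();
    crc.update(payload, 0, payload.length);
    return crc.getValue();
  }

  /** Builds a well-formed entry whose stored checksum matches its payload. */
  static Entry entry(long index, String payload) {
    byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
    return new Entry(index, bytes, checksum(bytes));
  }

  /** Scans the whole log once and returns a description of every problem found. */
  static List<String> verify(List<Entry> log) {
    List<String> problems = new ArrayList<>();
    long expectedIndex = -1; // -1 means no entry has been seen yet
    for (Entry e : log) {
      // Corruption check: recompute the checksum and compare with the stored one.
      if (checksum(e.payload) != e.storedChecksum) {
        problems.add("checksum mismatch at index " + e.index);
      }
      // Ordering check (assumed rule): each index follows the previous one by exactly one.
      if (expectedIndex != -1 && e.index != expectedIndex) {
        problems.add("expected index " + expectedIndex + " but found " + e.index);
      }
      expectedIndex = e.index + 1;
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Entry> log = new ArrayList<>();
    log.add(entry(1, "deploy"));
    log.add(entry(3, "create")); // gap: index 2 is missing
    for (String problem : verify(log)) {
      System.out.println(problem);
    }
  }
}
```

Because both checks run in the same pass over the log, the extra ordering check adds little to the start-up scan and nothing to the runtime path.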
Interested in: @npepinpe , @zelldon
Interested in: @npepinpe , @zelldon
- High performance Akka: the takeaway is that it's possible to use Aeron as the transport for Akka 🔥
Interested in: @zelldon
Quarkus tailors your application for GraalVM and HotSpot. Amazingly fast boot time, incredibly low RSS memory (not just heap size!) offering near instant scale up and high density memory utilization in container orchestration platforms like Kubernetes. We use a technique we call compile time boot
A modern, JVM-based, full-stack framework for building modular, easily testable microservice and serverless applications.
Interested in: @zelldon, @npepinpe
We would like to test out different VMs, like Graal, and different GCs, like Zulu, etc.
Interested in: @zelldon
Build an extension/plugin to make chaos testing Zeebe easier.
Interested in: @zelldon
Interested in: @npepinpe
The motivation here is that many existing Raft implementations delegate management of the persistent log storage to RocksDB or LMDB. I have a hunch that both will perform much better than our current journal, and will most likely be more stable, so I would like to test them out.
Interested in: @npepinpe
It seems to be possible to profile distributed applications (to what extent?) using Google Cloud Profiler. It would be interesting to learn more about this area.