
Known Issues


There are a few known concerns or problems that need to be addressed:

  • There is no monitoring of whether Burrow is keeping up with the __consumer_offsets topic.

  • If there is a network interruption between Burrow and the Kafka cluster, it is possible for the internal queues to become overwhelmed. This would lead to excessive memory use that snowballs until the application is unusable. For this reason, the internal queue is buffered at 10,000 requests. If the queue is full, incoming offsets (both broker and consumer) are dropped until the queue has drained (see the first sketch after this list).

  • The 99.9th percentile request time metric for OffsetRequests on the Kafka cluster may go up because of the size of the requests Burrow performs to retrieve the broker offsets. This is ultimately not much of a problem; it only shows that the requests Burrow performs are the largest OffsetRequests. This type of request is not common in normal operation of the cluster, as it is otherwise only needed when a consumer has to find an offset to start from (see the second sketch after this list).

  • A configurable threshold for the percentage of lag increase during a measurement window was considered and rejected. This would have suppressed warnings for partitions whose lag increased by only a small number of messages over the window, as long as the consumer offsets were still moving forward. However, it provides no way to detect a continual increase that stays under the threshold. If you want to hide small increases, do it in whatever system you use to process the group status (see the last sketch after this list).
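
Burrow is written in Go, and the drop-when-full behavior described for the internal queue maps naturally onto a non-blocking send on a buffered channel. The sketch below shows only that general pattern, not Burrow's actual implementation; the `OffsetMessage` type and the log message are assumptions for illustration.

```go
package main

import "log"

// OffsetMessage is a hypothetical stand-in for a broker or consumer offset update.
type OffsetMessage struct {
	Topic     string
	Partition int32
	Offset    int64
}

// A buffered channel acts as the internal queue (10,000 entries, as described above).
var offsetQueue = make(chan OffsetMessage, 10000)

// enqueueOffset performs a non-blocking send: if the queue is full, the
// incoming offset is dropped instead of blocking the sender.
func enqueueOffset(msg OffsetMessage) {
	select {
	case offsetQueue <- msg:
		// queued successfully
	default:
		log.Printf("offset queue full, dropping offset for %s:%d", msg.Topic, msg.Partition)
	}
}

func main() {
	enqueueOffset(OffsetMessage{Topic: "__consumer_offsets", Partition: 0, Offset: 42})
}
```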
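
For reference, the broker offset fetch behind those large OffsetRequests amounts to asking every partition for its newest (head) offset. The sketch below uses the sarama client library to do this one partition at a time; it is only meant to illustrate the data being fetched, since Burrow batches these lookups into much larger requests. The broker address is an assumption.

```go
package main

import (
	"fmt"
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	// Hypothetical broker address for illustration.
	client, err := sarama.NewClient([]string{"localhost:9092"}, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	topics, err := client.Topics()
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the newest offset for every partition of every topic. Burrow
	// batches these lookups into large OffsetRequests, which is what drives
	// up the 99.9th percentile request time metric on the brokers.
	for _, topic := range topics {
		partitions, err := client.Partitions(topic)
		if err != nil {
			log.Fatal(err)
		}
		for _, partition := range partitions {
			offset, err := client.GetOffset(topic, partition, sarama.OffsetNewest)
			if err != nil {
				log.Fatal(err)
			}
			fmt.Printf("%s:%d head offset = %d\n", topic, partition, offset)
		}
	}
}
```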
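
As an example of suppressing small lag increases downstream, the sketch below polls a consumer group status endpoint and ignores WARN partitions whose end lag is under a threshold. The endpoint URL, JSON field names, and threshold here are all assumptions made for illustration; check the HTTP endpoint documentation for the authoritative response layout.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// The field names below are assumptions about the group status response;
// consult the HTTP endpoint documentation for the exact layout.
type partitionStatus struct {
	Topic     string `json:"topic"`
	Partition int32  `json:"partition"`
	Status    string `json:"status"`
	End       struct {
		Lag int64 `json:"lag"`
	} `json:"end"`
}

type groupStatus struct {
	Status struct {
		Status     string            `json:"status"`
		Partitions []partitionStatus `json:"partitions"`
	} `json:"status"`
}

func main() {
	// Hypothetical Burrow host, cluster, and consumer group names.
	url := "http://localhost:8000/v2/kafka/local/consumer/mygroup/status"
	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var status groupStatus
	if err := json.NewDecoder(resp.Body).Decode(&status); err != nil {
		log.Fatal(err)
	}

	// Suppress warnings where the lag is below an arbitrary threshold;
	// everything else is surfaced for alerting.
	const lagThreshold = 100
	for _, p := range status.Status.Partitions {
		if p.Status == "WARN" && p.End.Lag < lagThreshold {
			continue // small increase, ignore
		}
		fmt.Printf("alert: %s:%d status=%s lag=%d\n", p.Topic, p.Partition, p.Status, p.End.Lag)
	}
}
```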