
Getting Started With JCTools

Nitsan Wakart edited this page Jan 13, 2022 · 2 revisions

Welcome to JCTools! In all probability you have been using JCTools for a while without knowing it, but we are delighted to have a more direct relationship with your code. Having said that, JCTools serves a fairly narrow set of requirements. This page should help you understand what this library is about, when to use it, and what to expect. Let's start!

Give a X a program, and X is frustrated for a day. Teach X how to program and X will be frustrated for a lifetime.

Overview

JCTools offers complementary concurrent data structures for the Java ecosystem which are not covered by the JDK:

  • Specialized Queues -> if you have a specific thread model, allocation requirements or a need to go lock-less, we might be able to help!
  • NonBlockingHashMap and friends -> NBHM was developed by Dr. Cliff Click to help Java users running on multi-core machines with lots of CPUs (e.g. ~700 core Vega platform). The code was picked up by many users, but did not have a permanent address until it was contributed to JCTools. It has since enjoyed several bug fixes and enhancements, so it's not just rotting here :-). NBHM offers a strictly non-blocking alternative to the JDK ConcurrentHashMap as well as being an open-addressing map implementation with lower allocation overheads. We also offer a couple of specialisations for primitive long keys, int sets, and an Identity variant of the map.
  • Concurrent counters -> the JDK offers us AtomicLong for concurrent counters/sequences and LongAdder for striped counters where contention is an issue. JCTools offers a fixed size striped counter that serves a similar purpose to LongAdder. This can be useful when running on JDK7 (no LongAdder), or when a fixed striping approach is more suitable (e.g. upfront allocation of cells is desirable).

If one of those sounds like what you are after, dive into the appropriate section below.

We Have The Qs

JCTools offers a variety of concurrent (i.e. safe to use from multiple threads) java.util.Queue implementations. JCTools queues do not implement java.util.concurrent.BlockingQueue. The JDK offers the following concurrent queues which can be used in this fashion:

  • ConcurrentLinkedQueue
  • LinkedTransferQueue
  • ArrayBlockingQueue
  • LinkedBlockingQueue
  • LinkedBlockingDeque

If you are using one of the above queues for inter-thread async message passing, JCTools can probably help. Here's a quick process for choosing the right Q for U:

  1. Do you have strictly:
  • Single threaded access to Queue::offer (single producer) -> Choose one of the SP (Single-Producer) queues (e.g. SpmcArrayQueue).
  • Single threaded access to Queue::poll (single consumer) -> Choose one of the SC (Single-Consumer) queues (e.g. MpscArrayQueue).
  • BOTH! -> Choose one of the SPSC (Single-Producer + Single-Consumer) queues (e.g. SpscArrayQueue).
  • Neither -> Choose one of the MPMC (Multi-Producer + Multi-Consumer) queues (e.g. MpmcArrayQueue).
  2. Is your queue bounded or unbounded?
  • Bounded -> there's a definite known size beyond which some mitigation is required (e.g. back pressure). Pick one of the array queues, but not an Unbounded one.
  • Unbounded -> I live in a fantasy where I have infinite memory ;-). Typically we choose an unbounded queue because we are not sure how big is big enough, or because we are worried about pre-allocating too much space for a queue that is normally very small. JCTools has unbounded queues, but we discourage you from using them and offer good alternatives to handle the footprint issue.
  3. Does queue usage allocate?
  • Queue usage should not cause allocation -> This strictly implies the queue is bounded and has a fixed memory footprint. Use only the ?p?cArrayQueue queues.
  • Queue usage can cause limited allocation -> Allocation can happen, but it is bounded, and therefore so is the queue. We offer MPSC and SPSC versions of the GrowableArrayQueue, which starts small and grows up to a designated size. Once the queue size has stabilised (or reached its designated max) no further allocation happens. Note that once a queue grows it does not shrink: if a queue is filled and then emptied, the unused backing storage cannot be reclaimed by the JVM.
  • Queue usage can cause allocation -> We can choose a LinkedQueue or a Chunked/Growable/Unbounded variant. These queues allocate either per element in the queue or once an array-node in the queue has been filled.
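As a minimal sketch (assuming the `org.jctools:jctools-core` dependency is on the classpath), the thread-model decision above maps directly onto constructors; the capacities here are arbitrary examples:

```java
import java.util.Queue;

import org.jctools.queues.MpmcArrayQueue;
import org.jctools.queues.MpscArrayQueue;
import org.jctools.queues.SpmcArrayQueue;
import org.jctools.queues.SpscArrayQueue;

public class QueueChoices {
    public static void main(String[] args) {
        // Bounded, fixed-footprint array queues; pick the one matching your thread model.
        // Note: JCTools rounds the requested capacity up to the next power of two.
        Queue<String> spsc = new SpscArrayQueue<>(1024); // one producer, one consumer
        Queue<String> mpsc = new MpscArrayQueue<>(1024); // many producers, one consumer
        Queue<String> spmc = new SpmcArrayQueue<>(1024); // one producer, many consumers
        Queue<String> mpmc = new MpmcArrayQueue<>(1024); // many producers, many consumers

        mpsc.offer("hello");
        System.out.println(mpsc.poll()); // prints "hello"
    }
}
```

All of these implement java.util.Queue, so they can be dropped in behind existing Queue-typed code, provided the access pattern honours the declared producer/consumer counts.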

Example: Selector operations queue

The Java network classes, and in particular the Selector, work best when used from a single thread. To have many threads use the same Selector we might have a single thread own the Selector (Single-Consumer) with other threads (Multi-Producer) forwarding required work to that thread. We can choose an MpscArrayQueue. This requires us to settle on the maximum number of requests we allow in the selector queue before mitigation must happen (e.g. blocking the sending threads, or dropping messages...).
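A hedged sketch of this arrangement (the class and method names here are hypothetical, not part of JCTools; only MpscArrayQueue is):

```java
import org.jctools.queues.MpscArrayQueue;

// Hypothetical wrapper: a single selector thread drains work submitted by many threads.
public class SelectorWorkQueue {
    // Bound chosen up front; a failed offer() is the back-pressure signal.
    private final MpscArrayQueue<Runnable> pending = new MpscArrayQueue<>(1024);

    /** Producer side: may be called from any thread. */
    public boolean submit(Runnable task) {
        return pending.offer(task); // false -> queue full, caller must mitigate
    }

    /** Consumer side: must only be called from the selector-owning thread. */
    public void drain() {
        Runnable task;
        while ((task = pending.poll()) != null) {
            task.run();
        }
    }
}
```

In a real server drain() would be called between Selector.select() wakeups, and submit() would typically wake the selector after a successful offer.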

What if the maximum number of requests is high? This would require us to commit a large backing array. If we set the number at 1 Million the backing array will have a typical footprint of 4MiB (a reference array of 2^20 with compressed oops). If we feel this is a lot of memory to commit to upfront we can choose between:

  • MpscGrowableArrayQueue - We can pick a small initial size. The footprint will grow with application usage to accommodate the traffic and settle. If the maximum size was picked very cautiously the typical application might save a lot of memory. On the down side, a spike in traffic might enlarge this queue which will never shrink. A traffic spike a month ago might not be worth keeping extra memory around for.
  • MpscChunkedArrayQueue - We can pick a chunk size. A new chunk will be allocated if the queue grows beyond this size, but once the queue shrinks back down the unused chunks will be released and GCed. This will give us a known footprint at rest.
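The two options differ only in construction and retention behaviour; a sketch of both (constructor argument order is (initialCapacity, maxCapacity), assuming jctools-core on the classpath):

```java
import org.jctools.queues.MpscChunkedArrayQueue;
import org.jctools.queues.MpscGrowableArrayQueue;

public class BoundedGrowth {
    public static void main(String[] args) {
        int initial = 64;   // small footprint at rest
        int max = 1 << 20;  // hard bound of ~1M elements

        // Grows by doubling the backing array up to max; never shrinks afterwards.
        MpscGrowableArrayQueue<Runnable> growable = new MpscGrowableArrayQueue<>(initial, max);

        // Links fixed-size chunks; emptied chunks are unlinked and become GC-able.
        MpscChunkedArrayQueue<Runnable> chunked = new MpscChunkedArrayQueue<>(initial, max);

        growable.offer(() -> {});
        System.out.println(growable.size()); // prints 1
    }
}
```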

Maybe we don't want to pick a maximum, or we want a soft bound maintained externally to the queue. In this case we can choose from the following:

  • MpscUnboundedArrayQueue - Similar to the Chunked variant, but no limit is set on the growth of the queue.
  • MpscUnboundedXaddArrayQueue - A chunked queue utilizing the XADD atomic instruction. Users always progress from one chunk to the next (as opposed to remaining within the same chunk as they would in other linked array queues). The queue employs a chunk recycling facility to avoid allocation within a range of use. This queue offers great performance under high contention.
  • MpscLinkedQueue - A simple node-per-element queue. While each node results in allocation, this queue has the least footprint when empty and offers good performance under contention.
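Construction of the three unbounded options looks like this (a sketch assuming a recent jctools-core; older releases exposed MpscLinkedQueue via a static factory method rather than a public constructor):

```java
import org.jctools.queues.MpscLinkedQueue;
import org.jctools.queues.MpscUnboundedArrayQueue;
import org.jctools.queues.MpscUnboundedXaddArrayQueue;

public class UnboundedChoices {
    public static void main(String[] args) {
        // Linked array queue growing in chunks of 256 slots, no upper bound.
        MpscUnboundedArrayQueue<Runnable> chunky = new MpscUnboundedArrayQueue<>(256);

        // XADD-based variant: best throughput under heavy producer contention.
        MpscUnboundedXaddArrayQueue<Runnable> xadd = new MpscUnboundedXaddArrayQueue<>(256);

        // Node-per-element: smallest footprint when empty.
        MpscLinkedQueue<Runnable> linked = new MpscLinkedQueue<>();

        linked.offer(() -> {});
        System.out.println(linked.size()); // prints 1
    }
}
```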

Phew! That's a lot of choices to make! The MPSC case is a very common use-case, and so offering appropriate specializations pays off. We hope you find the queue that's just right for you.

Example: A simple object pool

Sometimes, in order to reduce allocation or high initialization costs, we might need to re-use objects. A simple way to implement a concurrent object pool is to use a Multi-Producer-Multi-Consumer queue. It needs to be MPMC because we want to reuse objects across threads, and the threads can both offer (return an object) and poll (take an object). If we use ConcurrentLinkedQueue we will have a node's worth of garbage overhead for each take+return pair. If we use a *BlockingQueue the pool access will involve taking a lock. We have a choice here:

  • Bounded pool -> Use MpmcArrayQueue.
  • Unbounded pool -> Use MpmcUnboundedXaddArrayQueue. There might be allocation overhead if we exceed the size catered for by the pooled chunks, but it will be significantly lower than the overhead imposed by ConcurrentLinkedQueue.

Note that if we want the lowest footprint when empty we could use ConcurrentLinkedQueue, but for this use case we have a queue that typically has some occupancy. If we estimate the occupancy right we will have a lower footprint using MpmcUnboundedXaddArrayQueue.
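A minimal sketch of the bounded variant (BufferPool and its methods are hypothetical names for illustration; only MpmcArrayQueue comes from JCTools):

```java
import java.nio.ByteBuffer;

import org.jctools.queues.MpmcArrayQueue;

// Hypothetical bounded pool: pre-allocates everything, no nodes, no locks on take/release.
public final class BufferPool {
    private final MpmcArrayQueue<ByteBuffer> free;
    private final int bufferSize;

    public BufferPool(int capacity, int bufferSize) {
        this.free = new MpmcArrayQueue<>(capacity);
        this.bufferSize = bufferSize;
        // Fill the pool up front so the hot path does not allocate.
        for (int i = 0; i < capacity; i++) {
            free.offer(ByteBuffer.allocate(bufferSize));
        }
    }

    /** Take a buffer; falls back to allocation (rather than blocking) if the pool is empty. */
    public ByteBuffer take() {
        ByteBuffer b = free.poll();
        return b != null ? b : ByteBuffer.allocate(bufferSize);
    }

    /** Return a buffer; if the pool is already full the buffer is simply left for the GC. */
    public void release(ByteBuffer b) {
        b.clear();
        free.offer(b);
    }
}
```

Falling back to allocation on an empty pool keeps take() non-blocking; a different design might spin, park, or propagate failure instead.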

Example: Single Source, Many Workers

Consider a server handling requests from many clients. Each request involves a relatively high amount of CPU usage, or perhaps some blocking API usage. We can use a Selector to handle accepting connections, but when a Channel is ready we want to delegate the work to a different thread. This is a Single-Producer-Multi-Consumer scenario, and we can tackle it with the SpmcArrayQueue. While MPMC scenarios are very common, SPMC seems far less in demand, so we don't have the same range of options here. In theory at least, all the MPSC algorithms can be reversed to build SPMC variations.
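The shape of this setup, sketched under the same classpath assumption (WorkDispatcher and its methods are hypothetical names):

```java
import org.jctools.queues.SpmcArrayQueue;

// Hypothetical dispatcher: one acceptor thread offers, many worker threads poll.
public class WorkDispatcher {
    private final SpmcArrayQueue<Runnable> work = new SpmcArrayQueue<>(1024);

    /** Single-producer side: must only be called from the acceptor/selector thread. */
    public boolean dispatch(Runnable task) {
        return work.offer(task); // false -> all workers busy and queue full
    }

    /** Multi-consumer side: any worker thread may call this; runs one task if available. */
    public boolean runOne() {
        Runnable task = work.poll();
        if (task == null) {
            return false; // worker should back off (spin-wait, park, ...) and retry
        }
        task.run();
        return true;
    }
}
```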

Example: Single Source, Sharded Workers

In this scenario we might have requests that must always be handled by the same thread (e.g. to maintain client state in non-thread-safe data structures). Since the messages are now destined for a particular thread, we have a Single-Producer-Single-Consumer scenario. SPSC queues come in many flavours, so, similar to the MPSC example, we can pick the most appropriate one based on the bounded/unbounded/footprint requirements.
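One way to sketch the sharding (ShardedDispatcher is a hypothetical name; only SpscArrayQueue is from JCTools; a non-negative client id is assumed):

```java
import org.jctools.queues.SpscArrayQueue;

// Hypothetical sharded dispatcher: one SPSC queue per worker, client pinned by id.
public class ShardedDispatcher {
    private final SpscArrayQueue<String>[] shards;

    @SuppressWarnings("unchecked")
    public ShardedDispatcher(int workers, int capacityPerWorker) {
        shards = new SpscArrayQueue[workers];
        for (int i = 0; i < workers; i++) {
            shards[i] = new SpscArrayQueue<>(capacityPerWorker);
        }
    }

    /** Producer side: the same (non-negative) client id always maps to the same worker. */
    public boolean submit(long clientId, String request) {
        int shard = (int) (clientId % shards.length);
        return shards[shard].offer(request);
    }

    /** Consumer side: worker i polls only its own queue, preserving the SPSC contract. */
    public String poll(int worker) {
        return shards[worker].poll();
    }
}
```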

We Have The Maps!

The use case for NonBlockingHashMap is as an alternative to ConcurrentHashMap or Hashtable usage. Choose NBHM over CHM if the following appeal to your requirements:

  • No allocation on insert: NBHM does not allocate map entries on new mappings, whereas CHM allocates a node.
  • No locks: NBHM is non-blocking, whereas CHM locks on the head entry of the hash bucket on every put*/compute*/remove/replace operation. This results in improved performance for some use cases, but also in different guarantees for the compute* methods, where the call to the value supplier is not atomic. This is in line with the ConcurrentMap API.
  • Open addressing: depending on your keys' hash distribution this may be a good or a bad thing. NBHM does not handle bucket collisions as well as CHM, which has a dual linked-bucket/tree-node scheme to help in such cases.
  • Footprint: For the same sized entries table, NBHM's memory footprint is ~28 bytes less per entry. For small entries this can be a significant saving.
  • Concurrent HashSet: The JDK does not offer a concurrent set, but developers can and do use a CHM to construct one (e.g. using Collections.newSetFromMap). JCTools offers a direct set implementation which further reduces memory footprint (no need for value storage).
  • Specialized implementation for long keys: reducing boxing and abstraction costs.
  • Specialized implementation for positive int sets: the concurrent int set is implemented as a bit vector. This is great if the int range is relatively low. Space is used in proportion to the largest element, as opposed to the number of elements (as is the case with hash-table based Set implementations). Space is approximately (largest_element/8 + 64) bytes.
  • Specialized identity map
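A quick tour of the org.jctools.maps classes mentioned above (a sketch assuming jctools-core on the classpath):

```java
import org.jctools.maps.NonBlockingHashMap;
import org.jctools.maps.NonBlockingHashMapLong;
import org.jctools.maps.NonBlockingHashSet;
import org.jctools.maps.NonBlockingSetInt;

public class MapsTour {
    public static void main(String[] args) {
        // Drop-in ConcurrentMap: same API shape as ConcurrentHashMap, no per-entry nodes.
        NonBlockingHashMap<String, Integer> counts = new NonBlockingHashMap<>();
        counts.put("a", 1);
        counts.putIfAbsent("a", 2); // "a" keeps its existing value

        // Primitive long keys: no boxing on the key side.
        NonBlockingHashMapLong<String> byId = new NonBlockingHashMapLong<>();
        byId.put(42L, "session");

        // Direct concurrent set: no dummy values needed.
        NonBlockingHashSet<String> names = new NonBlockingHashSet<>();
        names.add("jctools");

        // Bit-vector set for non-negative ints: space scales with the largest element.
        NonBlockingSetInt smallInts = new NonBlockingSetInt();
        smallInts.add(7);

        System.out.println(counts.get("a")); // prints 1
    }
}
```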

We Have Concurrent Counters!

The use case for the fixed-size striped counters is similar to LongAdder, namely providing an alternative to contended AtomicLong counters.
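A brief sketch, assuming the org.jctools.counters API (CountersFactory.createFixedSizeStripedCounter taking a stripe count, and Counter's inc()/get()); verify the exact factory method names against your JCTools version:

```java
import org.jctools.counters.Counter;
import org.jctools.counters.CountersFactory;

public class HitCounter {
    public static void main(String[] args) {
        // The stripe count is fixed up front, unlike LongAdder's dynamically grown cells.
        Counter hits = CountersFactory.createFixedSizeStripedCounter(8);
        for (int i = 0; i < 1000; i++) {
            hits.inc(); // each thread increments its own stripe, reducing contention
        }
        System.out.println(hits.get()); // get() sums across all stripes
    }
}
```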