Skip to content

Implementation of streamed inner-join operation, illustrating an exact method (hash-based join) and an approximate method (bloom filter based join)

Notifications You must be signed in to change notification settings

benjaminbillet/kafka-streams-join

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kafka Streams Join Processor

The purpose of this Kafka Streams processor is to perform a continuous inner-join over a set of streams. Basically, the processor accumulates all unique record value of the streams. When a new record is processed, the processor looks if the record value has been seen so far on all other streams. If so, the processor forward the record as part of the joined stream.

This project includes two implementation:

  • HashJoinProcessor: deterministic implementation, based on hash tables (more precisely, Java HashSet).
  • BloomJoinProcessor: probabilistic implementation, based on Bloom filters. This implementation consumes order of magnitude less memory, but has a probability of false positive (i.e., a few items in the result streams should have not been part of the join).

About

Implementation of streamed inner-join operation, illustrating an exact method (hash-based join) and an approximate method (bloom filter based join)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages