Skip to content

A unit test for Apache Beam streaming, to check if your window would drop data, and how many times would the window be triggered

License

Notifications You must be signed in to change notification settings

iht/beam-late-data

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Understanding Exactly-Once Processing And Windowing In Streaming Pipelines (with Apache Beam)

This repo contains the code showcased in the Beam Summit talk Understanding Exactly-Once Processing And Windowing In Streaming Pipelines

In that talk, I explained how windowing works in streaming pipelines, and what are the decisions you have to make in order to do complex event processing in streaming with Apache Beam.

Whenever we apply a window, there are always doubts about whether the window will drop data, or how many times (and when) will be the output triggered.

This repo contains a sample pipeline that uses unit testing to check if your window would drop data, and how many times would the window be trigered. You write your window in a function, and then use the unit test to check the output of that pipeline. If the window drops data, the test will fail. In addition, you get a CSV output that you can examine to see how and when your window produced output.

Watch the talk

Watch the video at the Beam Summit website, or at YouTube:

Check also the slides and the notes of each slide.

The tested pipeline

The pipeline processes 60 messages. 50 messages produced on time, and 10 messages that arrive after the watermark (late data).

How to test your own window?

First: add your window

Add a new window to src/main/java/com/google/cloud/pso/windows/SomeSampleWindow.java.

For that, just add a new method with this signature:

public Window<KV<String, MyDummyEvent>> myCustomWindow()

(maybe with some input parameters if you want to use those in your window).

See some examples of windows in that file

Second: apply your window

TODO

Copyright

Copyright 2020 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

About

A unit test for Apache Beam streaming, to check if your window would drop data, and how many times would the window be triggered

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published