
Materialize rehydration of Kafka/Kinesis from S3? #9154

Answered by bobbyiliev
danthegoodman1 asked this question in Q&A

Hi there,

This should be doable with a UNION ALL statement: from the S3 bucket you pull only the records that are no longer available in Kinesis/Kafka, filtering out the latest ones, and combine them with everything the stream still holds.

The SELECT statement would look something like this:

SELECT * FROM s3_source
  WHERE event_timestamp < (SELECT min(event_timestamp) FROM kinesis_source)
UNION ALL
SELECT * FROM kinesis_source
  • First, we select the records from the S3 bucket and filter them with a subquery, so that we keep only the S3 records older than the oldest record still present in Kafka/Kinesis.
  • Then we use UNION ALL to combine that S3 data with all currently available records from Kinesis.
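
To see how the two halves of the query fit together, here is a minimal sketch that runs the same SELECT against two small in-memory tables. It uses SQLite through Python's standard library purely as a stand-in for Materialize. The `rehydrate` helper, the `event_id` column, and the integer timestamps are all made up for illustration; only `s3_source`, `kinesis_source`, and `event_timestamp` come from the query above.

```python
import sqlite3


def rehydrate(s3_rows, kinesis_rows):
    """Run the UNION ALL query over two in-memory tables.

    Hypothetical helper: SQLite stands in for Materialize, and each row is
    an assumed (event_id, event_timestamp) pair.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE s3_source (event_id TEXT, event_timestamp INTEGER)")
    conn.execute("CREATE TABLE kinesis_source (event_id TEXT, event_timestamp INTEGER)")
    conn.executemany("INSERT INTO s3_source VALUES (?, ?)", s3_rows)
    conn.executemany("INSERT INTO kinesis_source VALUES (?, ?)", kinesis_rows)

    # Same shape as the query in the answer; the ORDER BY is added only to
    # make the result deterministic for this demo.
    rows = conn.execute(
        """
        SELECT event_id, event_timestamp FROM s3_source
          WHERE event_timestamp < (SELECT min(event_timestamp) FROM kinesis_source)
        UNION ALL
        SELECT event_id, event_timestamp FROM kinesis_source
        ORDER BY event_timestamp, event_id
        """
    ).fetchall()
    conn.close()
    return rows


# S3 holds the full history; the stream only retains the newest events.
s3 = [("a", 1), ("b", 2), ("c", 3)]
kinesis = [("c", 3), ("d", 4)]
combined = rehydrate(s3, kinesis)
print(combined)
```

Event "c" is in both sources but shows up only once, because the subquery keeps just the S3 rows older than the oldest streamed event. Two things to watch for with this pattern: if the stream is empty, `min(event_timestamp)` is NULL, the comparison matches nothing, and no S3 rows come back; and the cutoff assumes every event at or after the stream's oldest timestamp is actually present in the stream.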

That way, even if you restart the historical data…

Answer selected by danthegoodman1