My current task requires extracting data (with Kafka Connect) from several Postgres tables, performing joins (with ksqlDB), and then inserting the denormalized data into a data warehouse (also a Postgres table in this case). Each ETL pipeline may join 3-4 tables, and each table has ~1M rows. I can't use stream-stream or windowed JOINs, since there are no constraints on the time range when joining records. Therefore, all the JOINs are full table-table joins.
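To make the cost of a non-windowed table-table join concrete, here is a simplified in-memory model in plain Python. It is not ksqlDB's actual implementation. It assumes an inner join on the primary key, does not model deletes (tombstones), and the class and method names are hypothetical. Both sides are kept in keyed state, and every insert or update on either side is looked up against the other side's full state.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


class TableTableJoin:
    """Toy model (hypothetical, not ksqlDB code) of a non-windowed inner
    table-table join on the primary key.

    Both inputs are materialized in full, so state grows with the size of
    each table rather than with a time window.
    """

    def __init__(self) -> None:
        self.left: Dict[Any, Row] = {}   # latest row per key from the left table
        self.right: Dict[Any, Row] = {}  # latest row per key from the right table

    def _joined(self, key: Any) -> Optional[Row]:
        # Emit a joined row only when both sides currently hold the key.
        if key in self.left and key in self.right:
            return {**self.left[key], **self.right[key]}
        return None

    def upsert_left(self, key: Any, row: Row) -> Optional[Row]:
        # An update on the left side is joined against the whole right state.
        self.left[key] = row
        return self._joined(key)

    def upsert_right(self, key: Any, row: Row) -> Optional[Row]:
        # An update on the right side is joined against the whole left state.
        self.right[key] = row
        return self._joined(key)

    def state_size(self) -> int:
        # Number of rows held in state across both sides.
        return len(self.left) + len(self.right)
```

In this model, updating a single source row re-emits the joined row, and that only works because the other side is held in full. With ~1M rows per table, that state exists for every join in the pipeline.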
An example of my ksqlDB queries:
CREATE TABLE sink1 AS SELECT ... FROM source1 JOIN source2 ...;
CREATE TABLE sink2 AS SELECT ... FROM sink1 JOIN source3 ...;
-- etc...
CREATE SINK CONNECTOR sinkconnector WITH (...);
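Because sink2 is built from sink1, each stage of the chain keeps its own inputs materialized. The following back-of-envelope estimate rests on several assumptions. It assumes every input of every table-table join is fully materialized. It ignores RocksDB overhead, caches, changelog topics and replication. It assumes sink1 ends up with ~1M rows, and it uses a hypothetical average row size. The function name is hypothetical.

```python
from typing import List, Tuple


def estimate_join_state_bytes(stages: List[Tuple[int, int]], avg_row_bytes: int) -> int:
    """Rough lower bound on join state for a chain of table-table joins.

    stages: (left_rows, right_rows) for each CREATE TABLE ... AS SELECT ... JOIN.
    avg_row_bytes: hypothetical average serialized row size (an assumption).
    """
    total_rows = sum(left + right for left, right in stages)
    return total_rows * avg_row_bytes


# Hypothetical 3-table chain: sink1 = source1 JOIN source2, sink2 = sink1 JOIN source3,
# with ~1M rows per source table and sink1 assumed to hold ~1M rows as well.
stages = [(1_000_000, 1_000_000), (1_000_000, 1_000_000)]
print(estimate_join_state_bytes(stages, avg_row_bytes=1_000))  # 4000000000
```

Under these assumptions, each extra join in the chain adds the intermediate table and one more source table to the state, so a 4-table pipeline holds roughly 6M rows instead of 4M.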
My current setup (replica count/CPU/memory) is something like this on k8s:
Units: mCPU and MB
NAME                              READY  STATUS   AGE    RESTARTS  CPU  MEM    CPU/R:L  MEM/R:L      %CPU/R  %CPU/L  %MEM/R  %MEM/L
ksqldb-server-558c658d5b-7zm5q    1/1    Running  67m    3         131  20457  2000:0   22528:22528  6       n/a     90      90
ksqldb-server-558c658d5b-f7wbt    1/1    Running  8d     1         127  21162  2000:0   22528:22528  6       n/a     93      93
ksqldb-server-558c658d5b-l82x4    1/1    Running  6d21h  2         143  20159  2000:0   22528:22528  7       n/a     89      89
ksqldb-server-558c658d5b-t8xzg    1/1    Running  59m    0         290  20844  2000:0   22528:22528  14      n/a     92      92
schema-registry-6d4f7f5765-kcnrz  1/1    Running  10d    3         3    336    10:1000  400:1024     30      0       84      32
kafka-0                           1/1    Running  10d    0         397  3916   500:0    4096:4096    79      n/a     95      95
kafka-connect-8569b5dd48-9d2bk    2/2    Running  6d17h  0         34   2896   550:0    3104:3272    6       n/a     93      88

That is 4 ksqlDB server replicas, each with a 2000 mCPU request and a 22528 MB memory request and limit, running at about 90% of that memory limit. There is also a single Kafka broker at about 95% of its 4096 MB limit.

I'm observing that the ksqlDB servers are often OOMKilled or highly saturated. Processing each new record from the very large source tables also takes a long time.
Could increasing CPU/memory for Kafka or ksqldb-server help here? Or should I have chosen another approach, such as batch processing?
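For comparison, the batch alternative amounts to periodically recomputing the denormalized table with a single multi-way join inside a database. This is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Postgres. The table names, column names and the `rebuild_warehouse` function are hypothetical.

```python
import sqlite3


def rebuild_warehouse(conn: sqlite3.Connection) -> int:
    """Recompute the denormalized table from scratch in one batch.

    Hypothetical schema: source1..source3 share the key column `id`.
    Returns the number of rows in the rebuilt table.
    """
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS warehouse")
    # One multi-way inner join; no join state survives after the query finishes.
    cur.execute(
        """
        CREATE TABLE warehouse AS
        SELECT s1.id, s1.name, s2.amount, s3.region
        FROM source1 AS s1
        JOIN source2 AS s2 ON s2.id = s1.id
        JOIN source3 AS s3 ON s3.id = s1.id
        """
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE source1 (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE source2 (id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE source3 (id INTEGER PRIMARY KEY, region TEXT);
        INSERT INTO source1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO source2 VALUES (1, 9.5), (2, 3.0);
        INSERT INTO source3 VALUES (1, 'eu'), (2, 'us'), (3, 'apac');
        """
    )
    print(rebuild_warehouse(conn))  # 2 (id 3 has no row in source2)
```

The trade-off runs the opposite way from the streaming version. Memory is needed only while the batch query runs, but the warehouse is only as fresh as the last rebuild.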