Dreaded "can't add a new key into hash during iteration" again #381

jcrood · 2024-04-19T14:26:34Z

I installed fluentd on k8s using fluentd-kubernetes-daemonset v1.16.5 and ocasionally have it logging an error from the metadata plugin:

2024-04-19 09:18:20 +0000 [warn]: #0 dump an error event: error_class=RuntimeError error="can't add a new key into hash during iteration" location="/fluentd/vendor/bundle/ruby/3.2.0/gems/fluent-plugin-kubernetes_metadata_filter-3.4.0/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb:86:in 'merge!'" tag="kubernetes.var.log.containers.calico-node-hrnj2_kube-system_calico-node-455d05a2c5e6b90ce4444c083d2aeeebf89c454d1588ea6fe7d314a7eaa61e6e.log" time=2024-04-14 22:15:41.969400595 +0000 record={"stream"=>"stdout", "logtag"=>"F", "message"=>"2024-04-14 22:15:41.968 [INFO][94] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m4.6s: avg=7ms longest=22ms ()", "time"=>"2024-04-15T00:15:41.969400595+02:00", "docker"=>{"container_id"=>"455d05a2c5e6b90ce4444c083d2aeeebf89c454d1588ea6fe7d314a7eaa61e6e"}, "kubernetes"=>{"container_name"=>"calico-node", "namespace_name"=>"kube-system", "pod_name"=>"calico-node-hrnj2", "container_image"=>"quay.io/calico/node:v3.24.5", "container_image_id"=>"sha256:54637cb36d4a1c029fb994c6fc88af04791c1f2dcbd12a24aa995c0bffaacdb1", "pod_id"=>"b3573c8d-a8a1-4d46-bd63-a447b609e71f", "pod_ip"=>"10.254.6.12", "host"=>"m02.c042.k8s.example.com", "labels"=>{"controller-revision-hash"=>"79ddc99967", "k8s-app"=>"calico-node", "pod-template-generation"=>"4"}, "master_url"=>"https://10.233.0.1:443/api"}, "@timestamp"=>"2024-04-14T22:15:41.969400595+00:00"}

This appears to be similar to #237 and #260, except it occurs in a different part of the caching code and in a newer version. I've tried shuffling some code around, taking out the namespace_cache fetch and isolating the merge! call and it keeps failing on that. It's very well possible that it trips up over ruby's own https://github.com/ruby/ruby/blob/ruby_2_3/hash.c#L2315.

Edited to add: having dug a bit deeper info what's happening, it appears that this can occur because both merge!s (here and [here]https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/blob/v3.4.0/lib/fluent/plugin/kubernetes_metadata_cache_strategy.rb#L86)) can happen on cache entries of the pod metadata and is not thread safe. I've currently "fixed" it for myself by using plain merge and writing back the updated copy back to the cache, which is synchronised.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dreaded "can't add a new key into hash during iteration" again #381

Dreaded "can't add a new key into hash during iteration" again #381

jcrood commented Apr 19, 2024 •

edited

Dreaded "can't add a new key into hash during iteration" again #381

Dreaded "can't add a new key into hash during iteration" again #381

Comments

jcrood commented Apr 19, 2024 • edited

jcrood commented Apr 19, 2024 •

edited