Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible data corruption with memcached #525

Open
raivil opened this issue Dec 13, 2022 · 4 comments
Open

Possible data corruption with memcached #525

raivil opened this issue Dec 13, 2022 · 4 comments

Comments

@raivil
Copy link

raivil commented Dec 13, 2022

Hi!
First of all thanks for the great work on this gem.
Here's my situation/scenario: I'm deploying a Rails 7 app to AWS and using Elasticache Memcached as the store for Identity cache, rails cache and cache store.

The app runs as expected, users are able to access, the data is cached into memcached and I can see the usage metrics (bytes read/write, conn numbers, cpu, unused memory, etc)
After some time, the app started raising a lot of exceptions and users were no longer able to access it.
The app was barely used (testing infrastructure), and memcached metrics/health looks ok (lots of spare resources).
Some examples of exceptions:

  • Dalli::UnmarshalError: Unable to unmarshal value: marshal data too short
  • Dalli::DalliError: Response error: "?x??UOk$E7*?
  • Dalli::DalliError: Response error: VA
  • Dalli::DalliError: Response error: NS
  • Dalli::DalliError: Response error: ".\u0000[:idc_cached_nilf1670773635.8781602

Cleaning the memcached data fixed the issue, but it happened again the next day.

What could be causing this? I've double checked configs everywhere (sample below) and no clue so far on what could be causing it. I'm moving the data from identity cache to an isolated memcached instance.

Any clues? Thoughts?

Thank you!

gems:
dalli (3.2.3)
identity_cache (1.2.0)
rails (7.0.4)

memcached version 1.6.17 (docker version running on CI) and on AWS Elasticache (1.6.16)

app configs:

  config.session_store :cache_store,
                       key: "the_session_key",
                       expire_after: 24.hours

  config.cache_store = :mem_cache_store, "memcached.server.url",
                       { pool: { size: 30 }, protocol: :meta, expires_in: 24.hours,
                         failover: false } # avoids more cache consistency issues

  config.action_policy.cache_store = :mem_cache_store, "memcached.server.url",
                                     { pool: { size: 30 }, protocol: :meta, expires_in: 1.hour,
                                       failover: false } # avoids more cache consistency issues
  config.identity_cache_store = :mem_cache_store, "memcached.server.url",
                                { pool: { size: 30 }, protocol: :meta, expires_in: 6.hours,
                                  failover: false } # avoids more cache consistency issues

Update 1:
Issues happen when updating the model classes and doing a new deploy.

Example stack trace

Error message:
no implicit conversion of Symbol into Integer
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
…dle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
…r/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
…ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
…/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
…by/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'

@raivil raivil changed the title Question: Possible data corruption with memcached on AWS Elasticache Possible data corruption with memcached on AWS Elasticache Dec 15, 2022
@raivil
Copy link
Author

raivil commented Dec 15, 2022

Here's a full stack trace.
The problem just occurred without any app or server changes.

/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `[]'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:71:in `record_from_coder'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:21:in `block in decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/notifications.rb:208:in `instrument'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/encoder.rb:20:in `decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:74:in `cache_decode'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cache_key_loader.rb:38:in `load'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/cached/primary_index.rb:17:in `fetch'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:127:in `fetch_by_id'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/identity_cache-1.2.0/lib/identity_cache/with_primary_index.rb:67:in `fetch_by_slug'
/opt/main-app/current/app/utilities/account_domain_utils.rb:10:in `current_account_by_request'
/opt/main-app/current/app/utilities/account_domain_utils.rb:5:in `matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:40:in `block in matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:39:in `all?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:39:in `matches?'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/mapper.rb:46:in `serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:50:in `block in serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:32:in `each'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/journey/router.rb:32:in `serve'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/routing/route_set.rb:852:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-attack-6.6.1/lib/rack/attack.rb:127:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/rack/agent_hooks.rb:30:in `traced_call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
<truncated 45 additional frames>
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-2.2.4/lib/rack/runtime.rb:22:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:148:in `block in call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/support/timeout.rb:19:in `timeout'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-timeout-0.6.3/lib/rack/timeout/core.rb:147:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/executor.rb:14:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/rack-2.2.4/lib/rack/sendfile.rb:110:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/ssl.rb:77:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/actionpack-7.0.4/lib/action_dispatch/middleware/host_authorization.rb:137:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/engine.rb:530:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/newrelic_rpm-8.13.1/lib/new_relic/agent/instrumentation/middleware_tracing.rb:99:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/configuration.rb:268:in `call'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/request.rb:89:in `block in handle_request'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/thread_pool.rb:343:in `with_force_shutdown'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/request.rb:88:in `handle_request'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/server.rb:430:in `process_client'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/server.rb:233:in `block in run'
/opt/main-app/current/vendor/bundle/ruby/3.1.0/gems/puma-6.0.0/lib/puma/thread_pool.rb:150:in `block in spawn_thread'

The model that is failing to be cached/retrieved has some relations, but in short this is it:

class Account < ApplicationRecord
  extend FriendlyId
  include IdentityCache

  cache_index :slug, unique: true

  has_one :model1, dependent: :destroy, class_name: "model1"
  has_one :model2, dependent: :destroy, class_name: "model2"
  has_one :model3, dependent: :destroy, class_name: "model3"
  cache_has_one :model1, embed: true
  cache_has_one :model2, embed: true
  cache_has_one :model3, embed: true

@raivil
Copy link
Author

raivil commented Dec 19, 2022

I've started seeing the same exception on CI while running specs.
They fail randomly and fetch_by_slug calls (cache_index).

@raivil raivil changed the title Possible data corruption with memcached on AWS Elasticache Possible data corruption with memcached Dec 19, 2022
@driv3r
Copy link
Contributor

driv3r commented Jan 17, 2023

In the second exception, was the message "no implicit conversion of Symbol into Integer" as well? Looks like for some reason we get Array instead of Hash from Cache, are there any other places where you utilise Rails cache? maybe there are some key collisions?

I would try to hook into AS Notifications and log stuff on cache_write.active_support & dehydration.identity_cache messages, and try to see whenever it's only IDC or also something else.

@driv3r
Copy link
Contributor

driv3r commented Jan 17, 2023

Also if you had this inside tests as well, please keep the "seed", it will help reproduce the issue locally and potentially find the core of the problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants