
Unable to find subscription with identifier: {"channel":"GraphqlChannel","channelId":"18bccb1dbd6"} #4702

Open
Amnesthesia opened this issue Nov 15, 2023 · 6 comments


@Amnesthesia

Describe the bug

This is possibly not a bug in graphql-ruby, and I apologize in advance if that's the case, but I've run out of ideas about what it could be, so I'm opening this report in case it's something anybody else has run into before, or a common problem that I just can't find much information on.

We see a lot of these errors in our AppSignal monitoring, but I can't reproduce them locally. We don't know whether they happen when we deploy new versions while somebody is using the site (over websockets that are now disconnected?), or what else might be going on.

Versions

graphql version: 2.1.1
rails (or other framework): 7.0.8
other applicable versions (graphql-batch, etc.):

graphql-pro (1.24.11)
graphql (2.1.1)
graphql-metrics (5.0.7)
graphql-rails_logger (1.2.4)
rubocop-graphql (1.4.0)

Steps to reproduce

I'm sorry, but honestly, no clue. We can't reproduce this locally; it only happens in our production and staging environments (on Amazon Fargate, using ElastiCache Redis).

Expected behavior

The error should not be raised; the subscription should be found.

Actual behavior

The "Unable to find subscription with identifier" error from the title is raised.

Exception backtrace:

lib/action_cable/connection/subscriptions.rb:75 find
lib/action_cable/connection/subscriptions.rb:47 remove
lib/action_cable/connection/subscriptions.rb:18 execute_command
lib/action_cable/connection/base.rb:89 dispatch_websocket_message
lib/action_cable/server/worker.rb:59 block in invoke
lib/active_support/callbacks.rb:118 block in run_callbacks
lib/semantic_logger/base.rb:190 block in tagged
lib/semantic_logger/semantic_logger.rb:346 tagged
lib/semantic_logger/base.rb:202 tagged
lib/rails_semantic_logger/extensions/action_cable/tagged_logger_proxy.rb:8 tag
lib/action_cable/server/worker/active_record_connection_management.rb:16 with_database_connections
lib/active_support/callbacks.rb:127 block in run_callbacks
lib/action_cable/engine.rb:71 block (4 levels) in <class:Engine>
lib/active_support/execution_wrapper.rb:92 wrap
lib/action_cable/engine.rb:66 block (3 levels) in <class:Engine>
lib/active_support/callbacks.rb:127 instance_exec
lib/active_support/callbacks.rb:127 block in run_callbacks
lib/active_support/callbacks.rb:138 run_callbacks
lib/action_cable/server/worker.rb:42 work
lib/action_cable/server/worker.rb:58 invoke
lib/action_cable/server/worker.rb:53 block in async_invoke
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:352 run_task
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:343 block (3 levels) in create_worker
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334 loop
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334 block (2 levels) in create_worker
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333 catch
lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:333 block in create_worker
@Amnesthesia
Author

Amnesthesia commented Nov 15, 2023

Actually, since this is raised in subscriptions.rb when the identifier can't be found in the @subscriptions array, could it be because we run multiple containers, and the user may be subscribed on another container?

@rmosolgo
Owner

Hi! Sorry for the trouble and thanks for the detailed report. What I expect on re-deploy is for clients to reconnect, re-sending GraphQL subscriptions as needed. A couple of quick thoughts:

  • What caused this error to start occurring? Did you just add GraphQL subscriptions to your app, or did something else change that caused these to appear?
  • Reviewing the backtrace above, I don't see anything GraphQL-related. Is there any indication that this is GraphQL-specific? (Are there other uses of ActionCable that work properly in your application?)
  • You mention that you can't replicate the error locally. If you're using the app in production during a deploy, do you see anything happening in your browser?
  • In AppSignal, can you see any trends in the occurrences of this bug? (Maybe by browser? I can imagine browsers handling this event in different ways...)

@Amnesthesia
Author

@rmosolgo Right, so to address your questions (and what I think is going on here as well):

  1. This error has been occurring for a very long time for us, but it doesn't seem to cause any issues. We've had a lot of ideas about what it could be, and none of them have turned out to be it. We haven't really prioritized it because, like I said, it seems pretty harmless.
  2. You're right, and I realized after posting this that it may not be graphql-related but actioncable-related.
  3. & 4. We're not seeing any pattern by browser type for this; it seems to be a mixed bag.

I think it's that we have this in our GraphQLChannel:

def subscribed
  @subscription_ids = []
end

This starts each instance with a blank slate of connections, so if we're scaling out and running 3-4 instances, they're all pushing to the same Redis. One instance may pick up a subscriptions.trigger(...) from Puma on another instance, but not have the gid in its @subscription_ids, because the list isn't shared.

Perhaps @subscription_ids should point to Redis instead, to maintain a shared list of subscribed clients?
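For reference, the rest of the channel follows the pattern from the graphql-ruby ActionCable docs, roughly like this sketch (MySchema and the context contents are placeholders, not our literal code):

class GraphqlChannel < ApplicationCable::Channel
  def subscribed
    # Per-connection (and therefore per-process) list of subscription ids.
    @subscription_ids = []
  end

  def execute(data)
    result = MySchema.execute(
      query: data["query"],
      context: { channel: self },
      variables: data["variables"] || {},
      operation_name: data["operationName"],
    )

    # If this query registered a subscription, remember its id so it can
    # be cleaned up when the client disconnects.
    if result.context[:subscription_id]
      @subscription_ids << result.context[:subscription_id]
    end

    transmit({ result: result.to_h, more: result.subscription? })
  end

  def unsubscribed
    @subscription_ids.each do |sid|
      MySchema.subscriptions.delete_subscription(sid)
    end
  end
end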

@rmosolgo
Owner

My intention is for @subscription_ids to be a per-instance list of subscribers to that instance. So it works like:

  • subscriptions.trigger is called in one Ruby process; it calls .broadcast to notify other processes of the trigger:

    # An event was triggered; push the data over ActionCable.
    # Subscribers will re-evaluate locally.
    def execute_all(event, object)
      stream = stream_event_name(event)
      message = @serializer.dump(object)
      @action_cable.server.broadcast(stream, message)
    end

  • meanwhile, any ActionCable process that received a subscription should have set up a stream_from for that subscriber:

    channel.stream_from(stream_event_name(initial_event), coder: @action_cable_coder) do |message|

  • So, when a process receives a notification from a trigger, it runs that stream_from block and generates a payload. Then it broadcasts again, using the specific subscription ID, to update the client:

    @action_cable.server.broadcast(stream_subscription_name(subscription_id), payload)
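Put together, here's a runnable toy model of that two-hop flow in plain Ruby (no Rails; the in-memory PUBSUB hash stands in for the Redis-backed adapter, and the payload building is faked):

require "json"

# Stand-in for the pub/sub adapter: every registered handler sees every
# broadcast, just like all ActionCable processes do via Redis.
PUBSUB = Hash.new { |h, k| h[k] = [] }

def broadcast(stream, message)
  PUBSUB[stream].each { |handler| handler.call(message) }
end

def stream_from(stream, &handler)
  PUBSUB[stream] << handler
end

# "Process B" holds the subscriber, so it registered a stream for the
# event name when the subscription arrived:
subscription_id = "graphql-subscription:abc123"
stream_from("graphql-event:postUpdated") do |message|
  # Re-evaluate the subscription query locally (faked here)...
  payload = { data: { postUpdated: JSON.parse(message) } }
  # ...then hop 2: broadcast the finished result to the one client,
  # using its specific subscription id:
  broadcast(subscription_id, payload)
end

# The client's own stream:
stream_from(subscription_id) { |payload| puts "client got: #{payload}" }

# "Process A" is where subscriptions.trigger runs; hop 1 broadcasts the
# raw event to all processes:
broadcast("graphql-event:postUpdated", { id: 1, title: "Hello" }.to_json)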

I hope that helps!

@Amnesthesia
Author

@rmosolgo Thank you for taking the time to explain this a bit, appreciate it! :) We still haven't worked this issue out. We realized we weren't running ActionCable standalone in production, and thought this might be solved by moving ActionCable to its own standalone service, but (I presume) because we also scale that out, this "issue" still persists.

We don't really know to what extent it's an issue, or if it is an issue at all. E.g., are we losing a certain percentage of websocket pushes, or are these errors completely harmless?

So, when a process receives a notification from a trigger, it runs that stream_from block and generates a payload. Then it broadcasts again, using the specific subscription ID, to update the client

In this sense, shouldn't an ActionCable process that does not have the subscription_id in its list of subscribers simply not pick this up? Do you see any way, in a multi-ActionCable setup, that processes would routinely try to handle messages meant for subscriber_ids they don't have?

@Amnesthesia
Author

Amnesthesia commented Apr 4, 2024

@rmosolgo We've moved away from scaling ActionCable horizontally and have instead scaled it up, significantly increasing the machine's spec and running only one machine. We are still seeing these errors every day, and we're quite lost as to where they come from.

We've also reduced our identified_by attributes to a single one, identified_by :_strategy, where _strategy is a custom class managing the JWT token and current user (a rough sketch below).
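A minimal sketch of that connection setup; AuthStrategy and its internals here are hypothetical stand-ins for our custom class:

module ApplicationCable
  class Connection < ActionCable::Connection::Base
    # A single identifier: the strategy object wraps JWT verification
    # and the current-user lookup.
    identified_by :_strategy

    def connect
      # AuthStrategy is a hypothetical placeholder for the custom class
      # described above; it reads the token from the request.
      self._strategy = AuthStrategy.new(request)
      reject_unauthorized_connection unless self._strategy.valid?
    end
  end
end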
