
ECONNREFUSED Error when adding Job to Queue #83

Closed
pariola opened this issue Dec 8, 2019 · 6 comments

Comments

@pariola

pariola commented Dec 8, 2019

I created an IORedis instance and initialized a queue,

const Redis = require("ioredis");
const { Queue } = require("bullmq");

const { REDIS_URI } = process.env;
const connection = new Redis(REDIS_URI);

const q = new Queue("base", { connection });

q.add("one", { name: "NAME" });

then it fails with the error below when I try to add a job to the queue:

{ Error: connect ECONNREFUSED 10.59.255.141:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1107:14)
  errno: 'ECONNREFUSED',
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '10.59.255.141',
  port: 6379 }

While debugging, I thought it was my Redis connection, so I added the code below before calling Queue.add; it returns the keys successfully, while Queue.add still fails:

connection.keys("*", (err, keys) => {
  if (err) return console.log(err);

  console.log(keys);
});
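
For reference, a variation of the snippet above that logs the connection's error events and the add() rejection separately (same names as in my code above), just to see where the refusal comes from:

// Log low-level connection errors from ioredis itself.
connection.on("error", (err) => console.error("connection error:", err.message));

// Log whether the add() promise itself resolves or rejects.
q.add("one", { name: "NAME" })
  .then((job) => console.log("added job", job.id))
  .catch((err) => console.error("add failed:", err.message));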

I have tried downgrading to 1.4.3 as well.

Any help would be greatly appreciated!

@pariola pariola changed the title ECONNREFUSED Error when adding to Queue ECONNREFUSED Error when adding Job to Queue Dec 8, 2019
@eric-hc
Contributor

eric-hc commented Dec 10, 2019

Is your app running in a Docker container? Right now I'm having an issue where the connection is only refused when the API is running in its own container. I think it's similar to this ioredis issue, which has a few potential solutions.
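
One of the workarounds is roughly this kind of explicit host configuration; just a sketch, and the "redis" host name here is a placeholder for whatever hostname is reachable from inside your container (e.g. a docker-compose service name or a cluster DNS name):

const Redis = require("ioredis");

// Inside a container, "localhost"/127.0.0.1 points at the container itself,
// so the Redis host has to be something reachable from within the container.
const connection = new Redis({
  host: process.env.REDIS_HOST || "redis",
  port: Number(process.env.REDIS_PORT) || 6379,
});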

@pariola
Author

pariola commented Dec 11, 2019

Yeah, my app is running in a Kubernetes cluster.

@elucidsoft

I also have mine running in Kubernetes, with Redis set up with Redis Sentinel and three instances. It works great for a while. I used to get this exact error before setting up Sentinel, so I figured I would set it up to make things more redundant. Now it's all working, but there is some truly strange weirdness that I can't figure out:

  1. It works fine as long as Redis is up and running when you first connect via IORedis.
  2. If Redis goes down, or if ALL sentinels are taken down, AND you then try to use the connection, you get an IORedis error that it can't connect, which is expected.
  3. If you bring Redis back up and try again, you continue to get the error that it can't connect. Even though the error message states it will try to reconnect, it never succeeds at reconnecting a connection that died after the initial connection.
  4. This is where things get weird: if you call getJobStatus or clean, etc. at step 3, those work, return data, and connect to Redis, even though repeating step 3 still throws an exception that it can't connect. I have verified over and over again that I am using the same connection and ioredis instance. It's just beyond bizarre.
  5. If you restart the container, all is fine again until you mess up the connection again as in step 2.

What I don't know is whether this behavior is specific to IORedis or whether it's some odd way BullMQ is using it. I don't know, tbh.
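
For reference, my Sentinel connection looks roughly like this (hostnames and the master name are placeholders for my real setup):

const Redis = require("ioredis");

// Connect through the sentinels; ioredis asks them for the current master.
const connection = new Redis({
  sentinels: [
    { host: "sentinel-0", port: 26379 },
    { host: "sentinel-1", port: 26379 },
    { host: "sentinel-2", port: 26379 },
  ],
  name: "mymaster", // the master group name configured in Sentinel
});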

@manast
Contributor

manast commented Dec 21, 2019

@elucidsoft my gut feeling is that this is an ioredis issue; it would be great if you could verify it with just a simple redis example. I already posted an issue regarding connections in the ioredis repo:
redis/ioredis#610
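
Something along these lines, without BullMQ involved at all, should show whether the client recovers on its own (REDIS_URI is just a placeholder for your connection string):

const Redis = require("ioredis");
const connection = new Redis(process.env.REDIS_URI);

connection.on("error", (err) => console.error("ioredis error:", err.message));
connection.on("ready", () => console.log("ioredis ready"));

// Kill and restart Redis while this is running and watch whether
// the pings start succeeding again without restarting the process.
setInterval(() => {
  connection
    .ping()
    .then((res) => console.log("ping:", res))
    .catch((err) => console.error("ping failed:", err.message));
}, 5000);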

@elucidsoft

Is there a call I can use for my health checks, other than queue.add, that would cause this? I would love a method called isHealthy(); it would make doing health checks a freaking breeze with Bull. I think I'll put in a request for this.
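
In the meantime I'm thinking of rolling my own along these lines, just pinging the ioredis connection that is already passed to the Queue (a rough helper, not a BullMQ API):

// Rough isHealthy() helper on top of the shared ioredis connection.
async function isHealthy(connection) {
  try {
    const pong = await connection.ping();
    return pong === "PONG";
  } catch (err) {
    return false;
  }
}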

@elucidsoft

This is a non-issue. After months of testing, troubleshooting, and refining, here is the deal. If Redis is terminated forcefully, for instance if you're on a preemptible node or a spot-instance type of situation and your Redis instance is given zero notice for a graceful shutdown, you can end up in this situation. But it has nothing to do with BullMQ. It's an odd issue with connection state in Redis: when this happens, it seems that when Redis comes back up it performs some actions to restore its state, etc. During that time IORedis will be connecting to it, and the connections established before Redis is ready end up only able to perform a very limited set of actions for whatever reason, and IORedis is not capable of recovering from this situation. The only way out is a full restart of the application; some internal state in IORedis prevents recovery from this state.

In any case, I have not had this happen to me in several months now, so I am 100% stable. If the clients get killed, it does not affect anything. You will see this error occasionally when that happens, but in all instances I've seen a 100% full recovery in a matter of seconds. Moral of the story: don't put StatefulSets on volatile node types. In fact, the Kubernetes documentation actually mentions that this is unsupported.
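
For completeness, these are the kind of ioredis reconnection options involved here; a rough sketch only, and I can't promise they avoid the stuck state described above:

const Redis = require("ioredis");

const connection = new Redis(process.env.REDIS_URI, {
  // Keep commands queued instead of failing fast while Redis is down.
  maxRetriesPerRequest: null,
  // Back off between reconnection attempts, capped at 5 seconds.
  retryStrategy: (times) => Math.min(times * 500, 5000),
  // Force a reconnect if a failover left us talking to a read-only replica.
  reconnectOnError: (err) => err.message.includes("READONLY"),
});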
