Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for sending to > 1,000 registration tokens #167

Open
eladnava opened this issue Oct 17, 2015 · 9 comments
Open

Support for sending to > 1,000 registration tokens #167

eladnava opened this issue Oct 17, 2015 · 9 comments

Comments

@eladnava
Copy link
Collaborator

As discussed in #42 with @hypesystem, we'd like to be able to support more than 1,000 registration tokens in the sender.send and sender.sendNoRetry calls -- today, sending more than 1,000 registration tokens will invoke an error, as GCM will reject the request.

Modules

Things to consider

  1. What happens when one of the batch requests fails for some reason?
    • I think we should retry that request if we've been invoked via sender.send.
    • Does GCM send out notifications to some devices in the batch request even though it failed?
    • In case the request still fails X number of times (amount specified or the default for sender.send), I think we should forget about that batch and keep going -- the other ones might not error out.
    • After completing all other requests, we should pass an error to the provided callback, with info on how many batch requests failed, for example, to let the application know.
  2. Are there legitimate cases where the batch request may fail and we could do something to make it succeed?
    • We should review the GCM error response codes to see if there are errors that could be prevented, such as sending a malformed registration token, that could cause the whole batch request to fail.
@eladnava eladnava changed the title Parallel batching support for > 1,000 registration tokens Support for sending to > 1,000 registration tokens Oct 17, 2015
@eladnava
Copy link
Collaborator Author

@hypesystem did you get a chance to take a look at this, mate?

@hypesystem
Copy link
Collaborator

Not yet. The biggest challenge, as I see it, is if some batches succeed and other fails: this needs to be returned in a meaningful way to the user.

I'll give an example with some smaller sizes: we want to send 6 notifications and have a batch size of 2. This happens:

  • Batch 1 succeeds, all notifications being sent
  • Batch 2 has one succeeding and one failing notification. One of the notifications failed to send, and got an error back from GCM.
  • Batch 3 failed altogether: after three retries it still could not get a non-500 code from the GCM server.

How do we represent the result here?

We cannot just return an error, because some of the notifications were sent. The user needs to see this, and not resend them.

The best suggestion I have is to return success and a result object looking something like this:

[
  { success: true },
  { success: true },
  { success: true },
  { error: "original error that happened" },
  { error: "some error we made up for 500-codes" },
  { error: "some error we made up for 500-codes" }
]

For failing batches we return a result-object for each notification that explains the error.

Again, this is my best suggestion at the moment, and there might be a better solution. No matter what it is a tricky situation to handle.

@eladnava
Copy link
Collaborator Author

@hypesystem what do you mean by notification?

For failing batches we return a result-object for each notification that explains the error.

Are you referring to each individual notification sent to every device in the current batch?

@eladnava eladnava reopened this Oct 30, 2015
@hypesystem
Copy link
Collaborator

By notification I mean the individual message sent to a device. I should have probably just called it "message" to lessen confusion.

Are you referring to each individual notification sent to every device in the current batch?

Yes. And also the final response when merging all the results before returning the result to the user.

@eladnava
Copy link
Collaborator Author

@hypesystem OK, that makes sense.

This is the format of the current result object returned in the sender.send callback:

{
    "failure" : 1, // Number of messages that could not be processed
    "success" : 5, // Number of messages that were sent successfully
    "canonical_ids": [...],
    "results": [...],
    ...
}

Given that apps already interface with it, I think that we should not modify its format. Batch requests should return the same object, incrementing the failure count or erroring out completely if necessary. And they could concat the results and canonical_ids from all of the batch requests into that single array which is returned to the app, in the same order that the registration tokens were supplied.

GCM Error Response Handling for Batch Requests

I think that we need to define what "batch request fails" means. There are 4 types of possible responses from the GCM downstream HTTP API.

This is how I think we should handle the HTTP response codes for each batch request:

  • 400 Bad Request - bad JSON sent. No retry mechanism for the current request, as we'd be generating the same bad JSON again. Skip the current batch and send the next one, incrementing the failure count with the number of registration tokens in the current batch.
  • 401 Unauthorized - bad API key. Error out completely, don't try any other batches, as this error is not specific to the current batch.
  • 5xx Internal Server Error - GCM error, possibly a timeout on their end. We should retry the current batch while honoring the exponential back-off requirement. If we run out of tries and still can't get the message to be sent, this is where things get tricky. I think we should skip the current batch and execute the next one, incrementing the failure count with the number of registration tokens in the current batch that failed. It beats erroring out and not sending any other batches, in my opinion.
  • 200 OK + results[] - similar behavior to how things work today. Inspect the results[] array and make a list of failed registration tokens from all the batch requests. When we're done with all the batches, try to re-send to those failed registration tokens (splitting those up into batches as well if necessary). Again, increment the failure count as necessary to reflect the number of failed messages for all batch requests.

As per your example, the result returned should be:

{
    "failure" : 1, // Number of messages that could not be processed
    "success" : 5, // Number of messages that were sent successfully
    "canonical_ids": [...],
    "results": [...],
    ....
}

What do you think?

@rraziel
Copy link

rraziel commented Aug 5, 2016

Checking the error responses is nice but what if the request times out?
Should it be considered like a failure, even though the messages may have gone through?
I suppose the same issue exists for submissions <= 1000 requests as well though.

@hypesystem
Copy link
Collaborator

I think a timeout should just result in a retry; if it times out all the tries, it will fail altogether for all the affected registration ids.

@BackPackerDz
Copy link

Hi, it's not available ?

@eladnava
Copy link
Collaborator Author

@BackPackerDz No, it is not. Check out a workaround here:
#42 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants