Fix dead lettering #11174

ansd · 2024-05-06T13:15:00Z

What?

This commit fixes #11159, #11160, and #11173, and supersedes #11048.

How?

Background

RabbitMQ can dead letter a message for four different reasons. Three of
them (maxlen, expired, delivery_limit) cause the broker to dead letter
the message automatically, and one (rejected) is caused by an explicit
client action.

RabbitMQ also allows dead letter topologies. When a message is dead
lettered, it is re-published to an exchange and therefore routed to zero
or more target queues. These target queues can in turn dead letter
messages. Hence it is possible to create a cycle of queues where
messages get dead lettered endlessly, which is what we want to avoid.

Alternative approach

One approach to avoid such endless cycles is to use a concept similar to
the TTL field of an IPv4 datagram, or the hop limit field of an IPv6
datagram. These fields ensure that IP packets aren't circulating forever
in the Internet. Each router decrements this counter. If this counter
reaches 0, the packet is dropped and the sender is notified.

We could use the same approach in RabbitMQ: whenever a queue dead
letters a message, a dead_letter_hop_limit field could be decremented.
If this field reaches 0, the message will be dropped.
Such a hop limit field could have a sensible default value, for example 32.
The sender of the message could override this value. Likewise, the
client rejecting a message could set a new value via the Modified
outcome.
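
The hop limit idea can be sketched in a few lines. This is only an illustration of the alternative described above, not something RabbitMQ implements: the `Message` class, the `dead_letter` function and the `dead_letter_hop_limit` attribute are hypothetical names, and the default of 32 is the example value from the text.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_HOP_LIMIT = 32  # example default taken from the text above


@dataclass
class Message:
    body: bytes
    # Hypothetical field; RabbitMQ has no such field today.
    dead_letter_hop_limit: int = DEFAULT_HOP_LIMIT


def dead_letter(msg: Message) -> Optional[Message]:
    """Decrement the hop limit; return None if the message must be dropped."""
    remaining = msg.dead_letter_hop_limit - 1
    if remaining <= 0:
        return None  # limit exhausted: drop instead of republishing
    return Message(msg.body, remaining)
```

With this scheme a queue never needs to inspect the death history to decide whether to drop a message; the counter alone bounds the number of dead letter events.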

Such an approach has multiple advantages:

1. No dead letter cycle detection per se needs to be performed within
   the broker, which is a slight simplification compared to what we have
   today.
2. Simpler dead letter topologies. One very common use case is that
   clients retry sending the message after some time by consuming from
   a dead-letter queue and rejecting the message such that the message
   gets republished to the original queue. Instead of requiring explicit
   client actions, which increase complexity, an x-message-ttl argument
   could be set on the dead-letter queue to automatically retry after
   some time. This is a big simplification because it eliminates the
   need for various frameworks that retry, such as
   https://docs.spring.io/spring-cloud-stream/reference/rabbit/rabbit_overview/rabbitmq-retry.html
3. No dead letter history information needs to be compressed because
   there is a clear limit on how often a message gets dead lettered.
   Therefore, the full history, including timestamps of every dead letter
   event, will be available to clients.

Disadvantages:

1. Breaks a lot of clients, even for 4.0.

3.12 approach

Instead of decrementing a counter, the approach up to 3.12 has been to
drop the message if the message cycled automatically. A message cycled
automatically if no client explicitly rejected the message, i.e. the
message got dead lettered due to maxlen, expired, or delivery_limit, but
not due to rejected.

In this approach, the broker must be able to detect such cycles
reliably.
Reliably detecting dead letter cycles broke in 3.13 due to #11159 and #11160.

To reliably detect cycles, the broker must be able to obtain the exact
order of dead letter events for a given message. In 3.13.0 - 3.13.2, the
order cannot be determined exactly because wall clock time is used to
record the death time, and wall clock timestamps are neither guaranteed
to be unique nor to be monotonic.

This commit uses the same approach as done in 3.12: a list ordered by
death recency is used with the most recent death at the head of the
list.
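
To make the role of the ordered list concrete, here is a minimal sketch of how an automatic cycle could be detected from it. It is written from the description above rather than taken from the broker's code; `Death` and `is_automatic_cycle` are hypothetical names, and the check is assumed to run when a message is about to be dead lettered into `target_queue`.

```python
from typing import List, NamedTuple


class Death(NamedTuple):
    queue: str
    reason: str  # "maxlen", "expired", "delivery_limit" or "rejected"


def is_automatic_cycle(target_queue: str, deaths: List[Death]) -> bool:
    """deaths is ordered by recency, most recent death first."""
    for index, death in enumerate(deaths):
        if death.queue == target_queue:
            # The cycle consists of every death from the most recent one
            # back to the last time the message died in target_queue.
            cycle = deaths[: index + 1]
            return all(d.reason != "rejected" for d in cycle)
    return False  # the message never died in target_queue: no cycle
```

The sketch only works if the list order reflects the true order of events, which is why ordering by wall clock time is not sufficient.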

To keep this list from growing endlessly (for example when a client
rejects the same message hundreds of times), this list should be compacted.
This commit, like 3.12, compacts by tuple {Queue, Reason}:
If this message has already been dead lettered from this Queue for this
Reason, then only a counter is incremented and the element is moved to
the front of the list.
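
The compaction rule can be sketched as follows. This is an illustrative model, not the code in this commit: the `Death` dataclass and `record_death` are hypothetical names, and real death records carry more fields than shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Death:
    queue: str
    reason: str
    count: int = 1


def record_death(deaths: List[Death], queue: str, reason: str) -> List[Death]:
    """Return a new list with the death for (queue, reason) at the head."""
    for index, death in enumerate(deaths):
        if (death.queue, death.reason) == (queue, reason):
            # Seen before: bump the counter and move the entry to the front.
            updated = Death(queue, reason, death.count + 1)
            return [updated] + deaths[:index] + deaths[index + 1:]
    # First death for this (queue, reason): prepend a new entry.
    return [Death(queue, reason)] + deaths
```

The list length is therefore bounded by the number of distinct {Queue, Reason} pairs, no matter how often a message is dead lettered.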

Streams & AMQP 1.0 clients

Dead lettering from a stream doesn't make sense because:

1. A client cannot reject a message from a stream since the stream must
   maintain the total order of events to be consumed by multiple clients.
2. TTL is implemented by Stream retention where only old Stream segments
   are automatically deleted (or archived in the future).
3. The same applies to maxlen.

Although messages cannot be dead lettered from a stream, messages can be dead lettered
into a stream. This commit provides the death history to clients consuming from a stream (#11173).

Additionally, this commit provides the death history to AMQP 1.0 clients via
message annotation x-opt-deaths, which contains the same information as
AMQP 0.9.1 header x-death.

Both storing the death history in a stream and providing the death history
to an AMQP 1.0 client use the same encoding: a message annotation
x-opt-deaths that contains an array of maps ordered by death recency.
The information encoded is the same as in the AMQP 0.9.1 x-death header.
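
As a rough picture of what a client might see after decoding, the annotation could look like the structure below. The key names (`queue`, `reason`, `count`, `exchange`, `routing-keys`, `ttl`) are assumptions borrowed from the x-death fields, the queue names and values are made up, and the helper functions are hypothetical; consult the AMQP client documentation for the exact keys.

```python
from typing import Any, Dict, List

# Hypothetical, already decoded message annotations of a consumed message.
# The key names below are assumptions borrowed from the x-death fields.
message_annotations: Dict[str, Any] = {
    "x-opt-deaths": [
        # Most recent death first.
        {"queue": "retry", "reason": "expired", "count": 2,
         "exchange": "", "routing-keys": ["orders"], "ttl": 5000},
        {"queue": "orders", "reason": "rejected", "count": 2,
         "exchange": "", "routing-keys": ["orders"]},
    ]
}


def deaths(annotations: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the death history, or an empty list if never dead lettered."""
    return annotations.get("x-opt-deaths", [])


def total_death_count(annotations: Dict[str, Any]) -> int:
    """Sum the per-{queue, reason} counters."""
    return sum(d.get("count", 1) for d in deaths(annotations))
```

Because each element is a plain map, a client needs no custom type support to read the history.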

Instead of providing an array of maps, a better approach could be to use
an array of a custom AMQP death type, such as:

<amqp name="rabbitmq">
    <section name="custom-types">
        <type name="death" class="composite" source="list">
            <descriptor name="rabbitmq:death:list"/>
            <field name="queue" type="string" mandatory="true" label="the name of the queue the message was dead lettered from"/>
            <field name="reason" type="symbol" mandatory="true" label="the reason why this message was dead lettered"/>
            <field name="count" type="ulong" default="1" label="how many times this message was dead lettered from this queue for this reason"/>
            <field name="time" mandatory="true" type="timestamp" label="the first time when this message was dead lettered from this queue for this reason"/>
            <field name="exchange" type="string" default="" label="the exchange this message was published to before it was dead lettered for the first time from this queue for this reason"/>
            <field name="routing-keys" type="string" default="" multiple="true" label="the routing keys this message was published with before it was dead lettered for the first time from this queue for this reason"/>
            <field name="ttl" type="milliseconds" label="the time to live of this message before it was dead lettered for the first time from this queue for reason ‘expired’"/>
        </type>
    </section>
</amqp>

However, encoding and decoding custom AMQP types that are nested within
arrays which in turn are nested within the message annotation map can be
difficult for clients and the broker. Also, each client would need to
know the custom AMQP type. Therefore, for now, we use an array of maps.

Feature flag

Death information is now recorded in the mc annotation deaths_v2.
Because old nodes do not know this new annotation, recording death
information via mc annotation deaths_v2 is hidden behind a new feature
flag message_containers_deaths_v2.

If this feature flag is disabled, a message will continue to use the
3.13.0 - 3.13.2 way to record death information in mc annotation
deaths, or even the older way within x-death header directly if
feature flag message_containers is also disabled.

Only if feature flag message_containers_deaths_v2 is enabled and this
message hasn't been dead lettered before, will the new mc annotation
deaths_v2 be used.
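
The selection rules above can be summarised in a small decision function. This is a sketch of the described behaviour, not the broker's code: `choose_death_format` and its parameters are hypothetical, and keeping the existing format for a message that was already dead lettered is an assumption inferred from the "only if ... hasn't been dead lettered before" wording.

```python
from typing import Optional


def choose_death_format(message_containers: bool,
                        message_containers_deaths_v2: bool,
                        existing_format: Optional[str] = None) -> str:
    """Return where the death history of a message is recorded."""
    if existing_format is not None:
        # Assumption: a message that already carries a death history
        # keeps using the format that history was recorded in.
        return existing_format
    if message_containers_deaths_v2:
        return "deaths_v2"  # new mc annotation
    if message_containers:
        return "deaths"     # 3.13.0 - 3.13.2 mc annotation
    return "x-death"        # header written directly, pre message containers
```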

kjnilsson

I'm still unsure about introducing a new record where the old one would work fine and was specifically designed for extension, but it will of course work fine, so let's leave that.

The feature flag state needed should be passed in to mc as an environment however.

I will prepare a separate PR to remove other feature flag use



* Fix dead lettering

(cherry picked from commit 6b300a2)

Conflicts:
deps/rabbit/app.bzl
deps/rabbit/src/mc_amqp.erl
deps/rabbit/src/rabbit_core_ff.erl
deps/rabbit/test/amqp_client_SUITE.erl

* Fix conflicts and failing tests

Extend message_containers_deaths_v2_SUITE to send 3 messages whose death
histories will be stored in 3 different ways:
1. with feature flag message_containers disabled
2. with feature flag message_containers enabled, but message_containers_deaths_v2 disabled
3. with feature flag message_containers_deaths_v2 enabled

Co-authored-by: David Ansari <david.ansari@gmx.de>

This commit is a follow up of #11174 which broke the following Java client test:

```
./mvnw verify -P '!setup-test-cluster' -Drabbitmqctl.bin=DOCKER:rabbitmq -Dit.test=DeadLetterExchange#deadLetterNewRK
```

The desired documented behaviour is the following:

> routing-keys: the routing keys (including CC keys but excluding BCC ones) the message was published with

This behaviour should be respected also for messages dead lettered into a stream. Therefore, instead of first including the BCC keys in the `#death.routing_keys` field and removing them again in mc_amqpl before sending the routing-keys to the client, as done in v3.13.2 in https://github.com/rabbitmq/rabbitmq-server/blob/dc25ef53292eb0b34588ab8eaae61082b966b784/deps/rabbit/src/mc_amqpl.erl#L527, we omit the BCC keys from `#death.routing_keys` directly when recording a death event.

This commit records the BCC keys in their own mc `bcc` annotation in `mc_amqpl:init/1`.
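
The intended split between recorded routing keys and BCC keys can be illustrated as follows. This is a simplified model of the behaviour described above, not the code in `mc_amqpl`; `split_routing_keys` is a hypothetical helper, and headers are modelled as a plain dictionary holding the `CC` and `BCC` lists.

```python
from typing import Dict, List, Tuple


def split_routing_keys(routing_key: str,
                       headers: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (keys to record in the death event, BCC keys kept separately)."""
    # The routing key and CC keys are visible to consumers via routing-keys.
    recorded = [routing_key] + list(headers.get("CC", []))
    # BCC keys are still used for routing but never recorded in the death.
    bcc = list(headers.get("BCC", []))
    return recorded, bcc
```

Keeping the BCC keys out of the death record from the start means no later step has to remember to strip them, regardless of whether the message ends up in a queue or a stream.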

(cherry picked from commit 90a4010)

ansd added the backport-v3.13.x label May 6, 2024

mergify bot added the bazel label May 6, 2024

ansd force-pushed the deaths-v2 branch 6 times, most recently from b460de0 to e9bb1ef May 8, 2024 09:25

ansd marked this pull request as ready for review May 8, 2024 09:26

ansd force-pushed the deaths-v2 branch from e9bb1ef to 9063f63 May 8, 2024 09:34

kjnilsson self-requested a review May 8, 2024 09:38

kjnilsson requested changes May 10, 2024

deps/rabbit/include/mc.hrl Outdated

deps/rabbit/src/mc.erl Outdated

deps/rabbit/src/mc_amqpl.erl Outdated

ansd added a commit that referenced this pull request May 10, 2024

Reuse death record

033d66d

Address PR feedback #11174 (comment)

ansd added a commit that referenced this pull request May 13, 2024

Do not depend on feature flags in mc

d3f50fb

Addresses PR feedback #11174 (comment)

ansd added a commit that referenced this pull request May 13, 2024

Reuse death record

9a84c34

Address PR feedback #11174 (comment)

ansd added a commit that referenced this pull request May 13, 2024

Do not depend on feature flags in mc

0a9afb9

Addresses PR feedback #11174 (comment)

ansd force-pushed the deaths-v2 branch from d3f50fb to 0a9afb9 May 13, 2024 08:32

ansd force-pushed the deaths-v2 branch from a64f5f5 to 6b300a2 May 13, 2024 09:00

ansd requested a review from kjnilsson May 13, 2024 10:10

kjnilsson approved these changes May 13, 2024

kjnilsson merged commit c35a0b8 into main May 13, 2024
18 checks passed

kjnilsson deleted the deaths-v2 branch May 13, 2024 11:23

mergify bot mentioned this pull request May 13, 2024

Fix dead lettering (backport #11174) #11219

Merged

This was referenced May 13, 2024

Add dead lettering docs for AMQP clients rabbitmq/rabbitmq-website#1913

Merged

3.13.0 - 3.13.2: Wrong warning messages that dead letter messages get dropped #11160

Closed

Include x-death header in Stream messages #11173

Closed

ansd mentioned this pull request May 14, 2024

Remove BCC from x-death routing-keys #11230

Merged

mergify bot mentioned this pull request May 14, 2024

Remove BCC from x-death routing-keys (backport #11230) #11232

Merged
