
4.x: reduce default maximum message size further (e.g. to 64 or 50 MiB) #11187

michaelklishin opened this issue May 7, 2024 · 12 comments

@michaelklishin
Member

This is a follow-up to #1812.

We can and should lower the limit from its current 128 MiB. The idea in #1812 was to gradually lower it from its original 512 MiB default but we have missed the opportunity to go to, say, 64 MiB in 3.13.x. Oh well.

I guess 4.0 can use a 64 MiB or a 50 MiB limit.

Background: #11072.

@michaelklishin michaelklishin changed the title 4.x: reduce maximum message size further (e.g. to 64 or 50 MiB) 4.x: reduce default maximum message size further (e.g. to 64 or 50 MiB) May 7, 2024
@michaelklishin michaelklishin added this to the 4.0.0 milestone May 7, 2024
@gomoripeti
Contributor

Thank you for creating this ticket.

I don't have a good sense of what a typical and acceptable large message is; we don't have good statistics about them yet. We only notice large messages when they cause problems.

Therefore I'm in favour of using the opportunity of 4.0 being a major version bump with breaking changes to reduce the default more drastically, say to 1 MB or 4 MB. (This is just the default, so users who need larger messages can adjust it.)

It's easier to gather metrics about average sizes than maximum sizes, but I will try to gather some data before 4.0.

@michaelklishin
Member Author

@gomoripeti 50 MiB should be plenty for most, plus this is the default, not the hard limit. We should keep reducing the hard limit all the way down to something like 100 MiB.

@michaelklishin
Member Author

This is one of those settings where any default is "wrong". What we can do is be more defensive with a lower limit, so 50 MiB is as good as 64 MiB.

@carlhoerberg
Contributor

carlhoerberg commented May 16, 2024

For the record, I'm fully against this change. While it might not be optimal, the broker should handle large messages, and any new artificial limit will break people's production code.

@kjnilsson
Contributor

kjnilsson commented May 16, 2024

I disagree it is artificial (although we should perform further testing). The fact that RabbitMQ has not had any sensible limits in the past just means that the problem of large messages has never been solved. RabbitMQ queue type storage engines (particularly replicated ones) aren't optimised or designed for very large message sizes. Therefore it is essential that the broker is limited to a range that we can validate it can cope with well.

This is not without precedent: Azure Service Bus's standard tier has a limit of 256 KB, rising to 100 MB for the premium tier. SQS is also limited to 256 KiB and, IIRC, uses S3 via its client libraries to store larger messages and pass a reference around.

Lowering max message size limits will force users to design their messaging systems in a more reliable and safe way. This is, IMO, a very good thing.

@carlhoerberg
Contributor

That's a very bad thing: it forces people off RabbitMQ. What RabbitMQ should do is optimize for that use case. What is the fundamental problem (don't look at RabbitMQ in its current state) with having large messages in a stream or message queue? How is it more efficient from an end user's perspective to add another layer to their stack, when the same bytes still have to be uploaded and downloaded somewhere? Why not keep them together with the message metadata?

@lhoguin
Contributor

lhoguin commented May 16, 2024

When a message reaches sizes as high as 50 MiB or more, there isn't much that can be done to optimise anything. The best we can do is ensure rabbit doesn't blow up, and even then only so much can be done. A message broker that deals with small messages is a very different beast from one that deals with very large messages. For messages this size we would likely require a separate component, meaning that within rabbit we wouldn't keep the payload and metadata together (the same separation users should do today).

That said, 256KiB is a bit low, we definitely can handle more than that at least in classic queues and I'm sure in other parts of rabbit as well.

Changing the default to 50MiB wouldn't change the problem much, it's still high enough that we can't completely prevent nodes from blowing up. But it's a step toward a safer default. Ideally the default would be a size we actually test during development.

Finally, we are only changing the default. Users can easily configure greater values if needed.
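For reference, the setting under discussion is the `max_message_size` key in `rabbitmq.conf`, expressed in bytes. A sketch (the exact value is up to the operator, and the server still caps it at the hard limit):

```ini
# rabbitmq.conf: allow messages up to 128 MiB (value given in bytes).
# The broker rejects anything larger; the value cannot exceed the
# server's hard cap.
max_message_size = 134217728
```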

@michaelklishin
Member Author

michaelklishin commented May 16, 2024

@carlhoerberg a workload with 2 kiB messages and a workload with 200 MiB messages are two very different workloads. This is true even for some key-value stores (supposedly an "easy problem to solve" in distributed data services) which store large files differently, often chunked instead of storing a single blob.

The days of RabbitMQ being optimized for everything and nothing in particular are long gone. Queue types and streams are two obvious examples of that.

Given that some popular messaging systems have 256 kiB defaults unless you pay extra, RabbitMQ's current limits of 128 to 512 MiB are really generous, and even 50 or 64 MiB to 256 MiB would be generous.

This is not even to mention what a set of deliveries of 256 MiB messages usually does to consumer heap sizes: in most cases they explode, and the only way to avoid that is to use a disk-based storage mechanism in the client, which would drastically increase the complexity and the amount of maintenance those clients require.
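Keeping large payloads out of the broker entirely is the classic claim-check pattern. A minimal sketch, with an in-memory stand-in for a blob store (all names here are hypothetical, not RabbitMQ APIs):

```python
import uuid

# Claim-check pattern: payloads above a threshold go to a blob store
# (S3, GCS, ...) and only a small reference travels through the broker.
# `BlobStore` below is an in-memory stand-in for such a store.

class BlobStore:
    def __init__(self):
        self._objects = {}

    def put(self, key: str, payload: bytes) -> None:
        self._objects[key] = payload

    def get(self, key: str) -> bytes:
        return self._objects[key]

INLINE_LIMIT = 1024 * 1024  # 1 MiB: inline below this, claim-check above

def to_message(store: BlobStore, payload: bytes) -> dict:
    """Build the body that would be published to the broker."""
    if len(payload) <= INLINE_LIMIT:
        return {"inline": payload}
    key = str(uuid.uuid4())
    store.put(key, payload)
    return {"blob_ref": key}  # tiny message regardless of payload size

def from_message(store: BlobStore, message: dict) -> bytes:
    """Recover the full payload on the consumer side."""
    if "inline" in message:
        return message["inline"]
    return store.get(message["blob_ref"])
```

Only consumers that actually need the blob call `from_message`, so the broker and uninterested consumers never touch the large payload.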

Our team is small and we are already stretched supporting dozens of clients, developing our own AMQP 1.0 clients, stream clients, etc.

@michaelklishin
Member Author

michaelklishin commented May 16, 2024

> While it might not be optimal the broker should handle large messages and any new artificial limit will break peoples production code

@carlhoerberg you are welcome to use your company's data set to see how many users actually use messages larger than 64 MiB and 512 MiB. From our team's experience, not many and those who do generally agree that storing large messages in a blob store is a reasonable idea. In my 14 years around RabbitMQ, I recall exactly one case where very large message support was specifically required and using a blob store was impossible.

Anyone is welcome to develop a new queue type and a set of client library optimizations outlined above, and demonstrate their effectiveness, then open source their work or its key parts, like the RabbitMQ core team has been doing since 2007. In particular, the companies that directly make money off of open source RabbitMQ are most welcome to contribute substantial improvements for specific problems and workloads, taking the needs of the proverbial 90% of users in consideration.

@michaelklishin
Member Author

michaelklishin commented May 16, 2024

> RabbitMQ queue type storage engines (particularly replicated ones) aren't optimised or designed for very large message sizes

To expand on this: the core team does run various workloads with large messages (up to a few dozens of MiBs). We know well what kind of effects it has both on the message stores used in RabbitMQ and on the inter-node connectivity links.

Today RabbitMQ uses Erlang distribution for most key distribution/replicated features, such as: message routing between nodes (performed by connections/channels, depending on the protocol), quorum queues, the schema data store (including Khepri), the HTTP API (where nodes aggregate metrics from other nodes, unlike with Prometheus). Streams use dedicated TCP connections and that is one of the key factors behind their efficiency with practical workloads.

How is that relevant to this issue? Very directly: Erlang/OTP has long had a problem with distribution links being a bottleneck when (Erlang) messages in a workload are large. Starting with Erlang 22, the inter-node communication protocol has changed to use fragmented (chunked) transfers and that has made a massive positive difference for RabbitMQ cluster stability with loads close to peak.

And yet, we have since decided to use dedicated connections for streams, and that turned out
to be the right decision for multiple technical reasons.

So workloads with large and very large messages (I'd call that all messages above 8 MiB and 32 MiB, respectively) do put stress on various parts of the system that may not seem related.

An attempt to support large and very large messages would require changing inter-node communication in many — in fact, most — places, and it is pretty difficult to justify given
that comfortably over 95% of users do not use messages larger than 32 MiB from the core team's experience.

This would likely force us to make Ra borrow the replication implementation from Osiris (the foundational library for RabbitMQ Streams), then see how well that works for QQs and Khepri, then produce a major Ra version, consider an upgrade path (a royal pain of most significant changes in any distributed data service), then integrate it and see how much of a difference it makes in practice in RabbitMQ. Assuming that it all goes well, this may ship in RabbitMQ 5.x.

Or, perhaps, key contributors can simply agree that we can improve things somewhat for large messages (under 8 MiB, such as #11248) but that very large messages simply belong in a blob store, in particular given that some companies charge you extra if you want to publish messages over just 256 kiB in size.

@michaelklishin
Member Author

michaelklishin commented May 16, 2024

And finally, speaking of links and data-based decisions.

A 50 MiB message would require a 50 MiB × 8 bits per byte = 400 Mbit/s network link to transfer just one message's payload per second. At 125 MiB that'd be 1 Gbit/s. A 500 MiB message? 4 Gbit/s.
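The back-of-the-envelope arithmetic above can be sketched as follows (one message per second, protocol and replication overhead ignored, and Mibit rounded to Mbit as in the figures above):

```python
def link_mbit_per_s(message_mib: float, messages_per_s: float = 1.0) -> float:
    # Each MiB of payload per second needs 8 Mibit/s of raw throughput;
    # overhead (framing, acks, replication) would only push this higher.
    return message_mib * 8 * messages_per_s
```

So `link_mbit_per_s(50)` is 400, `link_mbit_per_s(125)` is 1000, and `link_mbit_per_s(500)` is 4000, matching the figures quoted above.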

The typical link offered by many infrastructure providers is below 1 Gbit/s. So running a realistic workload with messages that large would require an amount of bandwidth well above the current "industry norm" (the commonly offered amount of resources on all but the most expensive plans).

By using a blob store you get some if not all of the well-known benefits of caching: less data to transfer, since only the apps that need to access the complete blob will fetch it (as opposed to gigantic messages being shoved down their throats).

And if you think that network bandwidth is dirt cheap and "infinite", consider this story. Apparently
very large data volumes present all kinds of less-than-obvious problems, even for companies with access to amazing infrastructure engineering teams.

So perhaps a limit of 50 MiB, or even 16 MiB by default and a hard limit of 512 MiB or 256 MiB is not crazy talk after all.

@michaelklishin
Member Author

One of the RabbitMQ-as-a-Service companies that contributes to RabbitMQ has done an initial assessment of their user base's message size.

According to them, across thousands and thousands of clusters the percentage of users who publish messages of 50 MiB or more is really, really low. In fact, so low that they have
suggested a much lower default limit than 50 MiB.

Without collecting this metric in RabbitMQ core this is not easy to measure, but those who try arrive at the same conclusions as the core team.
