19. RabbitMQ Limitations

RabbitMQ is an open-source message broker. It lets you define queues to which applications connect in order to exchange messages. RabbitMQ originally implemented the Advanced Message Queuing Protocol (AMQP) 0.9.1. It also supports the newer AMQP 1.0.0 protocol and other protocols such as Streaming Text Oriented Messaging Protocol (STOMP) and MQ Telemetry Transport (MQTT).

ArmoniK.Core can use the RabbitMQ message broker to manage task execution. After they are submitted, tasks wait in queues until they are executed.

The protocols supported by RabbitMQ in ArmoniK.Core are:

AMQP 1.0.0

The configuration of RabbitMQ in ArmoniK.Core discussed in this article concerns the AMQP 1.0.0 protocol.

19.1. Messaging in RabbitMQ

RabbitMQ can be used when two services or applications need to communicate with each other. The RabbitMQ message broker sits between the two services, so the applications are decoupled.

The basic architecture of a queue system consists of three main components. There are producers and consumers, and between them lies the queue. Producers can create messages and deliver them to the queue. Similarly, consumers can consume messages from the queue and process them.

Note that producers do not publish messages directly to the queue but send them to an exchange. The exchange is connected to the queue(s). It can be seen as a post office: it receives all the messages and routes them to the queue(s) they are addressed to. The RabbitMQ message broker consists of the exchange together with the queue(s). The main components of the RabbitMQ messaging system are represented in the following figure.

[Figure: main components of the RabbitMQ messaging system]
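The routing role of the exchange can be illustrated with a small toy model. This is only a sketch in plain Python, not RabbitMQ client code: the DirectExchange class, the routing keys "task" and "log", and the message names are hypothetical and chosen for illustration.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy exchange: routes each message to the queues bound to its routing key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> list of bound queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, message):
        # The producer never touches a queue directly; the exchange does the routing.
        queues = self.bindings.get(routing_key, [])
        for queue in queues:
            queue.append(message)
        return len(queues)  # number of queues the message was routed to


exchange = DirectExchange()
tasks = deque()  # hypothetical queue for task messages
logs = deque()   # hypothetical queue for log messages
exchange.bind(tasks, "task")
exchange.bind(logs, "log")

exchange.publish("task", "compute-1")
exchange.publish("log", "started")
exchange.publish("task", "compute-2")

print(list(tasks))  # ['compute-1', 'compute-2']
print(list(logs))   # ['started']
```

Consumers would then read from tasks or logs without knowing which producer sent the messages, which is the decoupling described above.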

19.2. RabbitMQ flow control

When sending messages to RabbitMQ, the producer (or publisher) can send them at a faster rate than the consumers’ consumption rate. This may happen if consumers slow down, or if producers suddenly publish a lot of messages. If the producer continues to send messages without any kind of flow control, the queue becomes congested. In systems with limited resources, this is critical as it may lead to dropped messages and reduced performance.

RabbitMQ has a flow control mechanism to prevent such scenarios. This mechanism applies back pressure on publishers to avoid overloading consumers: it slows down connections that publish too fast.

Credit-based flow control is a technique used by RabbitMQ to manage message flow between publishers and consumers. It concerns all the actors of the system, which means that each actor grants credits to the previous actor that is sending messages to it. The queue gives credit to the producers and the consumers give credit to the queue. The credit is the maximum number of messages that can be sent over a connection.

In other words, from the producer's point of view, credit flow control works by limiting the number of messages that a producer can send to the queue (the RabbitMQ broker). When a producer connects to the queue, it is assigned a credit limit, which determines the maximum number of messages that it can send to the broker at any given time. Each time the producer sends a message, its credit balance is decremented by one. If the credit balance reaches zero, the producer must wait for RabbitMQ to grant it more credit before it can send any more messages.

This also applies to consumers retrieving messages from the queue. When a consumer is ready to receive messages, it sends a request for a certain number of messages, called a credit. Each consumer has a credit limit, which determines the maximum number of messages that the consumer can receive at any given time.
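The consumer side of this mechanism can be sketched with a toy model. The ConsumerLink class and its methods are hypothetical names used for illustration only; they do not correspond to a RabbitMQ or AMQP client API.

```python
from collections import deque


class ConsumerLink:
    """Toy consumer link: the broker may deliver only while credit remains."""

    def __init__(self):
        self.credit = 0
        self.delivered = []

    def grant(self, credit):
        # The consumer tells the broker how many more messages it is ready to receive.
        self.credit += credit

    def deliver_from(self, queue):
        # The broker pushes messages until the credit or the queue is exhausted.
        while self.credit > 0 and queue:
            self.delivered.append(queue.popleft())
            self.credit -= 1


queue = deque(["m1", "m2", "m3"])
link = ConsumerLink()

link.deliver_from(queue)  # no credit yet: nothing is delivered
print(link.delivered)     # []

link.grant(2)
link.deliver_from(queue)  # two messages use up the two credits
print(link.delivered)     # ['m1', 'm2']
print(list(queue))        # ['m3'] stays queued until more credit is granted
```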

The credit system ensures that messages are not lost or dropped due to network congestion or other issues. It allows RabbitMQ to manage the rate at which messages are delivered to consumers, and helps to prevent consumers from becoming overwhelmed with too many messages at once.

19.3. Limitations of RabbitMQ in ArmoniK

ArmoniK is designed to handle situations where a high volume of tasks is submitted. If execution is slow, a backlog can build up: task submission then exceeds the rate of execution, and this imbalance must be handled effectively. To support this, we allow publishers to outpace consumers. The current RabbitMQ flow control configuration in ArmoniK.Core is set up to enable this behavior.

19.3.1. Configuration of RabbitMQ in ArmoniK.Core

In ArmoniK, the consumers are given a credit of 2 messages, which means they can consume two messages at a time from the queue. This configuration was necessary to guarantee the load balancing between consumers.

If we allowed more messages to be sent on the link, a single consumer could retrieve them all while the others consumed nothing. One consumer would then be busy processing all the messages it retrieved from the queue while the other consumers sat idle, waiting for new messages to become available. A better solution is to have each consumer process a smaller number of messages. With the number of messages a consumer can hold at a time set to 2, each consumer retrieves two messages, processes them, and then comes back to the queue. If new messages are available, the consumer takes two more, and so on.
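The effect of the credit value on load balancing can be illustrated with a short simulation. This is a simplified sketch: the first_round function is hypothetical, and it assumes consumers are served one after another, which a real broker does not guarantee.

```python
from collections import deque


def first_round(num_messages, num_consumers, credit):
    """Return how many messages each consumer holds after one pass over the queue.

    Consumers are served in order; each takes at most `credit` messages.
    """
    queue = deque(range(num_messages))
    held = []
    for _ in range(num_consumers):
        taken = 0
        while taken < credit and queue:
            queue.popleft()
            taken += 1
        held.append(taken)
    return held


# A large credit lets the first consumer drain the queue while the others stay idle.
print(first_round(num_messages=8, num_consumers=4, credit=100))  # [8, 0, 0, 0]

# A credit of 2 spreads the same messages evenly across the consumers.
print(first_round(num_messages=8, num_consumers=4, credit=2))    # [2, 2, 2, 2]
```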

On the other hand, the credits for publishing a message to the queue are much greater. When a producer wants to publish to a queue, the default parameters are:

credit_flow_default_credit = {InitialCredit, MoreCreditAfter} = {400, 200}

The first value, InitialCredit = 400, is the initial credit that the queue grants to the publisher. Each time the queue has processed MoreCreditAfter = 200 messages and acknowledged them, it grants the publisher an additional credit of 200 messages.
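The bookkeeping described above can be sketched as follows. This is a toy model of the behavior as described in this section, not RabbitMQ's internal implementation; the PublisherCredit class and its method names are hypothetical.

```python
class PublisherCredit:
    """Toy model of publisher credit with the default values {400, 200}."""

    def __init__(self, initial_credit=400, more_credit_after=200):
        self.credit = initial_credit
        self.more_credit_after = more_credit_after
        self.processed_since_grant = 0

    def can_publish(self):
        return self.credit > 0

    def publish(self):
        if self.credit == 0:
            raise RuntimeError("publisher is blocked: no credit left")
        self.credit -= 1  # each published message uses one credit

    def on_processed(self):
        # Called each time the queue has processed one message.
        self.processed_since_grant += 1
        if self.processed_since_grant == self.more_credit_after:
            # Grant a new batch of credit once enough messages were processed.
            self.credit += self.more_credit_after
            self.processed_since_grant = 0


publisher = PublisherCredit()
for _ in range(400):
    publisher.publish()
print(publisher.can_publish())  # False: the initial credit of 400 is used up

for _ in range(199):
    publisher.on_processed()
print(publisher.can_publish())  # False: only 199 messages processed so far

publisher.on_processed()
print(publisher.credit)         # 200: a new batch of credit has been granted
```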

This means that publishing to the queue can be faster than pulling messages from it. This is not critical when RabbitMQ has enough resources. With limited resources, however, consumers may slow down for prolonged periods. Publishers may then outpace consumers for a long time, which creates a bottleneck and can cause the link to the consumers to go down.

19.3.2. Overcoming the limitations

To avoid such scenarios, we need a minimum of resources (memory, disk) free to be used by RabbitMQ.

19.3.2.1. Minimal required resources

We will provide an example of minimum resources required for RabbitMQ for a given workload. You can use it to estimate the amount of resources you need for your case.

The tests used to estimate the required resources were run by deploying ArmoniK.Core on a local machine and running Bench tests. They were done with RabbitMQ version 3.11.15.

The following table shows that with 8 GB of RAM and 16 GB of disk space dedicated to RabbitMQ, you can run bench tests of 10000 tasks without problems linked to credit values.

Amount of disk space	Amount of RAM	Number of tasks
16 GB	8 GB	10000

19.3.2.2. Resources and configuration of RabbitMQ

To see the amount of resources (memory, disk space) available to RabbitMQ, you can query the RabbitMQ Docker container. After deploying ArmoniK.Core with RabbitMQ, run the following command:

docker exec queue rabbitmq-diagnostics status

Information on how to deploy ArmoniK.Core can be found here.

You can also view the available resources in the RabbitMQ management UI, which can be accessed with a web browser on port 15672.

The amount of resources you will see corresponds to the available resources, but these are not necessarily dedicated to RabbitMQ.

There are also two configuration parameters that you can set to throttle producers in order to avoid running out of resources.

Memory high watermark: this parameter appears in the RabbitMQ configuration under the name vm_memory_high_watermark. It defines the memory threshold at which flow control is triggered. By default it is 0.4, which stands for 40% of the available RAM, but this value can be changed. When the memory used by RabbitMQ exceeds this threshold, producers are completely blocked until enough memory is freed for usage to fall back below the threshold. Note that this parameter does not cap RabbitMQ's memory usage at 40% of RAM; it is only the limit at which producers are throttled.
Free disk limit: this parameter is configured by the variable disk_free_limit. By default it is set to 50 MB, which means that if free disk space drops below this value, producers are blocked until disk space is freed. This parameter can also be configured as a relative value; in RabbitMQ, that value is relative to the total amount of RAM rather than to the disk size.
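The two thresholds can be turned into a quick check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the memory threshold is compared against the memory used by RabbitMQ (as in the RabbitMQ memory alarm documentation), and the 50 MB default is taken as 50,000,000 bytes.

```python
GIB = 1024**3


def memory_alarm(used_by_rabbitmq_bytes, total_ram_bytes, watermark=0.4):
    """True when RabbitMQ's memory use exceeds the watermark fraction of RAM."""
    return used_by_rabbitmq_bytes > watermark * total_ram_bytes


def disk_alarm(free_disk_bytes, disk_free_limit_bytes=50_000_000):
    """True when free disk space has dropped below the configured limit."""
    return free_disk_bytes < disk_free_limit_bytes


# With 8 GiB of RAM, the default watermark is 0.4 * 8 GiB = 3.2 GiB.
print(memory_alarm(3 * GIB, 8 * GIB))  # False: below the threshold
print(memory_alarm(4 * GIB, 8 * GIB))  # True: producers would be blocked

print(disk_alarm(16 * GIB))            # False: plenty of free disk space
print(disk_alarm(40_000_000))          # True: below the 50 MB default limit
```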

You can refer to the RabbitMQ production checklist to tune these parameters for your deployments. Configuring these parameters is useful even if memory and disk space are dedicated to RabbitMQ.

19.4. TLS Setup for RabbitMQ

To secure communications with RabbitMQ, the Transport Layer Security (TLS) protocol has been implemented. However, TLS support varies depending on the operating system used.

19.4.1. TLS on Linux

On Linux systems, RabbitMQ natively supports TLS. The TLS configuration has been implemented following these steps:

TLS Certificate Generation: TLS certificates were generated for the RabbitMQ server and clients using Terraform’s TLS provider. This setup includes creating a private key, a self-signed CA certificate, a certificate signing request, and a locally signed certificate. The certificates are then saved to the local file system for use by RabbitMQ.
RabbitMQ Configuration: The RabbitMQ configuration file (rabbitmq.conf) was modified to include paths to the TLS certificates and enable TLS support.

listeners.ssl.default = 5671
ssl_options.cacertfile = /rabbitmq/certs/ca.pem
ssl_options.certfile = /rabbitmq/certs/rabbit.crt
ssl_options.keyfile = /rabbitmq/certs/rabbit.key
ssl_options.verify = verify_peer
ssl_options.fail_if_no_peer_cert = false
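On the client side, a connection to the TLS listener configured above could be prepared along these lines. This is only a sketch using the Python standard library; it is not the ArmoniK.Core client code. The helper names are hypothetical, and the CA path in the comment assumes the client has its own copy of the CA certificate.

```python
import socket
import ssl


def make_client_context(ca_file=None):
    """Build a TLS client context that verifies the broker certificate."""
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if ca_file is not None:
        # Trust the self-signed CA that signed the broker certificate.
        context.load_verify_locations(cafile=ca_file)
    return context


def open_tls_connection(host, context, port=5671):
    """Open a TLS socket to the broker; 5671 matches listeners.ssl.default above."""
    raw = socket.create_connection((host, port))
    return context.wrap_socket(raw, server_hostname=host)


# In a real deployment, pass the CA file, e.g. ca_file="/rabbitmq/certs/ca.pem" (assumed path).
context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

Because the server configuration sets fail_if_no_peer_cert to false, this sketch does not load a client certificate; a client that presents one would add a call to load_cert_chain on the context.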

19.4.2. TLS on Windows

On Windows systems, implementing TLS for RabbitMQ is more complex. Currently, we use the Docker image micdenny/rabbitmq-windows, which does not natively support TLS. To implement TLS on this image, it would be necessary to rewrite the Docker image to include TLS configuration, which can be a complex and error-prone process.

Due to these limitations, we recommend using RabbitMQ on Linux systems to benefit from native TLS support.

For more details on TLS configuration and other advanced settings, refer to the official RabbitMQ documentation and configuration guides specific to your environment.