
[WIP] Async/Await Prototype #70

Open: wants to merge 22 commits into master

Conversation

@liamdiprose commented Nov 16, 2019

Work in progress - Don't merge yet!

This PR presents an alternative API that makes full use of Python's async/await feature, like other well-known Python async libraries such as websockets and aiohttp. The client connection is handled by an async context manager, and event handling is achieved with the await keyword instead of callback functions:

async with gmqtt.connect('iot.eclipse.org') as client:
    await client.publish('test/message', 'hello world', qos=1)
    message = await client.subscribe('test/message')

Converting callbacks to awaitables

Callbacks can be converted to awaitable objects, which simplifies the user's code significantly:

Before
def on_message(client, topic, payload, qos, properties):
    print('RECV MSG:', payload)

client.on_message = on_message
client.subscribe('TEST/#', qos=0)

After
message = await client.subscribe('TEST/#', qos=0)
print('RECV MSG:', message.payload)

I also spotted a few areas in the codebase that make heavy use of callbacks and could be simplified using this method.

How

The various callback functions are made awaitable with an asyncio.Future object. This is how the on_connect callback can be adapted:

async def connect(self, broker_host) -> MqttClientProtocol:
    future = self.loop.create_future()

    def _on_connect(client, flags, rc, properties):
        future.set_result(client)
    self.client.on_connect = _on_connect

    await self.client.connect(broker_host)
    return await future

Progress

  • multiple threading / possible concurrency errors (Future isn't threadsafe)
  • subscriptions are long living, but treated as returning a single message
  • Routing messages to subscriptions using their topic filter
  • any other functions in the API (currently only publish and subscribe are implemented)
  • Errors and Exceptions
  • How the API relates to the callback API; Should this be a wrapper or a replacement? Wrapper
  • gmqtt.connect Arguments: Encryption, Authentication, Reconnection
  • Decide on one of the three flow-control options, or something else I haven't considered
  • Add unit tests that properly test the new API.
  • Docstrings for public classes and functions
  • Use API as the example in the Readme?

@Mixser (Contributor) left a comment

multiple threading / possible concurrency errors (Future isn't threadsafe)

For which purpose do you want to use threading?

Yes, you are right that Future is not threadsafe, but in my opinion this is the user's problem, not the library's: a user can easily shoot themselves in the foot by using one event loop across several threads and forgetting to call run_coroutine_threadsafe.

self.loop = loop

self.cbclient = inner_client
self.message_queue = asyncio.Queue()
@Mixser (Contributor) commented Nov 18, 2019

This is dangerous: if your reading code is blocked for a long time, the queue will grow without bound, producing high memory usage and eventually memory overflow. So it would be nice to limit the queue via an option in the __init__ method.
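For illustration, a minimal sketch of a bounded queue behind an __init__ option (the max_queued_messages name is hypothetical, not part of this PR):

import asyncio

class AsyncClient:
    def __init__(self, inner_client, loop, max_queued_messages=0):
        self.loop = loop
        self.cbclient = inner_client
        # maxsize=0 keeps the current unbounded behaviour; a positive
        # value caps how many messages can sit in the buffer.
        self.message_queue = asyncio.Queue(maxsize=max_queued_messages)

With a bound set, put() waits (and put_nowait() raises QueueFull) once the buffer is full, instead of growing without limit.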

@liamdiprose (Author) replied

Good point. I just looked at the flow control section of the MQTT spec. The broker won't send more messages than the client's send quota, which is initially set to receive_maximum; I imagine this is so the client's buffer doesn't overflow. The quota is incremented by a PUBACK; should we send this only once the message has been delivered to the user (popped off the queue)?

@liamdiprose (Author) replied

There might already be something like this in the codebase. I'll have a look around.

@Mixser (Contributor) replied

Yes, receive_maximum may be a good solution, but it only works for qos1 and qos2, not for qos0.

The quota is incremented by a PUBACK; should we send this only once the message has been delivered to the user?

In the current implementation you may choose when the PUBACK is sent (by setting optimistic_acknowledgement to True or False): either right after we receive the message (before the user processes it in the on_message callback), or after it has been processed by the on_message callback, using the callback's result for the PUBACK.

So in my opinion it would be good to keep this behaviour and let the user choose when the PUBACK is sent.

@liamdiprose (Author) replied

Sweet. I chased down the optimistic_acknowledgement setting and found what you described: the PUBACK is sent before or after the on_message callback has run. I think I understand the use case for optimistic acknowledgements now.

Without optimistic acknowledgments, sub1 will continue to receive messages but will never respond with a PUBACK:

async with gmqtt.connect(...) as client:
    sub1 = client.subscribe(...)
    async for message in client.subscribe(...):
        print(message)

I used to think the lack of PUBACKs would prompt a retransmission, but that doesn't seem to be the case. Section 4.4 of the spec says:

When a Client reconnects with Clean Start set to 0 and a session is present, both the Client and Server MUST resend any unacknowledged PUBLISH packets (where QoS > 0) and PUBREL packets using their original Packet Identifiers. This is the only circumstance where a Client or Server is REQUIRED to resend messages. Clients and Servers MUST NOT resend messages at any other time.

This means the broker won't retransmit packets like I thought. A PUBACK is a notification that the client has "taken ownership" of the message: the point where we are happy to let the broker forget it. To me, that means once the message is delivered to the user. Optimistic acknowledgements just disable the flow control feature of MQTT (designed to keep our buffer from overflowing) and save the broker from storing messages while they sit in our buffer, an extremely light burden in most cases.

I'm happy to add the option for optimistic acknowledgments; it should just be a matter of passing the parameter through to the standard client. I don't think I'd recommend the feature to most people, though. Perhaps you can convince me otherwise? 😛

Currently, __handle_publish_callback is run after on_message has returned:

gmqtt/gmqtt/mqtt/handler.py, lines 329 to 330 in 4fb92a9:

run_coroutine_or_function(self.on_message, self, print_topic, packet, 2, properties,
                          callback=partial(self.__handle_publish_callback, qos=2, mid=mid))

For non-optimistic acknowledgements, the library should send the PUBACK as soon as the message is delivered to the user.

My wrapper will need a special case where the __handle_publish_callback is passed to the on_message handler rather than run after it completes.

Subclassing Client is one option.
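For example, a minimal sketch of that special case, assuming run_coroutine_or_function fires the callback once the coroutine completes (as the snippet above suggests); the wrapper-side names here are hypothetical:

async def _on_message(self, client, topic, payload, qos, properties):
    # gmqtt runs __handle_publish_callback (and so the PUBACK) only after
    # this coroutine finishes, so waiting here defers the acknowledgement.
    delivered = self.loop.create_future()
    await self.message_queue.put((topic, payload, delivered))
    # receive() resolves `delivered` once the user has taken the message.
    await delivered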

@liamdiprose (Author) commented

@Mixser Thanks for the review. I agree with all of your points and have corrected most of them.

Threading

I was unsure whether threading was going to be an issue (I have trust issues with callback functions). It sounds like threads aren't something we need to worry about, which is great.

Flow Control

You raised a good point about the infinitely-sized queue causing a possible memory overflow. MQTT has a flow-control mechanism to prevent client buffer overflow, but it depends on sending the PUBACK response only once the message has been removed from the queue and delivered to the user. Here are the solutions I can think of:

  1. Remove my queue and have the message buffering handled by the current client, which I assume handles flow-control.
  2. Modify the client to let the wrapper choose when the PUBACK is sent.
  3. Ignore MQTT's flow control spec, restrict the message queue size and error on overflow.

Option 1 sounds preferable as this is a wrapper class, but I'm unsure if it can be implemented with the callback API you have.

  • Is there any way to stop the on_message callback from firing so the internal queue starts filling up?

If not, I imagine it will be easier to implement Option 2. In my opinion, Option 3 should not be considered, as it breaks the spec and could lead to qos>0 packets being lost 😬.

We will also have to drop qos=0 packets on an overfull queue, as MQTT's flow control doesn't account for them. This might already be implemented elsewhere in this project.

Subscriptions

I rethought the subscription part of the API and implemented an object-oriented alternative that better reflects the long-lived nature of subscriptions and removes the ambiguity you demonstrated. Here it's obvious which topic subscription a message will come from:

async with gmqtt.connect('iot.eclipse.org') as client:
    sub = await client.subscribe('test1/#')
    sub2 = await client.subscribe('test2/#')
    message = await sub2.receive()
    await sub2.unsubscribe()  # optional

I also implemented the subscribe method as a context manager that unsubscribes on exit:

async with gmqtt.connect('iot.eclipse.org') as client:
    async with client.subscribe('test1/#') as sub:
        print(await sub.receive())

And made the subscription an async generator, as I imagined in my initial issue #69 🎉

import gmqtt.aioclient as gmqtt
async with gmqtt.connect('iot.eclipse.org') as client:
    async with client.subscribe('test1/#') as sub:
        async for message in sub:
            print(message.payload)
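For reference, a minimal sketch of how such a subscription object can support both receive() and async iteration, assuming it wraps a per-subscription asyncio.Queue (names are illustrative, not necessarily the PR's actual code):

import asyncio

class Subscription:
    def __init__(self):
        self._queue = asyncio.Queue()

    async def receive(self):
        return await self._queue.get()

    def __aiter__(self):
        return self

    async def __anext__(self):
        # A full implementation would raise StopAsyncIteration
        # once the subscription is closed by unsubscribe().
        return await self.receive()

Returning self from __aiter__ and delegating __anext__ to receive() is what makes the async for loop above work.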

Next Steps

Additional functions

I'd like to wrap up everything connection-related into the gmqtt.connect function, as it makes gmqtt hard to misuse. I'll probably add the following as arguments:

  • Encryption
  • Authentication
  • Reconnection

Is there any other functionality I'm missing?

Flow control

  • Decide on one of the three options, or something else I haven't considered

Testing

I imagine I can mock the existing client and test that my wrapper calls the right functions.

  • Add unit tests that properly test the new API.
  • Doctests might be nice as this is user-facing code.

Documentation

  • Docstrings for public classes and functions
  • Use the API as the example in the Readme?

if "+" in level:
raise ValueError("Single-level wildcard (+) only allowed by itself")

def match(self, topic: str) -> bool:
@Mixser (Contributor) commented

What do you think about using a regexp?
For example, if a user passes a topic filter ("root/sub1/+/sub3"), we can transform it into a regexp ("root/sub1/([^/]+)/sub3") and the code will be clearer.

Replacements: + -> ([^/]+), # -> (.+)
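For illustration, one way the translation could look, built per level to match the existing TopicFilter.levels split (a sketch, not the PR's code):

import re

def filter_to_regex(topic_filter: str):
    # '+' matches exactly one topic level, '#' matches the rest of the topic.
    parts = []
    for level in topic_filter.split("/"):
        if level == "+":
            parts.append(r"([^/]+)")
        elif level == "#":
            parts.append(r"(.+)")
        else:
            parts.append(re.escape(level))
    return re.compile("^" + "/".join(parts) + "$")

assert filter_to_regex("root/sub1/+/sub3").match("root/sub1/foo/sub3")
assert not filter_to_regex("root/sub1/+/sub3").match("root/sub1/a/b/sub3")

Anchoring with ^ and $ keeps "root/sub1" from matching longer topics by prefix.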

@liamdiprose (Author) replied

Yes, I am considering switching this to a regexp; I wonder how much faster it'll be.

I'm also considering using a tree structure to store the subscriptions, which I think will find all matching subscriptions with less work.

@Mixser (Contributor) replied

A tree structure will only give us a speed boost if you have hundreds of subscriptions. For fewer than ten (in my opinion) it will just be overhead for iterating over and managing the structure in memory.

But it's interesting, and you could implement it and compare it against the for-loop and regexp-matching implementations.

# if over, attempt to drop qos=0 packet using `drop_policy`

# if under, add to appropriate queues:
for subscription_topic in self.subs:
@Mixser (Contributor) commented

It's all good, but we can reduce the number of indents and make the code more beautiful:

for topic in filter(lambda x: x.match(message.topic), self.subs):
    for sub in self.subs[topic]:
        sub.put_nowait(message)

or

for subscription_topic in self.subs:
    if not subscription_topic.match(message.topic):
        continue
    for subscription in self.subs[subscription_topic]:
        subscription.put_nowait(message)

if "+" in level:
raise ValueError("Single-level wildcard (+) only allowed by itself")

def match(self, topic: str) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tree structure will give us a speed boost if only you will have hundred of subscriptions. For a less then ten (in my opinion) it will be only overhead for iterating and managing this structure in memory.

But it's interesting and you may implement it and compare with for-loop and matching by regexp implementation.

receive_maximum = receive_maximum or 65665 # FIXME: Sane default?

self.subscription_manager = SubscriptionManager(receive_maximum)
# self.message_queue = asyncio.Queue(maxsize=receive_maximum or 0)
@Mixser (Contributor) commented

Let's remove all old comments.

class TopicFilter:
    def __init__(self, topic_filter: str):
        self.levels = topic_filter.split("/")
        TopicFilter.validate_topic(self.levels)
@Mixser (Contributor) commented

Do you have a reason to call validate_topic via the class name TopicFilter?

It will produce unexpected behavior if you inherit from it: a subclass that wants to override validate_topic will also have to override __init__, call super().__init__, and call validate_topic manually, because the base __init__ is pinned to TopicFilter's version:

class B(TopicFilter):
    @staticmethod
    def validate_topic(*args):
        print("Never happens")
        return TopicFilter.validate_topic(*args)

B("my-awesome-topic/bla")  # the override above is never called

@frederikaalund mentioned this pull request Apr 6, 2020
@okaestne commented Nov 7, 2023

Hi! Is there still any interest in getting this PR ready to be merged? I'm currently implementing my own asyncio wrapper around this library. My main motivations are to be able to:

  1. await the result of (un)subscriptions
  2. call specialized callbacks for each subscription

I use this wrapper to implement the command-response pattern using the MQTT5 response topic/correlation data headers.

For 1) I'm simply tracking the mid and resolving an asyncio.Future on a matching (Un)SubAck.
For 2) I'm assigning a (random) subscription identifier (implies MQTT5), maintaining a sub_id -> callback mapping and filtering incoming messages to call the callbacks or fall back to the default on_message callback.
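For illustration, the mid-tracking part of 1) could look roughly like this (the names are mine, not the actual wrapper's):

import asyncio

class AckTracker:
    def __init__(self, loop):
        self._loop = loop
        self._pending = {}  # mid -> asyncio.Future

    def track(self, mid):
        future = self._loop.create_future()
        self._pending[mid] = future
        return future

    def on_suback(self, mid, granted_qos):
        # Called from the client's SUBACK handler.
        future = self._pending.pop(mid, None)
        if future is not None and not future.done():
            future.set_result(granted_qos)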

I think the two goals are common enough that this library could support its users by providing an appropriate (async) interface. I may be able to contribute some of the necessary changes either to this PR or separately if you are open to alternative approaches. I'd just like to ask you about your general opinion on the overall style (i.e. full asyncio wrapper as in this PR or additional _async methods?) and requirements regarding MQTT5-only features, e.g. specialized per-subscription callbacks only available when using MQTT5.

@Mixser (Contributor) commented Nov 14, 2023

Hi, thanks for your interest in GMQTT 🎉

I'm simply tracking the mid and resolving an asyncio.Future on a matching (Un)SubAck.

Yes, this is the right way it could be implemented, but you need to be careful about how you manage reconnection and other things, because there is a possibility of leaving leftover asyncio.Future objects that will never be completed (e.g. due to network reconnections).

For 2) I'm assigning a (random) subscription identifier (implies MQTT5), maintaining a sub_id -> callback mapping and filtering incoming messages to call the callbacks or fall back to the default on_message callback.

It's quite tricky, because even if the server implements v5, it might not support subscription identifiers at all (see CONNACK Properties -> 3.2.2.3.12 Subscription Identifiers Available in the spec).

About style, @Lenka42 could you support? 😉

@okaestne replied

Yes, this is the right way it could be implemented, but you need to be careful about how you manage reconnection and other things, because there is a possibility of leaving leftover asyncio.Future objects that will never be completed (e.g. due to network reconnections).

Yeah, that's definitely something to consider. I'm using asyncio.wait_for() to set a timeout and asyncio.Future.add_done_callback() to remove the Future from the tracking list. I haven't tested how reconnects are handled yet.
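Building on the tracker sketch above (again illustrative, not the actual wrapper), that pattern could look like:

import asyncio

async def await_suback(tracker, mid, timeout=5.0):
    future = tracker.track(mid)
    # A wait_for() timeout cancels the future, and a cancelled future
    # still counts as done, so the entry is removed either way and no
    # never-completed future survives a reconnect.
    future.add_done_callback(lambda f: tracker._pending.pop(mid, None))
    return await asyncio.wait_for(future, timeout)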

It's quite tricky, because even if the server implements v5, it might not support subscription identifiers at all (see CONNACK Properties -> 3.2.2.3.12 Subscription Identifiers Available in the spec).

About style, @Lenka42 could you support? 😉

Yes, that's another thing that could be tested when connecting. I understand there is a potential conflict between flexibility and ease of use; users should still be able to decide which headers/properties are sent to the broker. I also can't tell how many v5-capable brokers do or do not support subscription IDs, but at least mosquitto does 😁
