
Fix handling of "message too large" with autoscaling #143

Open · wants to merge 1 commit into main

Conversation

@wrouesnel commented Dec 30, 2023

Modify the default message pump loop to automatically increase the message size and wait for reconnect when using get* commands (hereafter in the category of "retryable" commands).

This fixes getting data for large numbers of entities (my HA instance has 47000+ out of the box).

To facilitate this, more information is now returned from the client object when a cancellation occurs: the error causing the cancellation is bubbled through to the task cancellation, which allows the caller to make an informed choice about what to do (in this case, wait and retry as long as the error is only "message too long").

The other change is to add an event for "connected" - this lets our retryable processes efficiently wait for the reconnect loop to succeed (which should be immediate in the message-size case).
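
A minimal sketch of what that connected event amounts to in asyncio terms (names other than wait_for_connection, such as _connected_event, _on_connect and _on_disconnect, are illustrative and not necessarily the PR's exact implementation):

    import asyncio


    class Client:
        """Illustrative fragment: a 'connected' event that retryable calls can await."""

        def __init__(self) -> None:
            self._connected_event = asyncio.Event()

        def _on_connect(self) -> None:
            # Called by the (re)connect loop once the websocket is up again.
            self._connected_event.set()

        def _on_disconnect(self) -> None:
            # Cleared so waiters block until the next successful reconnect.
            self._connected_event.clear()

        async def wait_for_connection(self) -> None:
            """Block until the reconnect loop reports a live connection."""
            await self._connected_event.wait()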

setuptools is also bumped to version 64.0 which supports proper editable installs.

@marcelveldt (Member) commented:

I'm sorry - I completely missed your PR while I was refactoring this entire client.

-        return await self.send_command("config/entity_registry/get", entity_id=entity_id)
+        return await self.send_retryable_command("config/entity_registry/get", entity_id=entity_id)

async def send_retryable_command(self, command: str, **kwargs: dict[str, Any]) -> CommandResultData:
@marcelveldt (Member) commented:

I'm not sure I'm a fan of this pattern. Why not introduce an argument to specify whether the command may be retried?
Also, I think it would be good to specify a number of retries - for example, introduce an argument "retries: int = 3" so that by default a command is retried 3 times on common connection errors.
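
A rough sketch of that suggestion (send_with_retries is a hypothetical helper; send_command, wait_for_connection and ConnectionFailedDueToLargeMessage are the names used in this PR):

    from typing import Any


    class ConnectionFailedDueToLargeMessage(Exception):
        """From this PR: the websocket closed because a reply exceeded the size limit."""


    async def send_with_retries(client: Any, command: str, retries: int = 3, **kwargs: Any) -> Any:
        """Retry a command up to `retries` times on known connection errors."""
        for attempt in range(retries + 1):
            try:
                return await client.send_command(command, **kwargs)
            except ConnectionFailedDueToLargeMessage:
                if attempt == retries:
                    raise
                # Wait for the reconnect loop to bring the socket back, then retry.
                await client.wait_for_connection()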

Comment on lines +236 to +238
    if len(e.args) > 0:
        # Raise the inner exception
        raise e.args[0] from e
@marcelveldt (Member) commented:

What is this?

It is not very common to catch CancelledError - why are you doing it like this?

@wrouesnel (Author) commented Feb 13, 2024:

The websocket connection shuts down when a TooLarge error message is returned, which means we experience the failure as a cancellation. But we need to know what actually caused the shutdown, which is packaged as the first element of e.args - from send_command we'd like to get the actual exception, not just "everything was aborted".

The relevant usage is here:

    async def send_retryable_command(self, command: str, **kwargs: dict[str, Any]) -> CommandResultData:
        """Send a command to the HA websocket and return response. Retry on failure."""
        while True:
            try:
                return await self.send_command(command, **kwargs)
            except ConnectionFailedDueToLargeMessage:
                LOGGER.debug("Connection failed due to large message - waiting for reconnect and then retrying")
                await self.wait_for_connection()

where we want to either keep raising the exception (because it's not something known to be harmless), or handle it because we actually know what it is (ConnectionFailedDueToLargeMessage). Otherwise the calling code doesn't know why the connection was ended.

Of note: this catch could cover other exceptions too, but I don't know what they should be at the moment (a reasonable one would be something like "ConnectionLostRetriesExceeded").
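
For reference, a minimal, self-contained sketch of that pattern (illustrative, not the PR's actual code; it relies on asyncio's cancel message, available since Python 3.9): the message pump cancels the in-flight command with the causing exception attached, and send_command unwraps it so callers see ConnectionFailedDueToLargeMessage rather than a bare CancelledError:

    import asyncio


    class ConnectionFailedDueToLargeMessage(Exception):
        """The websocket closed because a reply exceeded the size limit."""


    async def send_command() -> str:
        """Stand-in for the client's send_command: awaits a reply, unwraps cancellations."""
        try:
            return await asyncio.sleep(10, result="reply")  # pretend to await the websocket reply
        except asyncio.CancelledError as e:
            if len(e.args) > 0 and isinstance(e.args[0], Exception):
                # The message pump told us *why* it tore everything down.
                raise e.args[0] from e
            raise


    async def main() -> None:
        pending = asyncio.ensure_future(send_command())
        await asyncio.sleep(0)  # let send_command start waiting
        # Simulate the pump hitting MESSAGE_TOO_BIG and cancelling the in-flight
        # command with the causing exception as the cancel message.
        pending.cancel(msg=ConnectionFailedDueToLargeMessage())
        try:
            await pending
        except ConnectionFailedDueToLargeMessage:
            print("caller sees the real cause and can wait_for_connection() + retry")


    asyncio.run(main())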

Comment on lines +349 to +357
    if msg.data.code == aiohttp.WSCloseCode.MESSAGE_TOO_BIG:
        # Parse the attempted size out, and schedule a reconnect.
        if (m := re.match(r"Message size (\d+)", msg.data.args[1])) is not None:
            attempted_message_size = int(m.group(1))
            # Set to 2x what they attempted to send us so hopefully we'll
            # succeed on reconnect.
            self._max_msg_size = attempted_message_size * 2
        raise ConnectionFailedDueToLargeMessage()

@marcelveldt (Member) commented:

I think you have a very specific use case here; it is not very common to have THIS MANY entities.
So my suggestion would be to just increase the default message limit for now and keep it simple.

@wrouesnel (Author) commented Feb 13, 2024:

The theory is that the user is either already aware or doesn't care - which for me is true, because my HA installation operates just fine (hence how I discovered this issue - turning on iBeacons probably did it and I've not cleaned up yet).

So the logic is that if HA doesn't hit this (and it doesn't seem to have any limit other than browser memory), then the user probably doesn't care either in any practical case, because this is HA-specific.

Basically: if the normal HA client isn't breaking (which is much heavier on memory), then the Python client shouldn't break either - it would Just Work(TM).

The other side of that is that an obvious reason to use this library would be to script something like a bulk clean-up of entities.

EDIT: Basically I'd argue that, usability-wise, the constraint goes the other way - if reconnects are automatic, then updating receive size limits should be automatic too, unless the user specifically has a reason to constrain it.

@marcelveldt (Member) commented:

Makes sense, although I'm a bit worried about the readability/maintainability of this client with the added overhead.
If you adjust the PR so it's compatible with the latest refactor, I'm fine with merging it.

Another, simpler approach could be to just have a config param to set the default message size for more advanced setups.
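
A sketch of that simpler approach (class and parameter names are illustrative, not this library's actual API): the receive limit becomes a constructor argument that is passed straight through to aiohttp's ws_connect:

    import aiohttp


    class HomeAssistantClient:
        """Illustrative fragment: expose the websocket receive limit as a config param."""

        def __init__(self, url: str, max_msg_size: int = 16 * 1024 * 1024) -> None:
            # Default raised above aiohttp's 4 MiB; advanced setups with huge
            # entity registries can raise it further at construction time.
            self._url = url
            self._max_msg_size = max_msg_size

        async def connect(self, session: aiohttp.ClientSession) -> aiohttp.ClientWebSocketResponse:
            # aiohttp enforces the limit on received messages via max_msg_size.
            return await session.ws_connect(self._url, max_msg_size=self._max_msg_size)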
