
Fix handling of "message too large" with autoscaling #143

Open · wants to merge 1 commit into main

Conversation

@wrouesnel commented Dec 30, 2023

Modify the default message pump loop to automatically increase the message size and wait for reconnect when using get* commands (hereafter in the category of "retryable" commands).

This fixes getting data for large numbers of entities (my HA instance has 47000+ out of the box).

To facilitate this, more information is now returned from the client object when a cancellation occurs: the error causing the cancellation is bubbled through to the task cancellation, which allows the caller to make an informed choice about what to do (in this case, wait and retry as long as the error is only "message too long").

The other change is to add an event for "connected" - this lets our retryable processes efficiently wait for the reconnect loop to succeed (which should be immediate in the message-size case).
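
A minimal sketch of what that connected event amounts to in asyncio terms (names other than wait_for_connection, such as _connected_event, _on_connect and _on_disconnect, are illustrative and not necessarily the PR's exact implementation):

    import asyncio


    class Client:
        """Illustrative fragment: a 'connected' event that retryable calls can await."""

        def __init__(self) -> None:
            self._connected_event = asyncio.Event()

        def _on_connect(self) -> None:
            # Called by the (re)connect loop once the websocket is up again.
            self._connected_event.set()

        def _on_disconnect(self) -> None:
            # Cleared so waiters block until the next successful reconnect.
            self._connected_event.clear()

        async def wait_for_connection(self) -> None:
            """Block until the reconnect loop reports a live connection."""
            await self._connected_event.wait()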

setuptools is also bumped to version 64.0 which supports proper editable installs.

@marcelveldt (Member) commented:

I'm sorry - I completely missed your PR while I was refactoring this entire client.

-        return await self.send_command("config/entity_registry/get", entity_id=entity_id)
+        return await self.send_retryable_command("config/entity_registry/get", entity_id=entity_id)

async def send_retryable_command(self, command: str, **kwargs: dict[str, Any]) -> CommandResultData:
@marcelveldt (Member) commented:

I'm not sure I'm a fan of this pattern. Why not introduce an argument to specify whether the command may be retried?
Also, I think it would be good to specify a number of retries - for example, introduce an argument "retries: int = 3" so that by default a command is retried 3 times on common connection errors.
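
A rough sketch of that suggestion (send_with_retries is a hypothetical helper; send_command, wait_for_connection and ConnectionFailedDueToLargeMessage are the names used in this PR):

    from typing import Any


    class ConnectionFailedDueToLargeMessage(Exception):
        """From this PR: the websocket closed because a reply exceeded the size limit."""


    async def send_with_retries(client: Any, command: str, retries: int = 3, **kwargs: Any) -> Any:
        """Retry a command up to `retries` times on known connection errors."""
        for attempt in range(retries + 1):
            try:
                return await client.send_command(command, **kwargs)
            except ConnectionFailedDueToLargeMessage:
                if attempt == retries:
                    raise
                # Wait for the reconnect loop to bring the socket back, then retry.
                await client.wait_for_connection()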

Comment on lines +236 to +238
    if len(e.args) > 0:
        # Raise the inner exception
        raise e.args[0] from e
@marcelveldt (Member) commented:

What is this?

It is not very common to catch CancelledError - why are you doing it like this?

@wrouesnel (Author) commented Feb 13, 2024:

The websocket connection shuts down when a TooLarge error message is returned, which means we experience the failure as a cancellation. But we need to know what actually caused the shutdown, which is packaged as the first element of e.args - from send_command we'd like to get the actual exception, not just "everything was aborted".

The relevant usage is here:

    async def send_retryable_command(self, command: str, **kwargs: dict[str, Any]) -> CommandResultData:
        """Send a command to the HA websocket and return response. Retry on failure."""
        while True:
            try:
                return await self.send_command(command, **kwargs)
            except ConnectionFailedDueToLargeMessage:
                LOGGER.debug("Connection failed due to large message - waiting for reconnect and then retrying")
                await self.wait_for_connection()

where we want to either keep raising the exception (because it's not something known to be harmless), or handle it because we actually know what it is (ConnectionFailedDueToLargeMessage). Otherwise the calling code doesn't know why the connection was ended.

Of note: this catch could cover other exceptions too, but I don't know what they should be at the moment (a reasonable one would be something like "ConnectionLostRetriesExceeded").
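
For reference, a minimal, self-contained sketch of that pattern (illustrative, not the PR's actual code; it relies on asyncio's cancel message, available since Python 3.9): the message pump cancels the in-flight command with the causing exception attached, and send_command unwraps it so callers see ConnectionFailedDueToLargeMessage rather than a bare CancelledError:

    import asyncio


    class ConnectionFailedDueToLargeMessage(Exception):
        """The websocket closed because a reply exceeded the size limit."""


    async def send_command() -> str:
        """Stand-in for the client's send_command: awaits a reply, unwraps cancellations."""
        try:
            return await asyncio.sleep(10, result="reply")  # pretend to await the websocket reply
        except asyncio.CancelledError as e:
            if len(e.args) > 0 and isinstance(e.args[0], Exception):
                # The message pump told us *why* it tore everything down.
                raise e.args[0] from e
            raise


    async def main() -> None:
        pending = asyncio.ensure_future(send_command())
        await asyncio.sleep(0)  # let send_command start waiting
        # Simulate the pump hitting MESSAGE_TOO_BIG and cancelling the in-flight
        # command with the causing exception as the cancel message.
        pending.cancel(msg=ConnectionFailedDueToLargeMessage())
        try:
            await pending
        except ConnectionFailedDueToLargeMessage:
            print("caller sees the real cause and can wait_for_connection() + retry")


    asyncio.run(main())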

Comment on lines +349 to +357
    if msg.data.code == aiohttp.WSCloseCode.MESSAGE_TOO_BIG:
        # Parse the attempted size out, and schedule a reconnect.
        if (m := re.match(r"Message size (\d+)", msg.data.args[1])) is not None:
            attempted_message_size = int(m.group(1))
            # Set to 2x what they attempted to send us so hopefully we'll
            # succeed on reconnect.
            self._max_msg_size = attempted_message_size * 2
        raise ConnectionFailedDueToLargeMessage()

@marcelveldt (Member) commented:

I think you have a very specific use case here; it is not very common to have THIS MANY entities.
So my suggestion would be to just increase the default message limit for now and keep it simple.

@wrouesnel (Author) commented Feb 13, 2024:

The theory is that the user is either already aware or doesn't care - which for me is true, because my HA installation operates just fine (hence how I discovered this issue - turning on iBeacons probably did it and I've not cleaned up yet).

So the logic is that if HA doesn't hit this (and it doesn't seem to have any limit other than browser memory), then the user probably doesn't care either in any practical case, because this is HA-specific.

Basically: if the normal HA client isn't breaking (which is much heavier on memory), then the Python client shouldn't break either - it would Just Work(TM).

The other side of that is that an obvious reason to use this library would be to script something like a bulk clean-up of entities.

EDIT: Basically I'd argue that, usability-wise, the constraint goes the other way - if reconnects are automatic, then updating receive size limits should be automatic too, unless the user specifically has a reason to constrain it.

@marcelveldt (Member) commented:

Makes sense, although I'm a bit worried about the readability/maintainability of this client with the added overhead.
If you adjust the PR so it's compatible with the latest refactor, I'm fine with merging it.

Another, simpler approach could be to just have a config param to set the default message size for more advanced setups.
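
A sketch of that simpler approach (class and parameter names are illustrative, not this library's actual API): the receive limit becomes a constructor argument that is passed straight through to aiohttp's ws_connect:

    import aiohttp


    class HomeAssistantClient:
        """Illustrative fragment: expose the websocket receive limit as a config param."""

        def __init__(self, url: str, max_msg_size: int = 16 * 1024 * 1024) -> None:
            # Default raised above aiohttp's 4 MiB; advanced setups with huge
            # entity registries can raise it further at construction time.
            self._url = url
            self._max_msg_size = max_msg_size

        async def connect(self, session: aiohttp.ClientSession) -> aiohttp.ClientWebSocketResponse:
            # aiohttp enforces the limit on received messages via max_msg_size.
            return await session.ws_connect(self._url, max_msg_size=self._max_msg_size)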
