[RFC] OTA Support #154

Closed
puddly opened this issue May 11, 2019 · 119 comments
@puddly
Collaborator

puddly commented May 11, 2019

Implementing support for OTA upgrades would practically eliminate the need for vendor hubs. Below is the event listener I used to successfully upgrade six IKEA Trådfri 1000lm bulbs (copied from an unrelated issue):

import asyncio
import logging
import contextlib

from collections import defaultdict

import aiohttp
from zigpy.zcl.foundation import Status
from zigpy.zcl.clusters.general import Ota

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)



class HerdLock:
    '''
    A lock that indicates to the caller if they are the first to acquire it.
    If not, the context manager waits until the lock is released. Used like this:

        lock = HerdLock()
        expensive = None

        async def worker():
            global expensive  # without this, the assignment below creates a local

            async with lock('LOCK_NAME') as is_first:
                if is_first:
                    print('Actually performing task')
                    await asyncio.sleep(5)
                    expensive = 'got it'

            # Here we can assume `expensive` is set and that only one worker actually fetched it
            print(expensive)

        await asyncio.gather(worker(), worker(), worker(), worker())
    '''

    def __init__(self):
        self.semaphores = defaultdict(asyncio.Semaphore)

    @contextlib.asynccontextmanager
    async def __call__(self, key):
        semaphore = self.semaphores[key]

        try:
            is_first = not semaphore.locked()

            async with semaphore:
                yield is_first
        finally:
            # Delete the lock if we're the last one to release it
            if not semaphore.locked():
                self.semaphores.pop(key, None)


class TrådfriOTAMainListener:
    UPDATE_URL = 'https://fw.ota.homesmart.ikea.net/feed/version_info.json'
    OTA_HEADER = 0x0BEEF11E.to_bytes(4, 'little')
    MAXIMUM_DATA_SIZE = 40

    def __init__(self):
        self._firmware_cache = {}
        self._lock = HerdLock()

    def cluster_command(self, tsn, command_id, args, ota_cluster):
        asyncio.create_task(self._cluster_command(tsn, command_id, args, ota_cluster))

    async def _cluster_command(self, tsn, command_id, args, ota_cluster):
        if not self._firmware_cache:
            logger.info('Downloading firmware update list from IKEA')

            async with self._lock('DOWNLOAD_VERSION_INFO') as was_first:
                if was_first:
                    async with aiohttp.ClientSession() as client:
                        async with client.get(self.UPDATE_URL) as response:
                            firmwares = await response.json(content_type='application/octet-stream')

                        self._firmware_cache.clear()

                        for fw in firmwares:
                            if 'fw_file_version_MSB' not in fw:
                                continue

                            fw['fw_file_version'] = (fw['fw_file_version_MSB'] << 16) | fw['fw_file_version_LSB']
                            self._firmware_cache[(fw['fw_manufacturer_id'], fw['fw_image_type'])] = fw

        if command_id == 0x0001:  # query_next_image
            field_control, manufacturer_code, image_type, current_file_version, hardware_version = args
            key = (manufacturer_code, image_type)

            logger.info('Received an OTA image query from %s', ota_cluster.endpoint.device)

            # We don't know what this device is
            if key not in self._firmware_cache:
                await ota_cluster.query_next_image_response(Status.NO_IMAGE_AVAILABLE, 0x0000, 0x0000, 0x00000000, 0x00000000)
                return

            fw = self._firmware_cache[key]

            # Tell the device we're ready to continue
            await ota_cluster.query_next_image_response(Status.SUCCESS, manufacturer_code, image_type, fw['fw_file_version'], fw['fw_filesize'])
        elif command_id == 0x0003:  # image_block
            field_control, manufacturer_code, image_type, \
            file_version, file_offset, maximum_data_size, \
            request_node_address, block_request_delay = args

            # We assume at this point we won't be getting unsolicited requests for blocks
            key = (manufacturer_code, image_type)
            fw = self._firmware_cache[key]

            # Download the firmware on demand
            if 'fw_data' not in fw:
                async with self._lock(f'DOWNLOAD_FW:{fw["fw_binary_url"]}') as was_first:
                    if was_first:
                        async with aiohttp.ClientSession() as client:
                            async with client.get(fw['fw_binary_url']) as response:
                                data = await response.read()

                        # The IKEA images wrap the Zigbee OTA file in a container
                        offset = data.index(self.OTA_HEADER)
                        fw['fw_data'] = data[offset:offset + fw['fw_filesize']]
                        assert len(fw['fw_data']) == fw['fw_filesize']

            logger.debug('Firmware upgrade progress: %0.2f%%', 100.0 * file_offset / fw['fw_filesize'])
            data = fw['fw_data'][file_offset:file_offset + min(self.MAXIMUM_DATA_SIZE, maximum_data_size)]

            await ota_cluster.image_block_response(Status.SUCCESS, manufacturer_code, image_type, file_version, file_offset, data)
        elif command_id == 0x0006:  # upgrade_end
            status, manufacturer_code, image_type, file_version = args

            # Upgrade right now
            await ota_cluster.upgrade_end_response(manufacturer_code, image_type, file_version, 0x00000000, 0x00000000)


class TrådfriOTAListener:
    def __init__(self, ota_cluster, main_listener):
        self.ota_cluster = ota_cluster
        self.main_listener = main_listener

    def cluster_command(self, tsn, command_id, args):
        logger.info('Received an OTA cluster command from %s', self.ota_cluster.endpoint.device)

        self.main_listener.cluster_command(tsn, command_id, args, self.ota_cluster)

The Trådfri remotes seem to work too. Unfortunately, I was not able to get my RGB bulbs to upgrade despite a newer firmware version being available so I believe further device quirks will need to be handled by https://github.com/dmulcahey/zha-device-handlers. The core OTA functionality, however, should belong in zigpy since it doesn't depend on any specific adapter interface and can basically be a transcription of the process described in the Zigbee specification (see pages 27+).

What do you think?

@puddly
Collaborator Author

puddly commented May 11, 2019

Before any of this is actually useful, OTA upgrade files need to be obtained. I've posted my method for IKEA's products until they change things around. A generic solution would be to sniff the traffic, though it would be nicer to reverse engineer the methods used by each manufacturer-specific gateway to check and fetch updates.

The only other devices I own are battery powered Xiaomi Aqara sensors and buttons. I've been sniffing both the Zigbee and network traffic of my Aqara gateway but it appears that the gateway does not check for OTA upgrades. Similarly, the Xiaomi cloud only sends back OTA upgrade files for mains-powered devices so I suspect there's little else to be done for the battery powered Xiaomi devices (though there seem to be efforts to create open-source firmware for them).

If anybody has a Trådfri hub and is able to capture the firmware upgrade traffic of a RGB bulb (you need to just power cycle the bulb and wait for ~60 seconds) it would be really helpful. You can use Wireshark and the nRF52840 dongle ($10 + shipping from Mouser/Digikey) or the common HUSBZB-1.

@Adminiuga
Collaborator

Well, your code inspired me to look into this more closely. Here's my initial approach, which still needs some cleanup: https://github.com/Adminiuga/zigpy/tree/feature/ota Since we're still supporting Python 3.5, I made some changes to run it on py35.
Out of my 6 IKEA devices, only one light bulb and the "5-button remote" had older firmware, and both successfully pulled the updates.

For firmware updates you are right: the easiest way is to leech them from the vendor hub, but distributing those firmwares probably wouldn't be welcome. I don't know, maybe balloob could throw some weight behind this and make an agreement with some vendors to get access to their firmwares. Not sure how interested the vendors would be.

@puddly
Collaborator Author

puddly commented May 11, 2019

That's a much cleaner approach. I do agree that it would be unwise to distribute the firmware upgrades in-tree although I'm not sure if an open HTTP endpoint like IKEA's falls in the same category.

Honestly, while the Zigbee spec attempts to promote interoperability, it really has failed here. Everything but the actual firmware upload mechanism is left up to the vendor, which naturally leads to closed hubs. In an ideal world every Zigbee device would have a readable attribute part of its OTA cluster that returns an upgrade URL with a JSON response similar to IKEA's.

We could also fork over $7k (or $75k) to join the Zigbee Alliance or nicely ask someone else to petition on our behalf for the inclusion of something like this in a future version of the Zigbee spec.

Also, SmartThings seems to perform OTA upgrades for IKEA and SYLVANIA bulbs so I guess a few vendors do share this info in the spirit of interoperability. I'm just hoping they will share it with us as well.

@Adminiuga
Collaborator

I'm not sure if an open HTTP endpoint like IKEA's falls in the same category.

Correct, kudos to IKEA. We're not distributing IKEA's firmware, as they are kindly providing an endpoint to download it from.

Zigbee device would have a readable attribute part of its OTA cluster that returns an upgrade URL with a JSON response similar to IKEA's

JSON is not really needed, as all that info is in the OTA file header. I need to add a parser for that header (shouldn't be complicated with all the zigpy types) and then could add a "plain file" OTAProvider.

Also, SmartThings seems to perform OTA upgrades for IKEA and SYLVANIA bulbs so I guess a few vendors do share this info in the spirit of interoperability. I'm just hoping they will share it with us as well.

Yep, they do. It should be possible to join their network, pretend to be an endpoint, and leech the firmware file. IMO manup from deCONZ did something like this in the past.

@puddly
Collaborator Author

puddly commented May 22, 2019

I've added an OTA file parser and a better method for extracting the updates from the IKEA container in my branch that's tracking yours. I'll try to get a few more IKEA devices to test with and hopefully get everything working again in-tree.

@Adminiuga
Collaborator

Ok, I'll do my best not to rebase my branch then, which means I may need to merge dev back into it. ATM I'm stuck at writing tests for the Tradfri aiohttp use.

@ryanwinter

This is so cool. I was worried I was going to have to buy a hub and move them across if I ever needed to update them. Is the Samsung OTA a similar process?

@ryanwinter

I tried out your branch @puddly, but it seemed to have some errors in it.

@puddly
Collaborator Author

puddly commented May 28, 2019

@ryanwinter it's a WIP. I'm out of devices to upgrade and don't have the time right now to dogfood it but will do that when I get some more bulbs or switches. If you have an IKEA hub, capturing the upgrade process would be helpful for the RGB bulbs.

@ryanwinter

ryanwinter commented May 29, 2019

Thanks @puddly. I don't have an RGB bulb yet, but was planning to buy some next time I visit IKEA. I have a ton of smartthings devices, so I might take a look at those.

I have a HUSBZB-1, so I'll investigate how to snoop on the upgrade traffic.

@Adminiuga
Collaborator

@puddly do we really need both Firmware and OTAImage classes? Originally Firmware class was just a convenient container for IKEA files, because IKEA had required information already present in json.
OTAImage can provide all that info just based on the actual OTA file header, making the firmware class redundant.

@puddly
Collaborator Author

puddly commented Jun 1, 2019

@Adminiuga no, Firmware is unnecessary at this point.

In its current form, OTA.get_firmware won't actually work correctly the first call for async providers because it's a sync method and just backgrounds the firmware fetching while immediately returning None. Usually the background task will have completed by the time the event is fired a second time but this isn't guaranteed.

What do you think about a new set of async-compatible ListenableMixin methods?
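To make the problem concrete, here is a minimal sketch of the two behaviours. All class names are hypothetical stand-ins (this is not the zigpy code), and it assumes Python 3.7+ for `asyncio.run`. The sync variant can only schedule the fetch and return whatever is cached at that moment, while the async variant can simply await the provider.

```python
import asyncio


class LazyProvider:
    """Hypothetical provider whose fetch takes a moment (e.g. an HTTP request)."""

    async def fetch(self, key):
        await asyncio.sleep(0.01)
        return f"image-for-{key}"


class SyncOTA:
    def __init__(self, provider):
        self._provider = provider
        self._cache = {}

    def get_firmware(self, key):
        # A sync method cannot await: it backgrounds the fetch and returns
        # whatever is cached right now, which is None on the first call.
        if key not in self._cache:
            asyncio.ensure_future(self._fill(key))
        return self._cache.get(key)

    async def _fill(self, key):
        self._cache[key] = await self._provider.fetch(key)


class AsyncOTA:
    def __init__(self, provider):
        self._provider = provider
        self._cache = {}

    async def get_firmware(self, key):
        # An async method can wait for the provider before answering.
        if key not in self._cache:
            self._cache[key] = await self._provider.fetch(key)
        return self._cache[key]


async def demo():
    sync_ota = SyncOTA(LazyProvider())
    first = sync_ota.get_firmware("ikea")   # fetch has only been scheduled
    await asyncio.sleep(0.05)               # give the background task time
    second = sync_ota.get_firmware("ikea")  # now served from the cache

    awaited = await AsyncOTA(LazyProvider()).get_firmware("ikea")
    return first, second, awaited
```

In `demo()`, the second sync call only succeeds because the sleep happens to be longer than the fetch, which is exactly the "usually, but not guaranteed" behaviour described above.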

@Adminiuga
Collaborator

In its current form, OTA.get_firmware won't actually work correctly the first call for async providers because it's a sync method and just backgrounds the firmware fetching while immediately returning None

I've changed it a bit in the IKEA firmware prefetch so OTA queries the OTA providers for a firmware, and it is up to each OTA provider whether to prefetch it on startup or trigger fetching upon the first request. This essentially forces each provider to keep its own local image cache, while OTA keeps the newest one. Which means we need some sort of refreshing/updating firmware cache...

What do you think about a new set of async-compatible ListenableMixin methods?

It crossed my mind. I'm not a fan of the current sync-to-async triggering, but all cluster commands are currently sync. Let me think about it, as I couldn't make a good enough justification for it. Let's do an RFC on it.

@puddly
Collaborator Author

puddly commented Jun 1, 2019

@Adminiuga I'm hesitant to deploy code that sends 16 simultaneous HTTP requests on startup (and on every cache refresh), even for users who don't have Trådfri devices on their network. I'm aiming for a lazy approach that only hits IKEA's servers if a Trådfri device requests a firmware upgrade. For example:

                                                      ([ Local FS ])
query_next_image <-> ota.get_firmware <-> concurrently([   IKEA   ])
                                                      ([  Others  ])

                                              cached per-provider, if appropriate
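The diagram could be sketched roughly as follows. This is only an illustration under stated assumptions: providers are modelled as plain async callables that return a dict with a `version` field or `None`, and `get_firmware`, `local_fs`, `ikea` and `broken` are hypothetical names, not zigpy's API.

```python
import asyncio


async def get_firmware(key, providers):
    """Ask every provider concurrently and keep the newest image, if any."""
    results = await asyncio.gather(
        *(provider(key) for provider in providers),
        return_exceptions=True,  # one failing provider should not break the rest
    )
    images = [
        result
        for result in results
        if result is not None and not isinstance(result, BaseException)
    ]
    return max(images, key=lambda image: image["version"], default=None)


async def local_fs(key):
    return None  # nothing on disk for this key


async def ikea(key):
    await asyncio.sleep(0.01)  # stands in for a lazy HTTP request
    return {"provider": "ikea", "version": 2}


async def broken(key):
    raise RuntimeError("provider is unreachable")
```

Nothing is requested until a device actually sends `query_next_image`, and a provider that has no image for the key can return `None` without touching the network.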

Getting the event bus stuff right is pretty important. I've run into similar issues with my home automation system and have slowly replaced just about every sync method with an async one. I've removed the ListenableMixin from my version of the zigpy.ota.OTA class for now and I'll look into what changes will be useful as the OTA codebase evolves.

While changing things I also noticed that OTA clients really deal with two types of image signatures (FirmwareKeys):

  • Before the upgrade has started, they ask the OTA server to provide them info for the latest image (or the image that the OTA server decides to send them). This has no specific version number (just a minimum one) but does have a manufacturer id and an image type.
  • If that is acceptable to the client, they then ask for blocks from a specific image matching the image type, manufacturer id, and version. Once the upgrade process has started I think it would be best to "pin" OTA images matching those three values and cache them for some amount of time. There's no reason for the image to change yet retain that signature. This also has the benefit of keeping the OTA process mostly stateless and fast.

I've also reflected this addition to the FirmwareKey class in my (untested) codebase and made changes to make the IKEA firmware provider refrain from making any HTTP requests unless it absolutely has to. In the future it may be useful to utilize the WAIT_FOR_DATA response to more properly delay OTA image downloading.
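A minimal sketch of the two signatures and the pinning idea, with hypothetical names (`ImageKey`, `PinnedImageKey`, `ImageStore`) that are not taken from the actual branch. Expiration of pinned images is left out to keep it short.

```python
from typing import NamedTuple


class ImageKey(NamedTuple):
    """What a device sends in query_next_image: no exact version yet."""
    manufacturer_id: int
    image_type: int


class PinnedImageKey(NamedTuple):
    """What a device sends in image_block: a specific version."""
    manufacturer_id: int
    image_type: int
    file_version: int


class ImageStore:
    def __init__(self):
        self._latest = {}  # ImageKey -> (file_version, data)
        self._pinned = {}  # PinnedImageKey -> data

    def publish(self, key, file_version, data):
        """A provider announces a (possibly newer) image for this key."""
        self._latest[key] = (file_version, data)

    def query_next_image(self, key):
        found = self._latest.get(key)
        if found is None:
            return None
        file_version, data = found
        # Pin the exact image we advertised so a later publish cannot swap it
        self._pinned[PinnedImageKey(*key, file_version)] = data
        return file_version

    def image_block(self, pinned_key, offset, size):
        data = self._pinned[pinned_key]
        return data[offset:offset + size]
```

Because block requests are looked up by all three values, a provider refresh in the middle of a transfer cannot change the bytes a device receives.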

I'll try to get everything actually running and write some thorough tests soon.

@Adminiuga
Collaborator

I'm hesitant to deploy code that sends 16 simultaneous HTTP requests on startup (and on every cache refresh), even for users who don't have Trådfri devices on their network.

Agreed.

Getting the event bus stuff right is pretty important. I've run into similar issues with my home automation system and have slowly replaced just about every sync method with an async one. I've removed the ListenableMixin from my version of the zigpy.ota.OTA class for now and I'll look into what changes will be useful as the OTA codebase evolves.

Hrm, I concur on sync vs async, but not so sure about removing ListenableMixin. In a sense it was providing a common interface for provider. IMO adding an async_listener_event() makes perfect sense.

If that is acceptable to the client, they then ask for blocks from a specific image matching the image type, manufacturer id, and version. Once the upgrade process has started I think it would be best to "pin" OTA images matching those three values and cache them for some amount of time. There's no reason for the image to change yet retain that signature. This also has the benefit of keeping the OTA process mostly stateless and fast.

Do you have reason to believe the image_version would be different from the one indicated in response to next_image request? ATM OTA caches newest IMAGE and keeps it, so it would be serving same version which was in next_image response. But I think it is still a good idea to enforce it in get_image_block.

.. and made changes to make the IKEA firmware provider refrain from making any HTTP requests unless it absolutely has to.

Makes sense. I just was impatient to test IKEA updates and had to have firmware ready. IMO for IKEA it should be perfectly fine, as I see them making a few requests even when getting NO_IMAGE_AVAILABLE response.

In the future it may be useful to utilize the WAIT_FOR_DATA response to more properly delay OTA image downloading.

IMO it makes sense if we have lazy loading, as in the case of IKEA OTA images. Is it worth implementing lazy loading for OTA images in files, e.g. keeping just the OTA headers cached? (In other words, devs concerned with memory usage are a rare thing nowadays :) ) Don't want to over complicate things unnecessary.

@puddly
Collaborator Author

puddly commented Jun 2, 2019

Hrm, I concur on sync vs async, but not so sure about removing ListenableMixin.

I'm just seeing what possible additions or modifications will make the most sense as the OTA implementation is finalized. I agree that the final version of the OTA class should still inherit from ListenableMixin, it's just that I don't want to sacrifice useful functionality at the moment to conform to the existing codebase. I'll try to modify ListenableMixin to support my changes.

Do you have reason to believe the image_version would be different from the one indicated in response to next_image request? ATM OTA caches newest IMAGE and keeps it, so it would be serving same version which was in next_image response.

The probability of that being violated is pretty low but it still makes me a little uneasy. Using (image_type, manufacturer_id, file_version) as a cache key doesn't really change anything but guarantees the image won't be swapped out by a mistimed update. This will hopefully prevent some unlucky user's Zigbee device from being bricked because their manufacturer neglected to verify the OTA image's integrity before applying the update.

Makes sense. I just was impatient to test IKEA updates and had to have firmware ready. IMO for IKEA it should be perfectly fine, as I see them making a few requests even when getting NO_IMAGE_AVAILABLE response.

That's true. If I recall, my color temperature bulbs seem to make a series of four or five requests in quick succession after receiving a next_image response. Now if only I could figure out how to make my RGB bulbs do the same thing...

If nobody with an IKEA hub, a Zigbee sniffer, and some outdated RGB bulbs is available to help out, I'll have to buy an IKEA hub myself. I would rather not waste money on something I'm trying to avoid in the first place, though...

Don't want to over complicate things unnecessary.

Very true. I was just looking over the Zigbee spec when checking the responses we send during the OTA process and stumbled upon that response type. It seemed useful if the next_image response could be generated immediately but requesting the OTA image could take some time (e.g. NFS file system, slow internet).

@Adminiuga
Collaborator

Adminiuga commented Jul 4, 2019

Pushed new version to https://github.com/Adminiuga/zigpy/tree/feature/ota
To enable file based OTA, create a zigpy_ota directory in the same folder as zigbee.db. To enable the IKEA provider, create an empty enable_ikea_ota file in the zigpy_ota directory.

Disclaimer

This is work in progress. By enabling this feature you do accept the risks of bricking your devices!!!

Warning

I haven't tested the new IKEA implementation, as I've run out of devices to update. Does anyone have an older version of the IKEA firmware?

But a downloaded IKEA OTA image in the zigpy_ota folder worked fine, and essentially both providers are quite similar (they even share a base class); the only difference is how each provider obtains the list of firmwares.

In this implementation we store only image headers for each provider, with an expiration (IKEA expires in 12 hours, file based OTA images expire in 24 hours), meaning every 12/24 hours each provider will reload its images.

OTA Handler itself caches images for 18 hours and extends expiration if there's activity for this particular image, so it does not expire in the middle of an upgrade.
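The "extend on activity" behaviour can be sketched like this. It is a simplified stand-in rather than the code in the branch: `ImageCache` is a hypothetical name, and the clock is injectable only so the behaviour is easy to test.

```python
import time


class ImageCache:
    """Cache whose entries expire unless they keep being used."""

    def __init__(self, lifetime=18 * 60 * 60, clock=time.monotonic):
        self._lifetime = lifetime  # seconds; 18 hours by default
        self._clock = clock
        self._entries = {}  # key -> (expires_at, image)

    def put(self, key, image):
        self._entries[key] = (self._clock() + self._lifetime, image)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None

        expires_at, image = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None

        # Activity on this image pushes its expiration out again, so it
        # cannot expire in the middle of an upgrade.
        self._entries[key] = (self._clock() + self._lifetime, image)
        return image
```

Every `image_block` request would go through `get()`, so an upgrade that takes hours keeps its image alive for as long as blocks are still being requested.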

ToDo:

  • Load files in an ExecutorPool so we don't block the loop for the file IO?
  • Should we cache negative results with a short expiration of, say, 4-8 hours, so we don't hit providers every time we receive a request for an image we don't have an update for? Probably not a big deal, though, since each provider holds its own cache, which it can look up quickly, and it is an async call anyway.
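For the first item, a minimal sketch of moving the file read off the event loop with the default thread pool executor. `load_image` and `_read_file` are hypothetical names, and the sketch assumes Python 3.7+ for `asyncio.get_running_loop()` (on 3.5, `asyncio.get_event_loop()` would be the equivalent call).

```python
import asyncio
from pathlib import Path


def _read_file(path):
    # Plain blocking read; this runs in a worker thread, not on the loop
    return Path(path).read_bytes()


async def load_image(path):
    """Read an OTA file without blocking the event loop."""
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor
    return await loop.run_in_executor(None, _read_file, path)
```

Whether this is worth it depends on how slow the storage is. For a local disk the difference is small, but for something like a network share the loop would otherwise stall for the duration of the read.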

@Adminiuga
Collaborator

oh, and BTW it does follow your proposal:

                                                      ([ Local FS ])
query_next_image <-> ota.get_firmware <-> concurrently([   IKEA   ])
                                                      ([  Others  ])

                                              cached per-provider, if appropriate

As I've added async_event() method to the ListenableMixin

@ryanwinter

I have a bunch of older firmware bulbs. How do I test this? Do I just install this via pip?

@ryanwinter

I see the following messages:

2019-07-04 15:59:28 DEBUG (MainThread) [zigpy.zcl] [0x660a:1:0x0019] OTA query_next_image handler for 'IKEA of Sweden TRADFRI bulb E26 W opal 1000lm': field_control=1, manufacture_id=4476, image_type=8449, current_file_version=304170354, hardware_version=1
2019-07-04 15:59:28 DEBUG (MainThread) [zigpy.ota.provider] Downloading http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449)
2019-07-04 15:59:29 DEBUG (MainThread) [zigpy.ota.provider] Finished downloading None bytes from http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449) ver None
2019-07-04 15:59:29 DEBUG (MainThread) [zigpy.zcl] [0x660a:1:0x0019] OTA image version: 304170354, size: 168318. Update needed: False

@Adminiuga
Collaborator

No update needed. The version on the bulb is the same as the downloaded one.
Hrm, maybe we should pass along the version as well, so we don't download the image if it has the same version.
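A sketch of what passing the version along could look like, assuming the provider already knows the advertised version (for example from IKEA's JSON feed) before downloading anything. `fetch_if_newer` and the `download` callable are hypothetical.

```python
def fetch_if_newer(current_file_version, advertised_file_version, download):
    """Call `download()` only when the advertised image is actually newer."""
    if advertised_file_version <= current_file_version:
        return None  # the device is already up to date: skip the HTTP request
    return download()


# Values from the log above: the bulb already runs 304170354, which is the
# same version the provider advertises, so nothing should be downloaded.
calls = []
result = fetch_if_newer(304170354, 304170354, lambda: calls.append("GET") or b"image")
print(result, calls)
```

In the log above the image was downloaded first and compared afterwards, which is why a full HTTP request was made for a bulb that needed no update.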

@ryanwinter

So checking back this morning, I saw a bunch of OTA upgrades which ended in a SUCCESS message. Looking through the bulbs, out of 7, 1 of them is still on an older version.

Here is the start of the output of one upgrade. I do notice that the debug messages indicate that it is downloading the firmware repeatedly (this message is repeated ~3000 times).

I'm hoping this isn't the case, and it's reusing the same download :)

2019-07-04 23:39:23 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA query_next_image handler for 'IKEA of Sweden TRADFRI bulb E26 opal 1000lm': field_control=0, manufacture_id=4476, image_type=8449, current_file_version=286283552, hardware_version=None
2019-07-04 23:39:23 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA image version: 304170354, size: 168318. Update needed: True
2019-07-04 23:39:23 INFO (MainThread) [zigpy.zcl] [0x9344:1:0x0019] Updating: IKEA of Sweden TRADFRI bulb E26 opal 1000lm
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] ZCL request 0x0103: [0, 4476, 8449, 304170354, 0, 63, None, None]
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA image_block handler for 'IKEA of Sweden TRADFRI bulb E26 opal 1000lm': field_control=0, manufacturer_id=4476, image_type=8449, file_version=304170354, file_offset=0, max_data_size=63, request_node_addr=Noneblock_request_delay=None
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA upgrade progress: 0.0
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] ZCL request 0x0103: [0, 4476, 8449, 304170354, 40, 63, None, None]
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA image_block handler for 'IKEA of Sweden TRADFRI bulb E26 opal 1000lm': field_control=0, manufacturer_id=4476, image_type=8449, file_version=304170354, file_offset=40, max_data_size=63, request_node_addr=Noneblock_request_delay=None
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA upgrade progress: 0.0
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] ZCL request 0x0103: [0, 4476, 8449, 304170354, 80, 63, None, None]
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA image_block handler for 'IKEA of Sweden TRADFRI bulb E26 opal 1000lm': field_control=0, manufacturer_id=4476, image_type=8449, file_version=304170354, file_offset=80, max_data_size=63, request_node_addr=Noneblock_request_delay=None
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.ota.provider] Downloading http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449)
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.ota.provider] Finished downloading None bytes from http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449) ver None
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA upgrade progress: 0.0
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] ZCL request 0x0103: [0, 4476, 8449, 304170354, 120, 63, None, None]
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA image_block handler for 'IKEA of Sweden TRADFRI bulb E26 opal 1000lm': field_control=0, manufacturer_id=4476, image_type=8449, file_version=304170354, file_offset=120, max_data_size=63, request_node_addr=Noneblock_request_delay=None
2019-07-04 23:39:33 DEBUG (MainThread) [zigpy.ota.provider] Downloading http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449)
2019-07-04 23:39:34 DEBUG (MainThread) [zigpy.ota.provider] Finished downloading None bytes from http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449) ver None
2019-07-04 23:39:34 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA upgrade progress: 0.1
2019-07-04 23:39:34 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] ZCL request 0x0103: [0, 4476, 8449, 304170354, 160, 63, None, None]
2019-07-04 23:39:34 DEBUG (MainThread) [zigpy.zcl] [0x9344:1:0x0019] OTA image_block handler for 'IKEA of Sweden TRADFRI bulb E26 opal 1000lm': field_control=0, manufacturer_id=4476, image_type=8449, file_version=304170354, file_offset=160, max_data_size=63, request_node_addr=Noneblock_request_delay=None
2019-07-04 23:39:34 DEBUG (MainThread) [zigpy.ota.provider] Downloading http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449)
2019-07-04 23:39:34 DEBUG (MainThread) [zigpy.ota.provider] Finished downloading None bytes from http://fw.ota.homesmart.ikea.net/Tradfri_OTA_release_signed_2019_06_12_111253/bin/159696-TRADFRI-bulb-w-1000lm-1.2.214.ota.ota.signed for ImageKey(manufacturer_id=4476, image_type=8449) ver None

@Adminiuga
Collaborator

Yeah, there are going to be tons of messages, because you have to transfer about 160-200KB of data in 40-byte chunks, so somewhere between 4096 and 5120 messages.
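The arithmetic behind that estimate, as a small sketch. `block_count` is a hypothetical helper; the 40-byte block size matches the offsets in the log (0, 40, 80, ...), and 168318 is the image size reported there.

```python
def block_count(image_size, block_size=40):
    """Number of image_block requests needed to transfer the whole image."""
    # Integer ceiling division: the last block may be shorter than block_size
    return (image_size + block_size - 1) // block_size


print(block_count(160 * 1024))  # 4096
print(block_count(200 * 1024))  # 5120
print(block_count(168318))      # 4208
```

Each block involves a request and a response, each of which is logged, so several thousand nearly identical log lines per upgrade are expected.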

Look for the upgrade end messages. After the transfer finishes (OTA upgrade progress: 99.9 - 100) the device should send an Upgrade End command, to which we should reply. Then the device usually takes some time checking the firmware and applying the update. My lights and motion sensor updated fine; however, the plug had to be reset and rejoined.

I'd say give some time (30 - 45min) after it finishes the transfer and issues upgrade end command, don't powercycle the device during this time.

@Adminiuga
Collaborator

For that 1 bulb which is still on the old version, check if you got an "upgrade end" report from the bulb. If you did, try powercycling it. If it didn't finish the upgrade, it should check for the update again after a few minutes.

@puddly
Collaborator Author

puddly commented Jul 6, 2019

The Finished downloading message only shows up inside of IKEAImage.fetch_image, which actually hits IKEA's server. Is this not a bad thing?

@Adminiuga
Collaborator

not really:

  1. We received query_next_image command so OTA checked its local cache and it was a miss
  2. OTA pings providers -> IKEA provider.
  3. IKEA provider has the link for this image_type and manufacturer_id, so it downloads it and sends back to OTA provider
  4. OTA provider now has this full image and is going to keep it for 18 hours if no "image_block" commands are received.

So you should see downloading from IKEA, only on the 1st attempt (in 18 hours) of querying the image with this particular manufacturer_id and image_type.

In other words, providers will always download/load IMAGES if they are queried, but it shouldn't happen often because OTA holds a local cache for a time period.

@Adminiuga
Collaborator

@puddly so for the color bulbs, it queries the image but never attempts to download it, correct?

@puddly
Collaborator Author

puddly commented Jul 6, 2019

@Adminiuga I don't have the color ones any more but from what I recall yes, they send four or five queries but never progress any further. Some deCONZ users on GitHub made it seem like they were able to upgrade their bulbs so maybe mine were just on a really old firmware version.

As for the downloads, @ryanwinter's log contains four HTTP requests for the OTA image, even though only bulb 0x9344 is upgrading. I had the same thing happen when I was trying to upgrade a single switch and had to stop it after 20 HTTP requests.

@Adminiuga
Collaborator

How many query_next_image commands have you received?

@puddly
Collaborator Author

puddly commented Jul 6, 2019

@Adminiuga just one. Here's a complete log (~2 minutes) of zigpy/bellows from the time a switch joins the network until I pulled the plug (ha) because too many requests were being sent: ota.log

I'll try to debug it live later today.

@Hedda
Contributor

Hedda commented May 29, 2020

I have now submitted a PR for current ZHA docs -> home-assistant/home-assistant.io#13621

Mostly just copied @walthowd's Reddit post but tried to simplify it by not mentioning the debug steps.

@walthowd perhaps you could submit a separate PR for debug steps but under "Troubleshooting"?

@Hedda
Contributor

Hedda commented May 29, 2020

Could it be a good idea to move OTA provider URL source information from zigpy to device quirks?

https://github.com/zigpy/zha-device-handlers/

OTA provider URL source information could be seen as a kind of device handler?

As I understand, zigbee-herdsman/zigbee2mqtt handle providers via zigbee-herdsman-converters?

https://github.com/Koenkk/zigbee-herdsman-converters/tree/master/ota

Thinking that it might be less intimidating for new developers to help with providers if it's in quirks?

https://github.com/zigpy/zigpy/tree/dev/zigpy/ota

Maybe also break providers out into separate code files for each provider source / manufacturer?

@Adminiuga
Collaborator

@walthowd have you tried "source routing" yet?
Depending on how many "end-devices" are connected to the coordinator directly, you may want to reduce the number of children and increase the number of source routes:

zha:
  zigpy_config:
    ota:
      ikea_provider: True
      ledvance_provider: True
      otau_directory: /config/zigpy_ota
     
    source_routing: True
    ezsp_config:
      CONFIG_MAX_END_DEVICE_CHILDREN: 16
      CONFIG_SOURCE_ROUTE_TABLE_SIZE: 24

@roger-

roger- commented Jun 11, 2020

@Adminiuga

have you tried "source routing" yet?

Off-topic, but is explicitly setting source_routing: True necessary for a CC2531 with the source routing firmware? Not seeing any references to it anywhere in the code.

@Adminiuga
Collaborator

I don't know; I don't think source routing is supported by the zigpy-cc module. ATM only bellows/ezsp has test support, and there's alpha firmware for ConBee II, but ATM it does all source routing in firmware.

@Hedda
Contributor

Hedda commented Feb 1, 2021

Would it maybe be a good idea if zha-device-handlers could provide custom 'OTA Provider' (OTAProvider) for Zigbee devices?

Could make it easier to submit a new 'OTA Provider' to zha-device-handlers or add custom 'OTA Providers' without updating zigpy?

Regardless, might it also be a good idea to split each 'OTA Provider' into a separate code file per manufacturer for readability?

https://github.com/zigpy/zigpy/tree/dev/zigpy/ota

Please see issue zigpy/zha-device-handlers#750 which raised this question as user posted a request for a new OTA Provider there.

PS: I understand why ZHA users could today think that "ZHA Device Handlers" would now also handle 'OTA Providers' for zigpy.

@Hedda
Contributor

Hedda commented Feb 1, 2021

FYI, I started a new discussion about these specific questions/ideas here -> #654

@Hedda
Contributor

Hedda commented Feb 1, 2021

EUROTRONIC Technology GmbH has recently started publishing official OTA firmware image releases on their GitHub repo here:

https://github.com/EUROTRONIC-Technology/Spirit-ZigBee/releases

So far there is only an FW image for their "Spirit ZigBee" (model "SPZB0001") product, which is a TRV (Thermostatic Radiator Valve) device:

https://eurotronic.org/produkte/zigbee-heizkoerperthermostat/spirit-zigbee/

https://zigbee.blakadder.com/Eurotronic_SPZB0001.html

https://www.zigbee2mqtt.io/devices/SPZB0001.html

They have also created a wiki there and posted a guide on how to OTA update using deCONZ software from Dresden Elektronik:

https://github.com/EUROTRONIC-Technology/Spirit-ZigBee/wiki/OTA-update-guide-via-deCONZ

Plus the repo's README.md also has another guide for upgrading via Home Assistant if using their deCONZ integration:

https://github.com/EUROTRONIC-Technology/Spirit-ZigBee/blob/main/README.md

PS: Upgrading is recommended because "issues" -> https://github.com/EUROTRONIC-Technology/Spirit-ZigBee/issues

@Hedda
Contributor

Hedda commented Apr 30, 2021

FYI pipiche38 has begun an interesting project that tries to get OTA firmware URLs from information captured by sniffing Zigbee traffic with Wireshark:

https://github.com/pipiche38/Capture-OTA-from-Wireshark

https://github.com/pipiche38/Domoticz-Zigate-Wiki/blob/master/en-eng/Corner_Retreiving-Legrand-Firmware.md

@github-actions

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@pipiche38
Contributor

@walthowd have you tried "source routing" yet? Depending on how many "end-devices" are connected to the coordinator directly, you may want to reduce the number of children and increase the number of source routes

zha:
  zigpy_config:
    ota:
      ikea_provider: True
      ledvance_provider: True
      otau_directory: /config/zigpy_ota
     
    source_routing: True
    ezsp_config:
      CONFIG_MAX_END_DEVICE_CHILDREN: 16
      CONFIG_SOURCE_ROUTE_TABLE_SIZE: 24

This is maybe off topic, but are those parameters still required with the more recent stack?
I believe yes for source_routing, but what about the other parameters, and if so, how do I get the right values?

@MattWestb
Contributor

MattWestb commented Jul 24, 2022

With IKEA controllers and other sleepers, it's better to force them to use a router instead of the coordinator (if you have a good router), so the sleepers don't lose the network when the coordinator is offline or doing TC things.
(The battery drains when sleepers jump around or lose their parents.)
I usually pair the routers first and then set CONFIG_MAX_END_DEVICE_CHILDREN: 0 so end devices can't use the coordinator.

My coordinator was offline for 3 weeks while I was on holiday in Spain (the laptop came with me), and all devices were back online within one hour of reconnecting it to the network, because the routers were "holding" the network the whole time.

Also, use EZSP 6.7.8 or 6.10.X, not anything older or the bad 6.8 and 6.9, to get a stable network.

PS: Source routing is enabled, with a very high number of routes set in the EZSP firmware.
