Maestral keeps indexing and indexing #133

Closed
sprig opened this issue May 2, 2020 · 11 comments · Fixed by #134

Comments

@sprig

sprig commented May 2, 2020

An initial sync appeared to work, until at some point I ran maestral status and got an error telling me to restart.

Since then, Maestral appears to keep indexing until it crashes and restarts. I've set logging to DEBUG but I don't seem to get any extra output. Here is a sample of the log; the same output repeats over and over, except that the indexed count changes slightly before each restart.

Occasionally the number of files changes drastically, e.g. it indexed ~300K at one point and ~100K at another, but for the most part the count stays relatively stable.

System:

  • Maestral version: 0.6.4
  • OS: Ubuntu 19.10 Server (headless)
  • Desktop environment: None
  • Qt version: None

I'm using the version from PyPI. Looking at the git log, it doesn't appear that there have been any changes since the release that would affect this issue, but I'll try it and report back.

@samschott
Owner

Did you forget to attach the logs? It is possible that the internet connection is lost at some point during indexing, which currently triggers a restart instead of resuming where it left off. Do you see “Connecting...” in the logs just before indexing starts again? That would also explain the seemingly arbitrary number of indexed items.

@sprig
Author

sprig commented May 2, 2020

Thanks for the reply! I actually realized I had forgotten to attach the logs and was adding them as you answered! See below (I've raised the log level to aid debugging).

It does appear to be as you describe, except that there isn't actually a loss of internet connection. Also, every time I've looked while debugging (and that has been many times by now), the reset seems to occur at the same place (135910 files). This seems slightly suspicious to me, though suspicious of what I'm not sure.

I looked around in the code, and it appears that lower-level errors are not handled at all:

https://github.com/SamSchott/maestral/blob/f9010d8bc289cfb628200048dd3109eead2d5517/maestral/client.py#L596-L610

I haven't really worked with the Dropbox API before, so I'm not sure whether the cursor remains valid after a connection error. For now, I've added another except clause to handle it by retrying the read. Hopefully this will be enough to get past the connection timeout. I'll report the results later.

...
2020-05-02 08:31:12 maestral.client INFO: Indexing 135411...
2020-05-02 08:32:12 maestral.sync DEBUG: Connecting...
Traceback (most recent call last):
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1354, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1071, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 929, in read
    return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kosta/maestral/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 725, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/util/retry.py", line 403, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
    raise value
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 428, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/home/kosta/maestral/lib/python3.7/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.dropboxapi.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kosta/maestral/git/maestral/maestral/client.py", line 110, in wrapper
    return func(*args, **kwargs)
  File "/home/kosta/maestral/git/maestral/maestral/client.py", line 600, in list_folder
    more_results = self.dbx.files_list_folder_continue(results[-1].cursor)
  File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/base.py", line 1954, in files_list_folder_continue
    None,
  File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/dropbox.py", line 274, in request
    timeout=timeout)
  File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/dropbox.py", line 365, in request_json_string_with_retry
    timeout=timeout)
  File "/home/kosta/maestral/lib/python3.7/site-packages/dropbox/dropbox.py", line 449, in request_json_string
    timeout=timeout,
  File "/home/kosta/maestral/lib/python3.7/site-packages/requests/sessions.py", line 578, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/home/kosta/maestral/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/kosta/maestral/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/home/kosta/maestral/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.dropboxapi.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kosta/maestral/git/maestral/maestral/sync.py", line 2664, in startup_worker
    sync.get_remote_folder()
  File "/home/kosta/maestral/git/maestral/maestral/sync.py", line 343, in wrapper
    res = func(self, *args, **kwargs)
  File "/home/kosta/maestral/git/maestral/maestral/sync.py", line 1888, in get_remote_folder
    include_deleted=False, limit=500)
  File "/home/kosta/maestral/git/maestral/maestral/client.py", line 115, in wrapper
    raise ConnectionError('Cannot connect to Dropbox')
ConnectionError: Cannot connect to Dropbox
2020-05-02 08:32:12 maestral.sync INFO: Connecting...
2020-05-02 08:32:12 maestral.sync INFO: Downloading your Dropbox
2020-05-02 08:32:13 maestral.client INFO: Indexing 8...
.....

@sprig
Author

sprig commented May 2, 2020

Update:
The added retry logic seems partially successful: it does get past the first read timeout. However, the timeouts then simply continue to occur regularly:

2020-05-02 10:01:21 maestral.client INFO: Indexing 136915...
2020-05-02 10:02:22 maestral.client ERROR: Read timeout
2020-05-02 10:02:22 maestral.client DEBUG: Retrying after sleeping 1 secs...
2020-05-02 10:02:23 maestral.client INFO: Indexing 137415...
2020-05-02 10:03:23 maestral.client ERROR: Read timeout
2020-05-02 10:03:23 maestral.client DEBUG: Retrying after sleeping 2 secs...
2020-05-02 10:03:25 maestral.client INFO: Indexing 137915...
2020-05-02 10:04:25 maestral.client ERROR: Read timeout
2020-05-02 10:04:25 maestral.client DEBUG: Retrying after sleeping 4 secs...
2020-05-02 10:04:29 maestral.client INFO: Indexing 138415...
2020-05-02 10:05:29 maestral.client ERROR: Read timeout
2020-05-02 10:05:29 maestral.client DEBUG: Retrying after sleeping 8 secs...
2020-05-02 10:05:37 maestral.client INFO: Indexing 138915...
2020-05-02 10:06:37 maestral.client ERROR: Read timeout
2020-05-02 10:06:37 maestral.client DEBUG: Retrying after sleeping 16 secs...
2020-05-02 10:06:53 maestral.client INFO: Indexing 139415...
2020-05-02 10:07:54 maestral.client ERROR: Read timeout
2020-05-02 10:07:54 maestral.client DEBUG: Retrying after sleeping 32 secs...
2020-05-02 10:08:26 maestral.client INFO: Indexing 139915...

This is the added clause, which sits right below the previous except clause. I've also increased the retry limit, which is why it hasn't failed yet:

            except requests.exceptions.ReadTimeout as exc:
                logger.error("Read timeout")
                if self._retry_count >= retry:
                    # give up after the configured number of retries
                    self._retry_count = 0
                    raise exc
                import time  # would normally live at module level
                # exponential backoff, capped at 60 seconds
                secs = min(2 ** self._retry_count, 60)
                logger.debug(f'Retrying after sleeping {secs} secs...')
                time.sleep(secs)
                self._retry_count += 1

Any ideas as to what might be going on?

Finally, I assume my logged error somehow caused it, but when I now run activity on the CLI I get an exception. I wasn't sure whether to file it as a new issue, since it is probably caused by my modifications.

$ prun maestral.cli main activity
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/kosta/maestral/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/kosta/maestral/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/kosta/maestral/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/kosta/maestral/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kosta/maestral/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/kosta/maestral/git/maestral/maestral/cli.py", line 505, in activity
    if _check_for_fatal_errors(m):
  File "/home/kosta/maestral/git/maestral/maestral/cli.py", line 104, in _check_for_fatal_errors
    maestral_err_list = m.maestral_errors
  File "/home/kosta/maestral/lib/python3.7/site-packages/Pyro5/client.py", line 94, in __getattr__
    return self._pyroInvoke("__getattr__", (name,), None)
  File "/home/kosta/maestral/lib/python3.7/site-packages/Pyro5/client.py", line 243, in _pyroInvoke
    raise data  # if you see this in your traceback, you should probably inspect the remote traceback as well
TypeError: 'NoneType' object is not subscriptable
 +--- This exception occured remotely (Pyro) - Remote traceback:
 | Traceback (most recent call last):
 |   File "/home/kosta/maestral/lib/python3.7/site-packages/Pyro5/server.py", line 453, in handleRequest
 |     data = _get_exposed_property_value(obj, vargs[0])
 |   File "/home/kosta/maestral/lib/python3.7/site-packages/Pyro5/server.py", line 946, in _get_exposed_property_value
 |     return v.fget(obj)
 |   File "/home/kosta/maestral/git/maestral/maestral/main.py", line 512, in maestral_errors
 |     maestral_errors = [r.exc_info[1] for r in self._log_handler_error_cache.cached_records]
 |   File "/home/kosta/maestral/git/maestral/maestral/main.py", line 512, in <listcomp>
 |     maestral_errors = [r.exc_info[1] for r in self._log_handler_error_cache.cached_records]
 | TypeError: 'NoneType' object is not subscriptable
 +--- End of remote traceback

Edit: status seems to work but has a similar error at the bottom.

@sprig
Author

sprig commented May 2, 2020

This seems related: dropbox/dropbox-sdk-python#155

@samschott
Owner

Thank you for the investigation! This does seem to be a Dropbox API issue, but unlike dropbox/dropbox-sdk-python#155, retrying the call seems to work for you. Did you also try to just increase the requests timeout and reduce the number of items requested per call from 500 to maybe 100 or 50?
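
For reference, a minimal sketch of those two knobs with the public Dropbox Python SDK (not Maestral's actual code; the token placeholder and the concrete values are just examples, and this assumes an SDK version that accepts a timeout argument):

import dropbox

dbx = dropbox.Dropbox("<access-token>", timeout=120)  # longer read timeout in seconds

# fetch fewer entries per call and page through the rest via the cursor
result = dbx.files_list_folder("", recursive=True, limit=100)
entries = list(result.entries)
while result.has_more:
    result = dbx.files_list_folder_continue(result.cursor)
    entries.extend(result.entries)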

Intuitively, I would handle this independently of the retry count for CursorResetErrors. I don't quite remember the reason for handling those - Dropbox should almost never reset a cursor, but apparently there were some issues at some point. Judging from the number of indexed items between failures, it looks like eventually every second call fails with a ReadTimeout. Maybe a better way to handle this is to always raise after two successive timeouts?
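
In other words, something roughly like this (just a sketch of the idea, not the actual implementation; the helper name is made up):

import time

import requests


def call_with_two_strike_rule(func, *args, **kwargs):
    """Retry once on a single ReadTimeout, raise on two successive ones."""
    consecutive_timeouts = 0
    while True:
        try:
            return func(*args, **kwargs)
        except requests.exceptions.ReadTimeout:
            consecutive_timeouts += 1
            if consecutive_timeouts >= 2:
                raise
            time.sleep(5)  # brief pause before retrying the call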

In any case, I will try to reproduce this myself and flag the issue with Dropbox again with a bit more information than the original report.

@samschott
Owner

samschott commented May 2, 2020

The error with maestral activity and maestral status is indeed caused by your modifications: you log an error without providing any exc_info, and both the CLI and GUI use that info to surface an error message to the user.

Maestral uses logging as follows (see the sketch after the list):

  1. Debug messages are normally discarded but can be logged if required.
  2. Info messages are used to populate the status field and inform the user.
  3. Warning messages trigger a desktop notification and are used for errors which are not fatal but need user interaction to be resolved. Those can be sync issues due to too large files, invalid file names, etc.
  4. Errors are non-recoverable and need to be resolved before syncing can continue. This includes revoked Dropbox access and all unexpected errors.
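
A minimal sketch of how those levels translate to log calls (illustrative only, not Maestral's actual code); the important part is that errors carry exc_info so the CLI and GUI can read exc_info[1]:

import logging

logger = logging.getLogger("maestral.sync")  # illustrative logger name

logger.debug("Retrying after sleeping 2 secs...")  # 1. normally discarded
logger.info("Indexing 500...")  # 2. populates the status field
logger.warning("Could not sync 'too_large_file.bin'")  # 3. desktop notification, sync issue

try:
    raise ConnectionError("Cannot connect to Dropbox")
except ConnectionError:
    # 4. fatal: logged with exc_info so that maestral_errors can surface exc_info[1]
    logger.error("Cannot connect to Dropbox", exc_info=True)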

I should probably document this in the developer documentation, which is currently limited to API documentation and is very much a work in progress.

@sprig
Author

sprig commented May 2, 2020

Thanks for the info about logging! I'll just change it to info then. Would a warning also require exc_info?

It seemed to me that most of the fetches took a few seconds, so I tried reducing the timeouts to 5 seconds, but apparently that was too short, as the timeouts started to occur much earlier (at around ~20K files IIRC - the logs are gone now, it seems). I did not want to increase them for fear of making the indexing take a very long time.

I also tried increasing the limit of items to fetch to 2000 (also in the hope of reducing indexing time), and happily this seems to be enough: I was able to index ~600K files, and they are currently downloading.

@samschott
Owner

Wow, that is a large Dropbox folder! Are your 600k items nested or in a flat hierarchy? I tried to reproduce the timeout with 150k folders but everything worked well.

A warning does not require any exc_info, so that should be fine.

I am surprised that increasing the limit to 2000 items per call prevents timeouts; I would have expected the individual calls to take longer even if the total time is reduced. But who knows how the Dropbox backend works. For now, I will remove the limit and let Dropbox decide how many items it wants to return per call. It also makes sense to introduce a certain number of retries on timeout, maybe even for the other API calls.
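
Something along these lines, as a rough sketch rather than the actual implementation (the decorator name and parameters are placeholders):

import functools
import time

import requests


def retry_on_timeout(max_retries=3):
    """Retry the wrapped SDK call on ReadTimeout with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except requests.exceptions.ReadTimeout:
                    if attempt == max_retries:
                        raise
                    time.sleep(min(2 ** attempt, 60))
        return wrapper
    return decorator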

@sprig
Author

sprig commented May 3, 2020

There is a hierarchy but some folders have quite a few items.

I agree that it's not the most intuitive that the timeouts are resolved by increasing rather than decreasing the item count. Since the timeouts always occurred at roughly the same place, I guess they might be related to the total time or the total number of requests involved in handling the listing. If that is the case, you may be able to reproduce the timeouts by reducing the item limit to e.g. 100.

Retrying on connection errors seems like the right thing to do...

samschott pushed a commit that referenced this issue May 3, 2020
@samschott samschott mentioned this issue May 6, 2020
@samschott
Owner

I agree, it is likely that limiting the number of returned items somehow involves multiple requests on Dropbox’s backend and can therefore more easily time out.

This should hopefully be fixed now in the new release, where timeouts are handled more gracefully and there is no limit on the number of returned entries. However, feel free to reopen the issue if it persists.

@sprig
Author

sprig commented May 8, 2020

Great, thanks!
