
Depth infinity: Make recursive PROPFIND calls cheaper #14531

Closed
danimo opened this issue Feb 26, 2015 · 41 comments · Fixed by #38583

Comments

@danimo
Contributor

danimo commented Feb 26, 2015

Testcase

curl -XPROPFIND https://user:pwd@server/owncloud/remote.php/webdav -H "Depth:infinity"

Actual results

On a well-equipped x86_64 machine it takes 7:20 minutes under heavy server load to list 5279 items (dirs/files).

Expected results

This should actually be fairly cheap, since all of the required information is already available in the DB. The suspicion is that we could attain a serious speedup by profiling the code paths and DB queries involved and optimizing them.

Motivation

Cheap recursive PROPFINDs would allow the client to move away from directory-by-directory discovery, which in turn would significantly speed up syncs, especially every sync run until the initial sync has completed.

@danimo danimo added this to the 8.1-current milestone Feb 26, 2015
@DeepDiver1975
Member

@icewind1991 just another case for profiling webdav - please have a look - THX

@guruz
Contributor

guruz commented Feb 26, 2015

Note that 1.8 has recursive PROPFIND disabled; this issue only affects 1.7.x.

Sabre also creates the XML reply in memory; maybe this could be streamed to the client instead.

@DeepDiver1975
Member

Sabre also creates the XML reply in memory; maybe this could be streamed to the client instead.

@evert ?? THX

@evert

evert commented Feb 26, 2015

Congrats on finding the exact reason I didn't want this feature in the first place, and it's now disabled by default ;). It basically enabled a DDOS attack fairly easily. Do a few of these PROPFINDs in parallel, and the entire server grinds to a halt.

However, streaming xml is coming. This requires two things:

1. Completion and integration of the 'xml-rewrite2' branch. This is a project I've been working on, on and off, for about two years now, and it's actually getting pretty close to integration.

Pull request here: https://github.com/fruux/sabre-dav/pull/602

This spawned a separate project sabre/xml which I'm also working on getting to a full release:

http://sabre.io/xml/

  2. Allowing users to return iterators (and generators) wherever arrays are used.

Most of the time spent creating this response does not go into writing XML; the bigger cost is that the entire WebDAV tree needs to be traversed on the server. Most of the work is in fetching every node, checking access control, etc.

To properly enable streaming support, we need to be able to get the results as they come in and create the xml on the fly.

Doing this is scheduled for after the next major sabre/dav release, as PHP 5.5 is required for full generator support, and I don't want to do a half-baked version of this with Iterators first.

However, none of this will really affect the total time spent (7:20) as the exact same processing is still happening. The difference is that there's lower memory usage and the first byte returns a lot quicker.

To reduce the total time spent, you should still do some proper profiling as @DeepDiver1975 said.
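
For illustration, here is a minimal sketch of the kind of streaming evert describes, written in plain PHP with generators and XMLWriter rather than the actual sabre/dav or sabre/xml APIs; the property set, tree walk, and data path are simplified assumptions:

<?php
// Sketch only, not the sabre/dav implementation: emit one {DAV:}response per
// node as the tree is walked, instead of building the whole multistatus in memory.

// Lazily walk the tree; nothing is accumulated. Plain generators need PHP 5.5,
// `yield from` needs PHP 7.
function walkTree(string $dir): \Generator
{
    foreach (scandir($dir) ?: [] as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $dir . '/' . $entry;
        // A real server would fetch etag/size/permissions from its file cache here.
        yield $path => ['getetag' => '"' . md5((string) filemtime($path)) . '"'];
        if (is_dir($path)) {
            yield from walkTree($path);
        }
    }
}

function streamMultistatus(\Generator $nodes): void
{
    $xml = new \XMLWriter();
    $xml->openURI('php://output');            // write straight to the client
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElementNs('d', 'multistatus', 'DAV:');
    foreach ($nodes as $href => $props) {
        $xml->startElement('d:response');
        $xml->writeElement('d:href', $href);
        $xml->startElement('d:propstat');
        $xml->startElement('d:prop');
        $xml->writeElement('d:getetag', $props['getetag']);
        $xml->endElement();                   // d:prop
        $xml->writeElement('d:status', 'HTTP/1.1 200 OK');
        $xml->endElement();                   // d:propstat
        $xml->endElement();                   // d:response
        $xml->flush();                        // first bytes leave the server early
    }
    $xml->endElement();                       // d:multistatus
    $xml->endDocument();
    $xml->flush();
}

streamMultistatus(walkTree('/srv/owncloud/data/user/files')); // placeholder data dir

As evert notes, this only lowers memory usage and time to first byte; the tree still has to be traversed, so the total time stays roughly the same.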

@PVince81
Contributor

Why is recursive PROPFIND needed in the first place?
Is this mostly for the initial sync?

Let me think:
Once everything is synced, a PROPFIND on the root should be enough to compare etags. If the etag is different, then we need to do a PROPFIND on that folder to get its contents and see which etag changed there.
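
Roughly, the discovery described here looks like the sketch below (illustration only; propfindEtags is a hypothetical helper that does a Depth: 1 PROPFIND on one folder and returns path => etag pairs, and the whole approach relies on ownCloud propagating etag changes up the tree):

<?php
// Hypothetical sketch of etag-driven discovery; not the real sync client,
// which is C++. $propfindEtags(string $folder): array<string, string>.
function findChangedFolders(callable $propfindEtags, array $cachedEtags, string $folder): array
{
    $changed = [];
    foreach ($propfindEtags($folder) as $path => $etag) {
        if (($cachedEtags[$path] ?? null) === $etag) {
            continue;                      // etag unchanged: skip this whole subtree
        }
        $changed[] = $path;
        if (substr($path, -1) === '/') {   // folder etag moved: PROPFIND one level deeper
            $changed = array_merge($changed, findChangedFolders($propfindEtags, $cachedEtags, $path));
        }
    }
    return $changed;
}

As evert points out in the next comment, this relies on folder etags changing when their contents change, which is ownCloud-specific behavior rather than standard WebDAV.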

@evert

evert commented Feb 26, 2015

Only if you talk to broken servers. An ETag shouldn't say anything about the contents of a collection, so you shouldn't rely on this behavior if you are trying to build a standard client.

Instead, you can use the ctag or, better yet, the sync-collection report.

@PVince81
Contributor

Can we use our existing etags as ctags? 😉 (whatever that is, I need to look it up)

@PVince81
Contributor

Here is a result from blackfire.io from when I was testing with Sabre 2.1; it was a PROPFIND with Depth 1 on a folder with 69 entries: https://blackfire.io/profiles/6f6c0033-da1e-43d0-b886-b5eeeb62286c/graph

@evert

evert commented Feb 26, 2015

Ctag is also non-standard, but it's well defined:

http://svn.calendarserver.org/repository/calendarserver/CalendarServer/trunk/doc/Extensions/caldav-ctag.txt

The problem with using the ETag is that it does have a specific definition, and that definition is incompatible with how you are using the etag.

So while this ctag was meant for calendars, it makes sense for any collection.

However, sync-tokens are even better. Instead of doing a full PROPFIND every time, you literally ask for all changes since the last sync-token you received, which can make this incredibly fast.

@PVince81
Contributor

   Description:  The CS:getctag property allows clients to quickly
      determine if the contents of a calendar or scheduling Inbox or
      Outbox collection have changed since the last time a
      "synchronization" operation was done.  The CS:getctag property
      value MUST change each time the contents of the calendar or
      scheduling Inbox or Outbox collection change, and each change MUST
      result in a value that is different from any other used with that
      collection URI.

That sounds like our etags. When the collection content changes (or that of any subcollection), the etag will change.

So would it just be a matter of renaming "getetag" to "getctag"?

Returning the changes since the last sync means the server needs to remember the state it returned to every sync client so it can diff against it, which cannot be done with the current architecture/data model. It was suggested by @icewind1991 here: #4936 (speed up by keeping etag history)

@evert

evert commented Feb 26, 2015

Implementing this:

http://tools.ietf.org/html/rfc6578

will basically solve any performance problems of this kind that you have today. It will greatly reduce memory and CPU usage on both server and client. You literally just get a list of deletions and additions back.
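
For reference, a sync-collection REPORT (RFC 6578) issued from PHP could look like the sketch below; the server URL, credentials, and sync token are placeholders, and the server has to actually implement RFC 6578 for this to work:

<?php
// Placeholder example of an RFC 6578 sync-collection REPORT. The response is a
// multistatus containing only changed/removed members plus a new sync token,
// instead of the full tree a Depth: infinity PROPFIND would return.
$body = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<d:sync-collection xmlns:d="DAV:">
  <d:sync-token>http://example.org/ns/sync/1234</d:sync-token>
  <d:sync-level>infinite</d:sync-level>
  <d:prop>
    <d:getetag/>
  </d:prop>
</d:sync-collection>
XML;

$ch = curl_init('https://server/owncloud/remote.php/webdav/');
curl_setopt_array($ch, [
    CURLOPT_CUSTOMREQUEST  => 'REPORT',
    CURLOPT_USERPWD        => 'user:pwd',
    CURLOPT_HTTPHEADER     => ['Content-Type: application/xml; charset=utf-8'],
    CURLOPT_POSTFIELDS     => $body,
    CURLOPT_RETURNTRANSFER => true,
]);
echo curl_exec($ch);
curl_close($ch);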

@evert

evert commented Feb 26, 2015

So would it just be a matter of renaming "getetag" to "getctag"?

It's also in a different namespace.

Returning the changes since the last sync means the server needs to remember the state it returned to every sync client so it can diff against it, which cannot be done with the current architecture/data model. It was suggested by @icewind1991 here: #4936 (speed up by keeping etag history)

That sounds very similar indeed. What you need from a database perspective:

  1. A primary key (this becomes your sync token, can just be an incremental id)
  2. The path to the file that was changed
  3. The type of operation (modify/create/delete)

It's pretty lightweight.
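
As a rough sketch (table and column names invented for illustration, not ownCloud's actual schema), the change log could be as small as this:

<?php
// Minimal change-log sketch backed by SQLite via PDO; every write operation
// appends one row, and the highest id a client has seen is its sync token.
$db = new PDO('sqlite::memory:');
$db->exec('
    CREATE TABLE change_log (
        id        INTEGER PRIMARY KEY AUTOINCREMENT, -- doubles as the sync token
        path      TEXT NOT NULL,                     -- path of the file that changed
        operation INTEGER NOT NULL                   -- 1 = modify, 2 = create, 3 = delete
    )
');

function logChange(PDO $db, string $path, int $operation): void {
    $stmt = $db->prepare('INSERT INTO change_log (path, operation) VALUES (?, ?)');
    $stmt->execute([$path, $operation]);
}

// A sync request with token N boils down to "give me everything after N".
function changesSince(PDO $db, int $token): array {
    $stmt = $db->prepare('SELECT id, path, operation FROM change_log WHERE id > ? ORDER BY id');
    $stmt->execute([$token]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}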

@PVince81
Contributor

So does it mean that for every file change (rename, delete, upload, download, even outside of WebDAV) we need to create a new entry there? Or should these be grouped?

"the path of the file that was changed": the path can change if the file was moved. So possibly need more info about moved stuff.

@evert
Copy link

evert commented Feb 26, 2015

It's flexible: you can do it either per file or per group of changes.

I saw a note about remote storage in one of the tickets as well. I can see that that's a little harder. But even there I assume that you must have some way to figure out whether a tree or subtree has changed within remote storage, even today, right?

So the change for that would be that you now have to figure out what has changed in a remote storage tree, and log those entries as well. You would only have to do this when the sync-token is requested, and it gives you room for future optimization. A big difference for remote storage is that the logic to determine changes now happens on the server, not the client.

Plus down the road you get the option to handle changes in remote storage asynchronously.

@PVince81
Contributor

Unfortunately the external storage stuff relies mostly on recursive scanning (no subtree magic). It checks the mtime of remote folders and compares it with the local cache. If it has changed, it goes deeper. It can be slow.

I suppose it should be possible to write this as a separate experimental app/Sabre plugin (if time permits) without being too intrusive.
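
A compressed sketch of that mtime-compare-then-descend scan is below; RemoteStorage and LocalCache are invented stand-ins for illustration, not the real ownCloud storage/cache interfaces:

<?php
// Stand-in interfaces for illustration only.
interface RemoteStorage {
    public function mtime(string $path): int;
    /** @return array<int, array{name: string, isDir: bool}> */
    public function listFolder(string $path): array;
}
interface LocalCache {
    public function cachedMtime(string $path): ?int;
    public function store(string $path, int $mtime): void;
}

// Descend into a folder only when its remote mtime differs from the cached one;
// unchanged folders prune their whole subtree, which is the only thing keeping
// this from being a full recursive crawl on every scan.
function scanFolder(RemoteStorage $remote, LocalCache $cache, string $path): void
{
    $mtime = $remote->mtime($path);
    if ($mtime === $cache->cachedMtime($path)) {
        return;                                  // unchanged: skip the subtree
    }
    foreach ($remote->listFolder($path) as $entry) {
        $child = rtrim($path, '/') . '/' . $entry['name'];
        if ($entry['isDir']) {
            scanFolder($remote, $cache, $child); // changed folder: go one level deeper
        }
    }
    $cache->store($path, $mtime);
}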

@evert

evert commented Feb 26, 2015

I figured that was a possibility, but then at least you have the option to do this processing server-side, and you don't have to let clients do it. I assume that you already need to do something similar to this today?

@PVince81
Contributor

Sounds like a big rewrite 😄

I uploaded 1600 files to my local server (on master, with the new Sabre 2.1) and ran a PROPFIND with infinity through blackfire. Here is the result: https://blackfire.io/profiles/488be46f-7890-4be2-8e1b-7098961a572e/graph

There aren't many folders, just this:

data/root/files
data/root/files/lots
data/root/files/lots/of
data/root/files/lots/of/files
data/root/files/lots/of/files/500
data/root/files/lots/of/files/1000
data/root/files/lots/of/files/100

The numbers are the number of files inside the respective folders.

Result doesn't look too bad.

@PVince81
Contributor

Note: this was an "allprops" call. I'll maybe redo another one that also requests the OC-specific props next week.

@PVince81
Contributor

@danimo did your items contain shared files/folders?

@PVince81
Contributor

I hope it got better with 8.1 😄

I remember @jturcotte did some tests, but I'm not sure whether they were about recursive PROPFIND or whether they involved shared folders.

@jturcotte
Member

I've set Depth to 1 in all requests: https://github.com/owncloud/administration/blob/master/performance-tests-c%2B%2B/main.cpp#L69

And the test data was only 1 directory deep, so that wouldn't stress it so much either.

@karlitschek karlitschek modified the milestones: 8.2-next, 8.1-current May 4, 2015
@PVince81
Contributor

Any new data from 8.1.3? 😄

@DeepDiver1975 DeepDiver1975 modified the milestones: 9.0-next, 8.2-current Sep 22, 2015
@DeepDiver1975
Member

let's move this on -> 9.0

@guruz
Contributor

guruz commented Nov 4, 2015

Since the sync client is no longer using recursive PROPFIND and the code was removed from the client (since it was still libneon-based instead of Qt QNetworkAccessManager), I'm closing this.

@guruz guruz closed this as completed Nov 4, 2015
@LukeOwlclaw

I just had a look at the mysql log for a single, non-recursive PROPFIND operation:

  • 104 SELECTs
  • 16 UPDATE oc_file_locks
  • 8 INSERT INTO oc_file_locks

Can this be correct? In particular, are the write operations on oc_file_locks needed for a purely read-only PROPFIND?

@PVince81
Contributor

When doing a PROPFIND, you are reading the list of entries in a folder.
A concurrent process could make a change to the folder listing while you are reading it (e.g. add/update/remove entries).
So the folder currently being listed is locked with a shared lock to prevent modifications to it during the read operation (including modifications to etags/sizes, etc.). Failure to do so could cause unpredictable results, like missed entries.
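
In simplified form, the read path looks roughly like this (the lock provider here is a stand-in, not ownCloud's actual locking API):

<?php
// Stand-in shared-lock provider for illustration; the real locking is
// cross-process and backed by the oc_file_locks table or memcache, not an
// in-memory array.
class LockProvider {
    private $locks = [];
    public function acquireShared(string $path): void {
        if (($this->locks[$path] ?? 0) === -1) {   // -1 would mark an exclusive (writer) lock, taken by the write path (not shown)
            throw new RuntimeException("$path is exclusively locked");
        }
        $this->locks[$path] = ($this->locks[$path] ?? 0) + 1;  // readers count up
    }
    public function releaseShared(string $path): void {
        $this->locks[$path] = max(0, ($this->locks[$path] ?? 1) - 1);
    }
}

// PROPFIND read path: hold a shared lock on the folder for the duration of the
// listing so concurrent writers cannot change entries/etags/sizes mid-read.
function listFolderLocked(LockProvider $locks, string $path): array
{
    $locks->acquireShared($path);
    try {
        return array_values(array_diff(scandir($path) ?: [], ['.', '..'])); // consistent snapshot
    } finally {
        $locks->releaseShared($path);
    }
}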

@PVince81
Contributor

@icewind1991 correct me if I'm wrong

@LukeOwlclaw

The only thing that comes to my mind that could go wrong is that a directory size ends up not equal to the sum of its contained items (in case the dir size is somehow obtained first, then a file is changed in parallel, and the contained files are "propfound" afterwards, though that sounds like an odd implementation if it happened). In that case the result is inconsistent in itself. However, a PROPFIND should never "see" and report an inconsistent DB state, as this must be ensured by the write operations using appropriate locking.

@LukeOwlclaw

My apologies, I used version 8.2.0. Now, using 8.2.2, performance is much better (roughly a factor of 100), probably because there are no UPDATEs or INSERTs and only about half the number of SELECTs.

@guruz guruz removed this from the 9.0 milestone Mar 15, 2017
@guruz
Contributor

guruz commented Mar 15, 2017

Re-opening as per @hodyroff @michaelstingl @ogoffart @jturcotte @SergioBertolinSG

@guruz guruz reopened this Mar 15, 2017
@michaelstingl

… adding @mrow4a to the discussion about performance…

@mrow4a
Contributor

mrow4a commented Mar 15, 2017

Working on it, and see no ending :>

#27372
#27339
#27373
#27284

@guruz
Contributor

guruz commented Mar 23, 2017

Some data points:
An infinity PROPFIND XML reply is 12 MB uncompressed (around 2,900 directories, around 20,000 files).
It currently downloads in 25 seconds using curl.
I didn't check the server memory usage.

(No memory or XML streaming optimizations yet.)

@guruz guruz changed the title Make recursive PROPFIND calls cheaper Depth infinity: Make recursive PROPFIND calls cheaper Mar 24, 2017
@PVince81
Contributor

Found this PR about PROPFIND streaming: https://github.com/fruux/sabre-dav/pull/898

@guruz
Contributor

guruz commented Apr 4, 2017

When implementing this, please also try to gzip the response stream.

Right now in our recommended web server configs we disable gzip because some web servers mess with ETags then. But for the text-based PROPFIND it still makes sense.

@ogoffart

ogoffart commented Apr 5, 2017

Right now in our recommended web server configs we disable gzip because some web servers mess with ETags then

We have been using OC-ETAG since mirall 1.7.0, so I think we can ignore this issue now. (Are we still supporting such old clients?)

But I wonder about resuming when downloading files.

@hodyroff
Contributor

hodyroff commented Apr 5, 2017

Support as in a support contract and active fixing: latest version only.
Support as in it should work and if not we consider it a bug: 2.0 with 10.0; 1.8 with 9.1, IMHO.

@PVince81
Contributor

bug with Infinity: #28341

@michaelstingl

Related: #38583

@stale

stale bot commented Sep 20, 2021

This issue has been automatically closed.
