Bulk delete of documents #1214

Open
ptbrowne opened this issue Jan 31, 2018 · 2 comments
Comments

@ptbrowne
Contributor

ptbrowne commented Jan 31, 2018

I think it would be useful if the stack provided a _bulk_delete endpoint on its collections.

We have this problem in cozy-client, for example: to delete a collection, we issue a query and then an update on the returned documents, but since the query is limited, only 50 documents are deleted. There is also a performance concern: why send all the documents to the client when they are going to be deleted?

I am open to doing a PR, but I would first like to check that the approach sounds good to you.

I would use the _all_docs API to retrieve all documents with only _id and _rev in the stack and then issue a _bulk_docs update with { _deleted: true }.
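To make the idea concrete, here is a minimal sketch (not the stack's actual code) of turning an _all_docs response into a _bulk_docs deletion payload. The helper name buildBulkDelete and the type names are hypothetical; the JSON shapes follow the standard CouchDB format, where each _all_docs row carries the revision under value.rev. Note that _all_docs also returns design documents, so a real implementation would probably need to skip them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allDocsResponse mirrors the part of a CouchDB _all_docs response used here.
type allDocsResponse struct {
	Rows []struct {
		ID    string `json:"id"`
		Value struct {
			Rev string `json:"rev"`
		} `json:"value"`
	} `json:"rows"`
}

// deletionStub is the minimal document needed to delete one revision.
type deletionStub struct {
	ID      string `json:"_id"`
	Rev     string `json:"_rev"`
	Deleted bool   `json:"_deleted"`
}

// bulkDocsRequest is the body of a POST to _bulk_docs.
type bulkDocsRequest struct {
	Docs []deletionStub `json:"docs"`
}

// buildBulkDelete is a hypothetical helper: it converts the raw body of an
// _all_docs response into a _bulk_docs request that deletes every row.
func buildBulkDelete(allDocs []byte) (bulkDocsRequest, error) {
	var res allDocsResponse
	if err := json.Unmarshal(allDocs, &res); err != nil {
		return bulkDocsRequest{}, err
	}
	req := bulkDocsRequest{Docs: make([]deletionStub, 0, len(res.Rows))}
	for _, row := range res.Rows {
		req.Docs = append(req.Docs, deletionStub{
			ID:      row.ID,
			Rev:     row.Value.Rev,
			Deleted: true,
		})
	}
	return req, nil
}

func main() {
	// Sample _all_docs body with made-up IDs and revisions.
	sample := []byte(`{"total_rows":2,"offset":0,"rows":[
		{"id":"a","key":"a","value":{"rev":"1-abc"}},
		{"id":"b","key":"b","value":{"rev":"2-def"}}]}`)
	req, err := buildBulkDelete(sample)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(req)
	fmt.Println(string(out))
}
```
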

As for the response to the client, it could be useful to send back the documents that could not be deleted, so that the client can decide what to do with them. The rest of the documents could be omitted for performance; a query option could include them only when the client explicitly requests them.
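A rough sketch of that response filtering, assuming the stack has the raw _bulk_docs response in hand. In CouchDB, each entry of that response is either a success with ok, id and rev, or a failure with id, error and reason. The function filterResults and its includeOK flag (standing in for the query option) are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bulkResult is one entry of a CouchDB _bulk_docs response.
type bulkResult struct {
	OK     bool   `json:"ok,omitempty"`
	ID     string `json:"id"`
	Rev    string `json:"rev,omitempty"`
	Error  string `json:"error,omitempty"`
	Reason string `json:"reason,omitempty"`
}

// filterResults is a hypothetical helper. It keeps only the failed entries,
// unless includeOK is true (the client asked for the full list).
func filterResults(body []byte, includeOK bool) ([]bulkResult, error) {
	var all []bulkResult
	if err := json.Unmarshal(body, &all); err != nil {
		return nil, err
	}
	if includeOK {
		return all, nil
	}
	failed := make([]bulkResult, 0)
	for _, r := range all {
		if r.Error != "" {
			failed = append(failed, r)
		}
	}
	return failed, nil
}

func main() {
	// Sample _bulk_docs response with one success and one conflict.
	body := []byte(`[
		{"ok":true,"id":"a","rev":"2-abc"},
		{"id":"b","error":"conflict","reason":"Document update conflict."}]`)
	failed, err := filterResults(body, false)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(failed)
	fmt.Println(string(out))
}
```
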

What do you think? Do you have any particular hurdles in mind?

@nono
Member

nono commented Jan 31, 2018

It should do the trick. Here is a list of things to do:

  • find a name that won't be used by CouchDB in the future (I know, it's just a guess, and _bulk_delete is consistent with the other methods, but I'd prefer something like _delete_by_query)
  • check that the doctype is writable (the stack forbids writing to some doctypes via the /data APIs, like io.cozy.triggers)
  • check that the request has a token with a permission on the whole doctype (I don't think that permissions on a subset are used enough to justify a more complex validation)
  • use pagination for fetching documents from CouchDB and deleting them
  • the documents will need to be fetched with their attributes (not just the _id and _rev), to be injected in the hub (for realtime and triggers) -> rtevent
  • send the response (probably the same format as POST _all_docs)
  • add some unit tests
  • we are using goimports for formatting the code, and gometalinter for lint/vet
  • and, the most important part, document the endpoint in data-system.
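
The pagination step in the list above could be sketched as follows. This is only an illustration under stated assumptions: the store interface, memStore and deleteAll are hypothetical names, the in-memory store stands in for the real CouchDB calls, and the doctype check, permission check and realtime events are left out. The loop pages by last seen ID rather than re-reading the first page, so documents that fail to delete do not cause an endless loop.

```go
package main

import (
	"fmt"
	"sort"
)

// doc holds the fields needed for deletion. A real implementation would
// carry the full attributes so events can be published.
type doc struct {
	ID  string
	Rev string
}

// store is a hypothetical abstraction over the CouchDB calls.
type store interface {
	// Page returns up to limit docs whose ID sorts strictly after `after`.
	Page(after string, limit int) []doc
	// BulkDelete deletes the docs and returns the IDs that failed.
	BulkDelete(docs []doc) []string
}

// deleteAll walks the collection page by page and deletes each page.
func deleteAll(s store, pageSize int) (deleted int, failed []string) {
	after := ""
	for {
		page := s.Page(after, pageSize)
		if len(page) == 0 {
			return deleted, failed
		}
		errs := s.BulkDelete(page)
		failed = append(failed, errs...)
		deleted += len(page) - len(errs)
		after = page[len(page)-1].ID // resume after the last seen ID
	}
}

// memStore is an in-memory stand-in used only for this example.
type memStore struct {
	docs   map[string]string // id -> rev
	locked map[string]bool   // ids whose deletion fails
}

func (m *memStore) Page(after string, limit int) []doc {
	ids := make([]string, 0, len(m.docs))
	for id := range m.docs {
		if id > after {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	if len(ids) > limit {
		ids = ids[:limit]
	}
	page := make([]doc, 0, len(ids))
	for _, id := range ids {
		page = append(page, doc{ID: id, Rev: m.docs[id]})
	}
	return page
}

func (m *memStore) BulkDelete(docs []doc) []string {
	var failed []string
	for _, d := range docs {
		if m.locked[d.ID] {
			failed = append(failed, d.ID)
			continue
		}
		delete(m.docs, d.ID)
	}
	return failed
}

func main() {
	s := &memStore{docs: map[string]string{}, locked: map[string]bool{"doc-007": true}}
	for i := 0; i < 120; i++ {
		s.docs[fmt.Sprintf("doc-%03d", i)] = "1-x"
	}
	deleted, failed := deleteAll(s, 50)
	fmt.Println(deleted, failed, len(s.docs))
}
```
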

For the implementation, the entry point is web/data/replication.go.

@ptbrowne
Contributor Author

Thanks for the detailed description, I will try to make progress on this.
