No progress during prune (0.12, 0.13) #3761

Closed
micw opened this issue May 18, 2022 · 20 comments
Labels: backend: s3, state: need investigating (cause unknown, need investigating/troubleshooting)

Comments

micw commented May 18, 2022

Output of restic version

restic 0.13.1 compiled with go1.18 on linux/amd64

How did you run restic exactly?

# restic_0.13.1_linux_amd64 prune -v 3
repository cc70af73 opened successfully, password is correct
loading indexes...
loading all snapshots...
finding data that is still in use for 20 snapshots
[0:01] 100.00%  20 / 20 snapshots...
searching used packs...
collecting packs for deletion and repacking
will remove pack 0009d7e4 as it is unused and not indexed
will remove pack 0028bb8a as it is unused and not indexed
will remove pack 003e5a07 as it is unused and not indexed
will remove pack 00520e1d as it is unused and not indexed
will remove pack 00ac712e as it is unused and not indexed
will remove pack 010e33d0 as it is unused and not indexed
will remove pack 0125212c as it is unused and not indexed
will remove pack 013c21f2 as it is unused and not indexed
will remove pack 0249a0ce as it is unused and not indexed
will remove pack 029a644e as it is unused and not indexed
will remove pack 02d1329e as it is unused and not indexed
will remove pack 033e85fd as it is unused and not indexed
will remove pack 0385f0cb as it is unused and not indexed
will remove pack 038c43ec as it is unused and not indexed
will remove pack 0392f42d as it is unused and not indexed
will remove pack 046ae779 as it is unused and not indexed
will remove pack 048465a6 as it is unused and not indexed
will remove pack 04904e99 as it is unused and not indexed
will remove pack 04a35f4b as it is unused and not indexed
will remove pack 04ade1a7 as it is unused and not indexed
will remove pack 04b7416e as it is unused and not indexed
will remove pack 04e5d7fc as it is unused and not indexed
will remove pack 051b3f85 as it is unused and not indexed
[8:27] 2.16%  977 / 45240 packs processed...

What backend/server/service did you use to store the repository?

Open Telekom Cloud (via S3 interface)

Expected behavior

Should prune the repository

Actual behavior

Prune runs up to a certain point (pack 977 on this repo) and then stops making progress.

Steps to reproduce the behavior

I have no way to reproduce it on an arbitrary repository. However, I have 3 repos with this issue where prune always stops at the same pack.

Additional info

  • Same with restic from master, 0.13.1 and 0.12.1
  • Always stops around pack 1000
  • Always stops at the same pack per repo, independent of the restic version I try

Do you have any idea what may have caused this?

Do you have an idea how to solve the issue?

Did restic help you today? Did it make you happy in any way?

Restic is great, I use it for all my backups!

lbausch (Contributor) commented May 18, 2022

Might this be (or related to) #3710?

micw (Author) commented May 18, 2022

It looks like it. Is there an easy way to get a build with that fix?

lbausch (Contributor) commented May 18, 2022

Sure! https://beta.restic.net/

micw (Author) commented May 18, 2022

Thank you. I downloaded latest_restic_linux_amd64 from the beta site and tried it. The behaviour is exactly the same, so the fix for #3710 does not work here.

Additional info: On all 3 repos it stops around pack 1000.

micw (Author) commented May 18, 2022

I have tried to prune the repo with restic 0.12.1. The behaviour is still exactly the same, so it seems like the cause is something completely different.

micw changed the title from "No progress during prune for hours" to "No progress during prune for hours (0.12, 0.13)" on May 18, 2022
micw changed the title from "No progress during prune for hours (0.12, 0.13)" to "No progress during prune (0.12, 0.13)" on May 18, 2022
MichaelEischer (Member) commented:

Might this be (or related to) #3710?

#3710 did not affect release versions of restic.

Judging from the log snippet, restic gets stuck while listing all pack files in the repository. Does adding -o s3.list-objects-v1=true to the command line help?
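
For reference, the option is passed on the same command line as the prune invocation, e.g. with the binary name used in the report above:

# restic_0.13.1_linux_amd64 prune -v -o s3.list-objects-v1=true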

MichaelEischer added the "state: need feedback" (waiting for feedback, e.g. from the submitter) label on May 19, 2022
micw (Author) commented May 23, 2022

s3.list-objects-v1=true works:

[screenshot of the prune output]

MichaelEischer added the "backend: s3" and "state: need investigating" (cause unknown, need investigating/troubleshooting) labels and removed the "state: need feedback" label on May 23, 2022
MichaelEischer (Member) commented:

It would be interesting to have a debug log (see https://github.com/restic/restic/blob/master/CONTRIBUTING.md#reporting-bugs) of the requests restic issues to the S3 backend. To be precise, I'm interested in the requests that are sent to the S3 backend while listing files is stuck. If you or someone else who has the same problem creates a debug.log, make sure to remove the Authorization header and sanitize the backend configuration in the log.
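
For anyone wanting to produce such a log, the linked CONTRIBUTING.md describes roughly the workflow below (the log path is only a placeholder, and the exact build invocation may differ between restic versions):

# in a checkout of the restic source tree, build a binary with debug support
go run build.go -tags debug
# run the failing command with the debug log enabled
DEBUG_LOG=/tmp/restic-debug.log ./restic prune -v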

micw (Author) commented May 23, 2022

If someone can provide a debug build of restic, I can provide a debug log.

micw (Author) commented May 23, 2022

Note that I'm using OTC (Open Telekom Cloud), which mimics the S3 API. This may be relevant.

micw (Author) commented May 30, 2022

I had to recover most of the stuck backups (by running restic prune -o s3.list-objects-v1=true). After that it seems to work without that option. I now have one repo left with the issue, which I can leave in this state for a few more days. How can I get a debug version so that I can provide a debug log?

dhopfm commented May 31, 2022

How can I get a debug version so that I can provide a debug log?

Have a look here.

micw (Author) commented Jun 2, 2022

Looks like it is in an endless loop with the following request/response (redacted, let me know if anything important is missing):

2022/06/02 09:09:47 debug/round_tripper_debug.go:83     debug.loggingRoundTripper.RoundTrip     156     ------------  HTTP REQUEST -----------
GET /zzz-backup-prod/?continuation-token=VjIt[...]]ZsQT0%3D&delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=zzz-cloud-db-0%2Fdata%2F HTTP/1.1
Host: obs.eu-de.otc.t-systems.com
User-Agent: MinIO (linux; amd64) minio-go/v7.0.27
Authorization: AWS4-HMAC-SHA256 Credential=J[...]]S/20220602/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=1[...]9
X-Amz-Content-Sha256: e[...]5
X-Amz-Date: 20220602T070947Z
Accept-Encoding: gzip


2022/06/02 09:09:47 debug/round_tripper_debug.go:96     debug.loggingRoundTripper.RoundTrip     156     ------------  HTTP RESPONSE ----------
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Date: Thu, 02 Jun 2022 07:09:47 GMT
Server: OBS
X-Amz-Bucket-Region: eu-de
X-Amz-Id-2: 32A[...]IkQ
X-Amz-Request-Id: 00[...]3A
X-Reserved: amazon, aws and amazon web services are trademarks or registered trademarks of Amazon Technologies, Inc

micw (Author) commented Jun 7, 2022

@MichaelEischer does this help in any way? My prunes worked 2-3 times after I ran prune once with the s3.list-objects-v1=true option, but they got stuck again now. I cannot keep the current state forever, so let me know if you need further debug output or testing. Sooner or later I'll have to add s3.list-objects-v1=true as the default to make our backups work again.

MichaelEischer (Member) commented:

Sorry for the late reply. I'm not at all sure whether the bug is in the Open Telekom Cloud implementation or in the minio-go library. Judging from my understanding of the Amazon S3 API description, the minio library is behaving correctly, which would put the blame on the cloud implementation.

What seems to happen is that ListObjectsV2 never sets IsTruncated to false which causes the library to continue issuing requests. But for ListObjectsV1 that doesn't seem to happen.

The difference between the list-objects versions that is probably relevant here is how request pagination works (i.e. the file list is too large for a single request and is thus returned incrementally). My guess right now is that the Telekom Cloud is doing something odd in that regard. Unfortunately the debug log isn't that helpful. What might be interesting is whether the continuation-token in the URL changes between two subsequent requests.
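
To illustrate the failure mode, here is a simplified Go sketch. It is not the actual minio-go code; the listPage type and the fetch callback are invented for the example. A ListObjectsV2 client keeps requesting pages until the server reports IsTruncated=false, so an endpoint that replays the same truncated page regardless of the continuation token keeps the client looping forever.

package main

import "fmt"

// listPage models a single ListObjectsV2 response (hypothetical type for this sketch).
type listPage struct {
	Keys                  []string
	IsTruncated           bool
	NextContinuationToken string
}

// listAll pages through a bucket the way an S3 client does: keep requesting
// while the server reports IsTruncated=true, passing back the
// NextContinuationToken from the previous response.
func listAll(fetch func(token string) listPage) []string {
	var all []string
	token := ""
	for {
		page := fetch(token)
		all = append(all, page.Keys...)
		if !page.IsTruncated {
			return all // a well-behaved server eventually ends up here
		}
		token = page.NextContinuationToken
		// If the server ignores the token and replays the same page with
		// IsTruncated=true, this loop never terminates -- which matches the
		// behaviour observed against the OTC/OBS endpoint.
	}
}

func main() {
	// Simulated broken endpoint: same page on every request, always truncated.
	broken := func(token string) listPage {
		return listPage{
			Keys:                  []string{"data/00/0009d7e4", "data/00/0028bb8a"},
			IsTruncated:           true,
			NextContinuationToken: "VjIt...",
		}
	}
	_ = broken // calling listAll(broken) would loop forever

	// A correct endpoint honours the token and eventually returns a final page.
	ok := func(token string) listPage {
		if token == "" {
			return listPage{Keys: []string{"a", "b"}, IsTruncated: true, NextContinuationToken: "t1"}
		}
		return listPage{Keys: []string{"c"}, IsTruncated: false}
	}
	fmt.Println(listAll(ok)) // prints [a b c]
}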

I'll have to add s3.list-objects-v1=true as the default

That's the only workaround for now. It shouldn't have any adverse effects.

micw added commits to realestatepilot/docker-restic-backupclient that referenced this issue on Jun 15, 2022
micw (Author) commented Jun 27, 2022

@MichaelEischer the boto3 docs mention that v2 getobjects is something the requester has to pay for. Not sure if that's also true for v1; if not, using v1 should probably be the default.

It turned out that the loop caused a lot of traffic (20 TB in the last month, which is ~2k€). Just got the bill and saw the mess...

So I spent the afternoon reproducing the issue, which is very clearly an issue with the OTC endpoint: it returns the same set of objects for each request and ignores the continuation token.
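
One way to check this by hand is sketched below; it assumes the aws CLI is configured with the OTC credentials, the bucket/prefix/endpoint are taken from the debug log above, and the token value is a placeholder. A compliant endpoint returns the next set of keys for the second request; a broken one returns the same keys again.

aws s3api list-objects-v2 --endpoint-url https://obs.eu-de.otc.t-systems.com \
  --bucket zzz-backup-prod --prefix zzz-cloud-db-0/data/ \
  --max-keys 5 --no-paginate \
  --query '{keys: Contents[].Key, next: NextContinuationToken}'
# pass the "next" token back and compare the returned keys with the first page
aws s3api list-objects-v2 --endpoint-url https://obs.eu-de.otc.t-systems.com \
  --bucket zzz-backup-prod --prefix zzz-cloud-db-0/data/ \
  --max-keys 5 --no-paginate \
  --continuation-token 'TOKEN_FROM_PREVIOUS_RESPONSE' \
  --query 'Contents[].Key'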

MichaelEischer (Member) commented:

the boto3 docs mention that v2 getobjects is something the requester has to pay for. Not sure if that's also true for v1; if not, using v1 should probably be the default.

As both versions of getobjects work essentially the same way, I'd expect that the same pricing applies to both versions.

It turned out that the loop caused a lot of traffic (20 TB in the last month, which is ~2k€). Just got the bill and saw the mess...

Oh dear!

So I spent the afternoon reproducing the issue, which is very clearly an issue with the OTC endpoint: it returns the same set of objects for each request and ignores the continuation token.

So the S3-compatible endpoint is actually not S3-compatible? This has to be fixed by OTC. Returning a continuation token shows that listObjectsV2 is apparently supported, but then ignoring the token is a problem. Besides that, I don't think there's much we can do on the restic side to avoid this problem.

micw (Author) commented Jul 2, 2022

@MichaelEischer Absolutely, restic is not to blame in any way for that ^^ I have an open ticket with OTC and am waiting for the final response.

MichaelEischer (Member) commented:

Is there anything left which we can do on the restic side here? If not, I'd like to close this issue.

MichaelEischer (Member) commented:

Judging from the Server: OBS header in the HTTP response, the issue here might be fixed by minio/minio-go#1704, which is included in the latest master branch. Besides that there's nothing left to do here.
