
mirror watch doesn't recover when server is temporarily down #4883

Open
Javaluca opened this issue Mar 22, 2024 · 7 comments

@Javaluca

Expected behavior

I'm trying to mirror my S3 bucket to a local directory in a Docker container using the image minio/mc:RELEASE.2024-03-13T23-51-57Z-cpuv1 with the command

/usr/bin/mc mirror --watch --overwrite --remove miniohost/my-bucket /usr/share/my-folder

As you can see, the bucket is the source and the container folder is the target, and it works very well: files are added or removed based on the actual state of the bucket.

Actual behavior

I'm experiencing a problem when the MinIO server is down or unreachable; the container shows the following log:

mc: <ERROR> Failed to perform mirroring Get "http://minio.net:9000/my-bucket/?events=s3%3AObjectCreated%3A%2A&events=s3%3AObjectRemoved%3A%2A&events=s3%3ABucketCreated%3A%2A&events=s3%3ABucketRemoved%3A%2A&ping=10&prefix=&suffix=": dial tcp 10.0.0.93:9000: connect: connection refused

but when the minio-server is up again, the minio-client can't recover the mirroring functionality; essentially the container is useless and must be restarted manually.

Is there something I'm doing wrong, or is the minio-client simply not intended to recover by itself?

Steps to reproduce the behavior

version: '3'
services:
   
  minio-client:
    restart: always
    image: minio/mc:RELEASE.2024-03-13T23-51-57Z-cpuv1
    volumes:
      - 'my-volume:/usr/share/my-folder'
    depends_on:
      - minio
    entrypoint: >
      /bin/sh -c "
        echo START ;
        mkdir -p /usr/share/my-folder;
        /usr/bin/mc config host rm miniohost ;
        /usr/bin/mc config host add --insecure miniohost http://minio.net:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD} ;
        /usr/bin/mc mirror --watch --overwrite --remove miniohost/my-bucket /usr/share/my-folder ;
        echo END ;
      "

volumes:
  my-volume:

This is the complete logs provided by the container

START
mc: <ERROR> No such alias `miniohost` found. Use `mc alias set mycloud miniohost ...` to add an alias. Use the alias for S3 operations.
Added `miniohost` successfully.
`miniohost/my-bucket/one.txt` -> `/usr/share/my-folder/one.txt`
`miniohost/my-bucket/two.txt` -> `/usr/share/my-folder/two.txt`
Removed `/usr/share/my-folder/one.txt`.
mc: <ERROR> Failed to perform mirroring Get "http://minio.net:9000/my-bucket/?events=s3%3AObjectCreated%3A%2A&events=s3%3AObjectRemoved%3A%2A&events=s3%3ABucketCreated%3A%2A&events=s3%3ABucketRemoved%3A%2A&ping=10&prefix=&suffix=": dial tcp 10.0.0.93:9000: connect: connection refused

mc --version

mc version RELEASE.2024-03-13T23-51-57Z (commit-id=2508db9c560c10be0ed2203eaa585134235b6907)
Runtime: go1.21.8 linux/amd64
Copyright (c) 2015-2024 MinIO, Inc.
License GNU AGPLv3 https://www.gnu.org/licenses/agpl-3.0.html

System information

Docker container with image minio/mc:RELEASE.2024-03-13T23-51-57Z-cpuv1

@klauspost
Contributor

Not sure if this is something we want to fix. If a remote is down, we will not receive notifications and mirror will miss entries.

Restarting mc seems like the reasonable solution to re-enumerate the files on either side.
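
For the compose setup above, that restart can be automated from inside the entrypoint. Since mc apparently keeps running after the watch dies (per the report above), the mirror process needs an external probe; this is a minimal sketch, using mc ls as a cheap reachability check, with arbitrary probe intervals:

# Run the mirror in the background and probe the server; when the
# server stops answering, kill the mirror, wait for the server to
# come back, and start over (which re-enumerates both sides).
while true; do
  /usr/bin/mc mirror --watch --overwrite --remove miniohost/my-bucket /usr/share/my-folder &
  MIRROR_PID=$!
  while /usr/bin/mc ls miniohost >/dev/null 2>&1; do
    sleep 30
  done
  kill "$MIRROR_PID" 2>/dev/null
  wait "$MIRROR_PID" 2>/dev/null
  until /usr/bin/mc ls miniohost >/dev/null 2>&1; do
    sleep 10
  done
done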

@harshavardhana
Member

That is what is expected from mc --watch, not sure why it's failing.

@klauspost
Contributor

Well, the server isn't up when it is called, hence the connect: connection refused. Isn't there just an mc ping -x miniohost missing?
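
If the problem is only the startup ordering (the server not yet answering when the mirror is first started), a gate of that kind in the entrypoint would look roughly like this sketch; it uses mc ls as the probe rather than relying on any particular mc ping flag semantics:

# Block until the server answers, so the initial
# `connection refused` never reaches the watch.
until /usr/bin/mc ls miniohost >/dev/null 2>&1; do
  echo "waiting for miniohost..."
  sleep 5
done
/usr/bin/mc mirror --watch --overwrite --remove miniohost/my-bucket /usr/share/my-folder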

@harshavardhana
Member

> Well, the server isn't up when it is called, hence the connect: connection refused. Isn't there just an mc ping -x miniohost missing?

No, the idea was that even with that, it perpetually keeps retrying. Looks like someone has changed the code.

@klauspost
Contributor

Well, that is a problem, if it changed then.

@znerol

znerol commented Apr 10, 2024

I'm observing that too. In my case it would be perfect if mc exited with a non-zero status code when the connection to the bucket is lost.
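
With an exit like that, the existing restart: always policy would already cover recovery, as long as the entrypoint propagates the exit status to the container, e.g. by making mc the last command and exec'ing it. A sketch, assuming mc did exit non-zero on a lost watch connection:

# exec replaces the shell, so the container's exit status is mc's own
# and `restart: always` then recreates the container on failure.
echo START
mkdir -p /usr/share/my-folder
/usr/bin/mc config host rm miniohost
/usr/bin/mc config host add --insecure miniohost http://minio.net:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
exec /usr/bin/mc mirror --watch --overwrite --remove miniohost/my-bucket /usr/share/my-folder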

Browsing through the code, I found the following (just a hypothesis; I haven't tested my observations at all):

If I'm not mistaken, errors from the watcher are handled on line 754ff in mirror-main.go. If an error occurs, it is forwarded to the parallel manager (mj.parallel.queueTask()).

The parallel manager will handle that error task in one of its workers. If a worker encounters an error task, it simply terminates itself (line 97ff in parallel-manager.go), but all the other workers seem to continue as if nothing happened.

@znerol

znerol commented Apr 10, 2024

It seems that the code in parallel-manager.go doesn't actually handle the error, but just passes it back to the result channel. That one is polled at line 535ff in mirror-main.go.

Maybe there should be a distinction between fatal errors (connection to the bucket lost) and recoverable errors. In that case it would be possible to make that distinction at line 588ff.

@bh4t bh4t added the bugfix label May 8, 2024