Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Alternative to 1148): Don't blindly reuse state from a previous layer when re-creating it #1140

Merged
merged 1 commit into from May 2, 2022

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Feb 17, 2022

  • Use the Driver.locker lock when creating a layer in the overlay driver. I don’t know why that lock is necessary, but if it is necessary at all, it seems that it should be held on that code path. Alternatively, maybe that lock should be removed entirely?
  • Don't just add files to the layer’s directory; if we find anything pre-existing, remove that, including the symbolic link in linkDir, to ensure the layer is created in a known state.

Compare also EDIT #1133 #1139.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Feb 17, 2022

Absolutely untested so far, and I’d like to see this work in the simulated situation before it is merged.

@mtrmac mtrmac force-pushed the create-no-duplicates branch 2 times, most recently from 4b5e4c5 to dbbd18f Compare February 17, 2022 18:26
@mtrmac
Copy link
Collaborator Author

mtrmac commented Feb 17, 2022

Note that removing a layer like this is also potentially risky — we can’t necessarily trust the contents of the link or additionallayer files in the layer’s directory.

@mtrmac
Copy link
Collaborator Author

mtrmac commented Feb 21, 2022

OK, I have tested this now and verified that it works as expected.

OTOH #1139 might be a better fix — I’d very much love for someone who understands the design intent to weigh in.

@mtrmac mtrmac changed the title Don't blindly reuse state from a previous layer when re-creating it (Alternative to 1148): Don't blindly reuse state from a previous layer when re-creating it Feb 22, 2022
@mtrmac
Copy link
Collaborator Author

mtrmac commented Feb 22, 2022

Podman tests: containers/podman#13315

@mtrmac mtrmac force-pushed the create-no-duplicates branch 2 times, most recently from 006af1c to a5296f4 Compare February 23, 2022 13:25
@mtrmac
Copy link
Collaborator Author

mtrmac commented Feb 23, 2022

Marking as ready to review — please consider this only in tandem with #1148.

@mtrmac mtrmac marked this pull request as ready for review February 23, 2022 15:46
@mtrmac
Copy link
Collaborator Author

mtrmac commented Apr 13, 2022

@nalind @giuseppe PTAL. This might be very wrong, I’d just like to make sure something is done about these failure cases.

@nalind
Copy link
Member

nalind commented Apr 18, 2022

The in-driver lock appears to be a carryover from docker — everything at this level is supposed to be protected by a LayerStore lock that happened before we called into any driver-specific logic. The premise looks sound.

@nalind
Copy link
Member

nalind commented Apr 19, 2022

This looks a decent remediation for cases where a process was killed sometime earlier, while it was attempting to create a layer from a template, which is often done for users of fedora toolbox, so I think this is worth pursuing.

defer d.locker.Unlock(id)

if _, err := system.Lstat(dir); err == nil {
logrus.Warnf("Trying to create a layer %#v while directory %q already exists; removing it first", id, dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We default to showing warnings. Is this something we want normal users to see by default? Or should we drop this down to info?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that, this indicates an unexpected abort of a c/storage operation, and it is definitely something we want to have captured in any logs at the default level for postmortems if this cleanup is insufficient and/or incorrect and harmful.

I’m not so sure that it needs to be printed by default on a TTY during interactive use; at that point, there are unlikely to be any logs recorded, and the thing either works or it doesn’t. OTOH, it’s also a situation we shouldn’t be regularly getting into, so I don’t think that a warning should hurt.

Alternatively, #1148 should make this case unreachable assuming there are no bugs — and in that case we either don’t need this PR, or we can happily leave it on a warning level because it is clearly very unexpected. If I’m reading @nalind’s comments right, it seems preferable to have both this PR and #1148.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok leave it at Warning level.

@rhatdan
Copy link
Member

rhatdan commented Apr 20, 2022

LGTM

mtrmac added a commit to mtrmac/storage that referenced this pull request Apr 20, 2022
In all call paths, the layerStore owning the driver
is expected to be locked, so, so this seems redundant.

See also containers#1140 (comment) .

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@mtrmac
Copy link
Collaborator Author

mtrmac commented Apr 20, 2022

The in-driver lock appears to be a carryover from docker — everything at this level is supposed to be protected by a LayerStore lock that happened before we called into any driver-specific logic.

OK, let’s discuss possibly removing that in #1214 . (That would break this PR, I will rebase either one as necessary.)

mtrmac added a commit to mtrmac/storage that referenced this pull request Apr 20, 2022
In all call paths, the layerStore owning the driver
is expected to be locked, so, so this seems redundant.

See also containers#1140 (comment) .

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
We have reports in the wild of a layer store where two symbolic links
in linkDir point to the same layer. That could only happen when calling
Driver.create with a previously-used layer ID (which happens all the time
because pulls use deterministic layer IDs), without fully deleting
the previous version of the layer (so far, we don't know how that has
happened).

To avoid such situations, don't just leave whatever was in
the layer directory laying around; try to remove any pre-existing
contents, as well as the symbolic link in linkDir, if any.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@rhatdan
Copy link
Member

rhatdan commented May 2, 2022

After meeting with @nalind and @mtrmac we have decided to merge both PRs.

@rhatdan rhatdan merged commit f6fd87f into containers:main May 2, 2022
@mtrmac mtrmac deleted the create-no-duplicates branch May 2, 2022 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants