"Snapshot" of writable layers #223

Open
gorbak25 opened this issue Jun 12, 2023 · 9 comments

Labels
enhancement New feature or request

Comments


gorbak25 commented Jun 12, 2023

What is the version of your Overlaybd

No response

What would you like to be added?

I would like the overlaybd-tcmu binary to implement a TCMU reconfigure handler which:

  1. Temporarily pauses IO on the block device
  2. Seals the current writable layer
  3. Creates a new empty writable layer
  4. Reconfigures itself, so the sealed layer becomes a lower layer and the upper layer points to the new empty layer
  5. Resumes IO

Sealing is quick, so the IO pause shouldn't be noticeable to users of the block device.
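From the caller's point of view, the flow might look roughly like the sketch below. overlaybd-commit and overlaybd-create are the tools overlaybd already ships for sealing and creating layers, but the paths and argument layout here are illustrative, and the final overlaybd-reconfig trigger is hypothetical (it is exactly what this issue asks for):

# IO pausing/resuming would happen inside overlaybd-tcmu; externally the flow would be:
overlaybd-commit /layers/upper.data /layers/upper.index /layers/sealed   # seal the current writable layer
overlaybd-create /layers/upper.data /layers/upper.index 64               # create a fresh (e.g. 64 GB) writable layer
# hypothetical trigger: tell the running device to reload its config, with the
# sealed layer appended to the lowers and the upper pointing at the new files
overlaybd-reconfig /dev/sdX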

Why is this needed for Overlaybd?

OverlayBD will be the main storage back-end for https://github.com/hocus-dev/hocus (we're currently finishing the migration from raw sparse files to OverlayBD). We would have two use cases for reconfiguration:

  1. Hocus spawns long-lived VMs (workspaces) which we would like to periodically back up to an OCI registry without recreating the VMs from scratch.
  2. When fetching content from remote repositories, we spawn a dedicated VM for that task which pulls changes from the remote repository into a new layer. If reconfiguring TCMU were supported, we could reuse the same block device and VM between pulls of remote repositories.

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.
gorbak25 added the enhancement (New feature or request) label Jun 12, 2023

lihuiba commented Jun 12, 2023

2. we could reuse the same block device

This reuse is risky, because it probably incurs an inconsistency between the content on disk and the content cached in the kernel. I would advise against reconfiguring (slightly changing the content of) a block device without detaching it from the kernel, except in the case where you are 100% sure that the inconsistency cannot happen.


gorbak25 commented Jun 12, 2023

incurs an inconsistency between the content on disk and the content cached in the kernel

Would the data still be inconsistent if we first flush the kernel writeback buffer by invoking sync on the guest/host and then reconfigure the device?
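Concretely, the sequence I have in mind would be something like this (a sketch; the comments mark where each command runs):

# inside the guest:
sync    # flush the guest's dirty pages down to the virtual disk
# on the host:
sync    # flush whatever the host itself still buffers
# ...now reconfigure the overlaybd-tcmu device (swap the writable layer)...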


lihuiba commented Jun 13, 2023

Basically yes, as the kernel will not be aware that you have changed (reconfigured) the disk.


gorbak25 commented Jun 13, 2023

Ok, I agree that inconsistencies will arise. I'm wondering whether sync would act as a barrier. Let's consider this scenario:

  1. Data is constantly being written to the block device without a pause
  2. Issue a sync at time T_s
  3. After the sync finishes, reconfigure the device (swap the writable layer); we now have a sealed layer S and a new writable layer W1

Now my question is:
Can I be sure that data written at T < T_s will end up in S and data written after the sync, at T > T_s, will end up in W1? If I proceed to create a new writable layer W2 on top of S, won't fsck during boot fix any inconsistencies which may arise?

Do I need a stronger primitive to ensure this will work (like pausing the VM for a moment)? In Hocus we don't use containerd or the overlaybd containerd snapshotter; we use overlaybd-tcmu directly, so we are free to make this work for our use cases :)
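One stronger primitive that comes to mind is fsfreeze, which flushes the filesystem and then blocks new writes until it is thawed, giving a clean cut point without pausing the whole VM (a sketch; /mnt is a placeholder for the mount point inside the guest):

fsfreeze --freeze /mnt     # flush everything and block new writes to the filesystem
# ...reconfigure the overlaybd-tcmu device (seal + swap the writable layer)...
fsfreeze --unfreeze /mnt   # thaw the filesystem and resume writes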


lihuiba commented Jun 14, 2023

whether sync would act as a barrier

No, sync is only a memory-to-disk synchronization, i.e. writing dirty cache pages back to disk; it does not take care of the opposite direction by loading new data from disk. (And the disk does not even have a way to tell the kernel what has changed!)

Dropping caches with sysctl -w vm.drop_caches=3 before reconfiguration does not seem to ensure consistency. I suspect that it does not fully drop all kinds of caches, but I didn't carry out further investigation.
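A quick way to observe the asymmetry (a sketch; $DISK is a placeholder for the device node):

dd if=$DISK of=/dev/null bs=4k count=1                # the first read populates the page cache
# ...the data underneath changes out-of-band (e.g. a reconfiguration)...
dd if=$DISK of=/dev/null bs=4k count=1                # may still return the stale cached page
dd if=$DISK of=/dev/null bs=4k count=1 iflag=direct   # O_DIRECT bypasses the page cache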

liulanzheng (Member) commented

Reconfiguration seems easy to implement because libtcmu already supports it, but it must be used with caution due to cache consistency problems.
Perhaps this set of commands can solve the cache consistency issue:

sync                                  # write dirty pages back to disk
echo 3 > /proc/sys/vm/drop_caches     # drop the page cache plus dentries and inodes
blockdev --flushbufs $DISK            # flush the kernel's buffer cache for the device
hdparm -F $DISK                       # flush the drive's on-device write cache

from https://stackoverflow.com/questions/9551838/how-to-purge-disk-i-o-caches-on-linux


lihuiba commented Jun 14, 2023

Anyway, unmounting and re-mounting are necessary.
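In other words, something along these lines (a sketch; $DISK and /mnt are placeholders):

umount /mnt                  # detach the filesystem so the kernel keeps no state for it
blockdev --flushbufs $DISK   # drop whatever the kernel still buffers for the device
# ...reconfigure the overlaybd-tcmu device here...
mount $DISK /mnt             # remount; the kernel re-reads everything from the (new) disk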

gorbak25 (Author) commented

I don't see why purging the read cache is required when the only thing that changes is the upper writable layer, so the data on disk remains the same. I see that problems would arise if one changed the disk image entirely, but if only the upper layer changes then the read caches are still OK.


lihuiba commented Jun 15, 2023

I don't see why

Because you "pull changes from the remote repository". This may introduce inconsistency even if you don't "change the disk image entirely".
