New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform file sync outside of lock on Commit #10128
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -574,6 +574,13 @@ func (nw *namespacedWriter) Commit(ctx context.Context, size int64, expected dig | |
|
||
var innerErr error | ||
|
||
// We pre-sync the in-flight writes to the disk. This avoids the [subsequent fp.Sync() call] | ||
// (https://github.com/containerd/containerd/blob/c4c3c6ea568ce0cfbcf754863abadeea37d77c8f/plugins/content/local/writer.go#L95) | ||
// from taking too long (10s+) while holding the metadata database lock as in the following | ||
// `update` transaction. We intentionally ignore any error on Sync() because it will be | ||
// handled by the subsequent `fp.Sync` anyway. | ||
nw.Sync() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The idea looks good. However, as far as I know, fsync or fdatasync could commit unrelated dirty page. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @fuweid Thanks for reviewing the PR!
This applies regardless of where we do the sync, right? The downside of doing the sync under an exclusive transaction/lock is that it blocks concurrent operations that don't demand heavy I/O like
That fdatasync call is on the metadata database, not on the image layers (which can be gigabytes of data). So the degree of lock contention is dramatically different.
In our production environment, we pull and start around 10 containers upon VM booting. We observed a high variance in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the comment.
Yes. I was saying that the CreateContainer can commit part of dirty pages created by content writer.
Please checkout filesystem behaviour I mentioned in the REF. Both metadata database and image layers are in the same block device by default. They can impact each other.
That's why I want to see that improvement result. It will be more confident to get this commit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Makes sense. We've seen a substantial improvement on the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@fuweid My understanding is that even if this exists today, we are still better off moving the sync outside of the lock which is almost guaranteed to be blocking should there be a large image layer (e.g., GB+). And there's also a chance the interference issue you referenced will be addressed by the linux kernel community in the future, and then we will enjoy a free ride here :) |
||
|
||
if err := update(ctx, nw.db, func(tx *bolt.Tx) error { | ||
dgst, err := nw.commit(ctx, tx, size, expected, opts...) | ||
if err != nil { | ||
|
@@ -599,6 +606,13 @@ func (nw *namespacedWriter) Commit(ctx context.Context, size int64, expected dig | |
return innerErr | ||
} | ||
|
||
func (nw *namespacedWriter) Sync() error { | ||
if syncer, ok := nw.w.(content.Syncer); ok { | ||
return syncer.Sync() | ||
} | ||
return nil | ||
} | ||
|
||
func (nw *namespacedWriter) commit(ctx context.Context, tx *bolt.Tx, size int64, expected digest.Digest, opts ...content.Opt) (digest.Digest, error) { | ||
var base content.Info | ||
for _, opt := range opts { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface should not change,
Sync() error
can either be added to a new interface here or defined next to where it needs to be called. For new and optional functions, this should just use a type assert and called if available. In this case only the filesystem writer would need a Sync function.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, we need to call
namespacedWriter.w.fp.Sync()
from thenamedspacedWriter.Commit()
to actually flush the writes. But these are all private members of the class hiding behind the interface, so I don't see a clean way to do it without modifying the interface. Can you elaborate how to exposenamespacedWriter.w.fp.Sync()
just for the filesystem writer (i.e., thelocal.writer
class itself is private tonamespacedWriter
)?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
Somewhere during namespacedWriter Commit before the transaction you could have something like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I will test the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested and it works. Thanks for reviewing the PR.