
Add Mlock flag. #273
Merged 2 commits into etcd-io:master on Apr 26, 2021
Conversation

@wpedrak (Contributor) commented Apr 21, 2021:

The Mlock flag will cause mlock on the db file, which will prevent it from being swapped out of memory. Motivation for this commit: etcd-io/etcd#12750

My concerns (feedback very welcome):

  • Can the full munlock followed by a full mlock after file growth be replaced with a single mlock of just the new file chunk?
  • Should the mlock and munlock functions from bolt_unix.go be copied into bolt_unix_aix.go and bolt_unix_solaris.go (as mmap and munmap are)?
  • How do we test such an OS-specific flag? I'm not aware of any OS-specific tests.
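For reference, the unix mlock/munlock helpers being discussed might look roughly like the following sketch. This is a hedged reconstruction, not the merged code: the DB plumbing is replaced by plain arguments, the helper names (`lockSize`, `mlockRegion`, `munlockRegion`) are illustrative, and the clamping mirrors the constraint raised in this PR that the mapped region can be larger than the file.

```go
package main

import (
	"fmt"
	"syscall"
)

// lockSize clamps the lock length. The mmapped region can be larger
// than the file, and mlock fails when asked to fault in pages past
// EOF, so we lock only the file-backed prefix, and never more than
// the mapping itself.
func lockSize(fileSize, mmapSize int) int {
	if fileSize > mmapSize {
		return mmapSize
	}
	return fileSize
}

// mlockRegion pins the first fileSize bytes of the mapped region b in
// RAM so the kernel will not swap them out (unix-only sketch).
func mlockRegion(b []byte, fileSize int) error {
	n := lockSize(fileSize, len(b))
	if err := syscall.Mlock(b[:n]); err != nil {
		return fmt.Errorf("could not mlock db file: %w", err)
	}
	return nil
}

// munlockRegion releases the lock, e.g. before the mapping is resized.
func munlockRegion(b []byte, fileSize int) error {
	n := lockSize(fileSize, len(b))
	if err := syscall.Munlock(b[:n]); err != nil {
		return fmt.Errorf("could not munlock db file: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(lockSize(100, 50)) // mapping smaller: clamp to 50
	fmt.Println(lockSize(50, 100)) // file smaller: lock only 50
}
```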

@@ -51,7 +51,7 @@ func funlock(db *DB) error {
// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
// Map the data file to memory.
b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
@mm4tt commented:
QQ, what would be the difference if we called

syscall.Mmap(..., syscall.MAP_SHARED|syscall.MAP_LOCKED|db.MmapFlags)

instead of what we currently have in this change, i.e. a separate mlock call?

@ptabor (Contributor) commented Apr 21, 2021:

AFAIK it's the first approach that Wojtek tried, and the problem is that the size mmapped by bbolt is bigger than the size of the file.

That works for mmap as long as mmap is 'lazy'. But the mlock (flag) actually tries to lock/fetch the underlying pages, and fails if the file size is too small.

Now, the question is why bbolt does not keep the file size in sync with the mmap:

  1. Appending more content to a file is cheap and does not require any special locking. We can do this in a few-MB batches pretty frequently.
  2. Changing the size of an mmap is complicated. There is no guarantee that if you mmap the same file with a bigger size, you will get the same contiguous virtual address space (the next addresses may already be taken). In theory, on Linux there is an mremap call that makes a best-effort attempt to remap in place (but it's not exported to Go). Either way, as currently implemented, changing the mmap requires taking an RW lock on the address space, i.e. draining all RW and RO transactions. That's expensive and disruptive. We do it once per GB.
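The "once per GB" cadence comes from how bbolt sizes its mapping. Below is a hedged sketch of that sizing logic; bbolt's actual mmapSize in db.go also enforces a maximum map size and page-size alignment, which are omitted here, and the arithmetic assumes a 64-bit platform.

```go
package main

import "fmt"

const maxMmapStep = 1 << 30 // 1 GB

// mmapSize returns the mapping size for a given file size: it doubles
// from 32 KB up to 1 GB, then grows in 1 GB steps, so the expensive
// remap (which drains all transactions) happens rarely.
func mmapSize(size int) int {
	// Double the size from 32 KB until 1 GB.
	for i := uint(15); i <= 30; i++ {
		if size <= 1<<i {
			return 1 << i
		}
	}
	// Above 1 GB, round up to the next 1 GB boundary.
	sz := int64(size)
	if remainder := sz % maxMmapStep; remainder > 0 {
		sz += maxMmapStep - remainder
	}
	return int(sz)
}

func main() {
	fmt.Println(mmapSize(1))         // 32768
	fmt.Println(mmapSize(1 << 20))   // 1048576
	fmt.Println(mmapSize(1<<30 + 1)) // 2147483648
}
```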

@wpedrak (Contributor, Author) replied:

@ptabor

> AFAIK it's the first approach that Wojtek tried, and the problem is that the size mmapped by bbolt is bigger than the size of the file.

It was not. The failing approach I tried was locking the byte slice b returned by Mmap. The reason for that failure was exactly as you described.

@mm4tt
I believe there are 2 reasons not to do that:

  1. syscall.MAP_LOCKED is available on Linux only (e.g. darwin is missing it).
  2. From the mmap man page on MAP_LOCKED: "This implementation will try to populate (prefault) the whole range but the mmap() call doesn't fail with ENOMEM if this fails. Therefore major faults might happen later on." That looks like error swallowing to me.

bolt_unix.go Outdated
db.dataref = nil
db.data = nil
db.datasz = 0
return err
}

// mlock locks memory of db file
func mlock(db *DB, fileSize int) error {
Comment (Contributor):

I assume the code currently would not compile for Windows & other OSes, and we need a stub (log_error / panic?) implementation for bolt_windows...

Reply (Contributor, Author):

True, added
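The Windows stub presumably looks something like the sketch below. This is illustrative, not the merged code: the real version would live behind build tags in bolt_windows.go, and the exact signatures and panic messages are assumptions.

```go
package main

import "fmt"

// mlock is a stub for platforms without mlock(2) support, such as
// Windows. Panicking makes a DB opened with the Mlock option fail
// loudly rather than silently skipping the lock.
func mlock(fileSize int) error {
	panic("mlock is not supported on this system")
}

// munlock is the matching stub.
func munlock(fileSize int) error {
	panic("munlock is not supported on this system")
}

func main() {
	defer func() {
		fmt.Println("recovered:", recover())
	}()
	_ = mlock(4096)
}
```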

db.go Outdated
@@ -120,6 +120,12 @@ type DB struct {
// of truncate() and fsync() when growing the data file.
AllocSize int

// Mlock locks database file in memory when set to true.
// It prevents potential page faults, however used memory can't be reclaimed.
Comment (Contributor):

major page faults

@ptabor (Contributor) left a comment:

Thank you.
Overall looks good to me, but we need this code to compile on Windows and to have some tests (of file growth) in that mode.

`Mlock` flag will cause mlock on the db file, which will prevent it from being swapped out. Motivation of this commit (etcd): etcd-io/etcd#12750
@wpedrak (Contributor, Author) commented Apr 23, 2021:

Travis is failing to mlock memory in tests. Most likely this is caused by too small a value of RLIMIT_MEMLOCK. There was such an issue in 2014: travis-ci/travis-ci#2462. I'd skip those tests when RLIMIT_MEMLOCK is too low, but it generally seems like a bad idea to have such switches in a CI pipeline.

}

if info.Cur < memlockLimitRequest {
t.Skip(fmt.Sprintf(
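A skip like the one in the diff could be gated on getrlimit. The sketch below assumes Linux, where RLIMIT_MEMLOCK is resource number 8; the stdlib syscall package doesn't export that constant, and the merged test presumably uses golang.org/x/sys/unix instead. The helper name `canMlock` is illustrative.

```go
package main

import (
	"fmt"
	"syscall"
)

// rlimitMemlock is RLIMIT_MEMLOCK on Linux (not exported by the stdlib
// syscall package; golang.org/x/sys/unix has unix.RLIMIT_MEMLOCK).
const rlimitMemlock = 8

// canMlock reports whether the soft RLIMIT_MEMLOCK limit allows
// locking at least `need` bytes, so mlock-dependent tests can be
// skipped on constrained CI machines (see travis-ci/travis-ci#2462).
func canMlock(need uint64) (bool, error) {
	var info syscall.Rlimit
	if err := syscall.Getrlimit(rlimitMemlock, &info); err != nil {
		return false, err
	}
	// Cur == ^uint64(0) means RLIM_INFINITY, which trivially passes
	// the comparison below.
	return info.Cur >= need, nil
}

func main() {
	ok, err := canMlock(32 << 10)
	fmt.Println(ok, err)
}
```

In a test, a false result from such a check would feed the t.Skip call shown above.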
Comment (Contributor):

Please link the Travis 'no-changes' discussion in a comment.

Reply (Contributor, Author):

Done

@ptabor ptabor merged commit 9c92be9 into etcd-io:master Apr 26, 2021
@wpedrak wpedrak deleted the mlock-flag branch June 1, 2021 07:16