
fix db.grow is unusable when NoFreelistSync is on #387

Merged
merged 1 commit into from Jan 19, 2023

Conversation

uvletter
Contributor

@uvletter uvletter commented Jan 17, 2023

Originally, the Truncate and Sync in db.grow were introduced to solve the problem that fdatasync after a write cannot guarantee the file size metadata is flushed (see boltdb/bolt#284). In e67705e over-allocation was introduced to save the cost of calling ftruncate frequently; that is nearly the final form of db.grow we see now.

To summarize, db.grow basically has two effects:

  1. Fix the file safety issue of fdatasync
  2. Improve the performance by reducing the need to flush file metadata

The file safety issue was fixed long ago in torvalds/linux@156bddd and torvalds/linux@b71fc07, so it is unlikely to be a problem nowadays. (Yes, this PR doesn't fix any severe problem; even without db.grow, boltdb can still run normally.)

For logical consistency, db.grow should be invoked every time the high watermark of the database size increases, but currently that only happens when commitFreelist is invoked, so I moved db.grow out of commitFreelist.
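The over-allocation rule being discussed can be sketched as a small standalone function. This is only a paraphrase of db.grow's sizing logic with illustrative names; the real method also skips shrinking below the current file size and performs the Truncate/Sync calls:

```go
package main

import "fmt"

// growTarget paraphrases db.grow's sizing rule (illustrative sketch, not the
// real method): while the database is smaller than AllocSize, truncate to
// exactly the needed size; once it crosses AllocSize, add a whole AllocSize
// chunk so ftruncate, and the metadata flush after it, run far less often.
func growTarget(needed, datasz, allocSize int) int {
	if datasz <= allocSize {
		return datasz // small database: allocate only what's needed
	}
	return needed + allocSize // large database: over-allocate a full chunk
}

func main() {
	const alloc = 16 * 1024 * 1024 // bbolt's default AllocSize
	fmt.Println(growTarget(4*4096, 32768, alloc)) // small db: exact size, 32768
	fmt.Println(growTarget(alloc+4096, alloc+4096, alloc))
}
```

The trade-off is classic amortization: one extra AllocSize of disk space in exchange for far fewer metadata-flushing truncate calls on a growing database.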

Linked to #285

Signed-off-by: Tian skylypig@gmail.com

Contributor

@ptabor ptabor left a comment

In general I support this change. Thank you for it.

But before submitting, it would be good to cover this with a test.
In particular, the severity of this issue (and so the messaging in the bbolt release notes) depends on what we learn about the potential failure scenario: is it a panic() or a memory-stomping bug?

@ptabor
Contributor

ptabor commented Jan 17, 2023

(please --sign-off the commit)

tx_test.go Outdated
return
}
for _, isSyncFreelist := range []bool{false, true} {
// Open the database.
Contributor

Please run subtest here using t.Run for having independent failures for both scenarios
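A minimal sketch of this suggestion, with each freelist-sync scenario wrapped in t.Run so the two cases pass or fail independently. The subtest body is a placeholder; the real test opens a bolt.DB per scenario. The main function is only a harness so the sketch runs standalone with `go run`; in the repository the test would live in tx_test.go:

```go
package main

import (
	"fmt"
	"testing"
)

// subtestName builds a readable name for each scenario (illustrative helper).
func subtestName(isSyncFreelist bool) string {
	return fmt.Sprintf("NoFreelistSync=%v", !isSyncFreelist)
}

// TestGrowScenarios sketches the reviewer's suggestion: each scenario runs
// under t.Run, so one failing case does not mask the other.
func TestGrowScenarios(t *testing.T) {
	for _, isSyncFreelist := range []bool{false, true} {
		isSyncFreelist := isSyncFreelist // capture for the closure
		t.Run(subtestName(isSyncFreelist), func(t *testing.T) {
			// In the real test: open the database with this scenario's
			// options, write data, and assert on the grown file size.
			_ = isSyncFreelist
		})
	}
}

// main is only a standalone harness for this sketch.
func main() {
	testing.Main(func(pat, str string) (bool, error) { return true, nil },
		[]testing.InternalTest{{Name: "TestGrowScenarios", F: TestGrowScenarios}},
		nil, nil)
}
```

With t.Run, a failure is also reported with the scenario name (e.g. `TestGrowScenarios/NoFreelistSync=true`), which makes `go test -run` filtering possible per case.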

tx_test.go Outdated
}
for _, isSyncFreelist := range []bool{false, true} {
// Open the database.
db, err := bolt.Open(tempfile(), 0666, &bolt.Options{
Contributor

Please consider using btesting.CreateDBWithOptions

tx_test.go Outdated
}
// Put a big enough value to ensure new space must be allocated
bigvalue := make([]byte, db.Info().PageSize)
if err := b.Put([]byte("key"), bigvalue); err != nil {
Contributor

Please use testify's require.NoError(...) for such assertions.

tx_test.go Outdated
}

newSize := fileSize(db.Path())
if newSize < initSize*2 {
Contributor

I'm not sure why this assertion proves that Grow works...

I think the real proof starts when we cross the AllocSize border and discover that small additional allocations put us at 2*AllocSize. Here the file might have grown simply as a consequence of writing at an offset.

Contributor Author

Yes, the way you mentioned is more obvious, but I have no good idea how to make up a database that crosses the AllocSize boundary exactly...

Here the file might have grown simply as a consequence of writing at an offset.

The initial database fills 4 pages, and I put a value of 1 page in size, so there would be around 6 pages. mmapSize normalizes the datasz to 32KB or a multiple of 32KB, and db.grow truncates to datasz, so I judge by at least double initSize, i.e. 8 pages.

Of course, if I could make the datasz reach AllocSize, it would be more obvious to distinguish.
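The normalization this argument relies on can be sketched as follows. This paraphrases bolt's mmapSize rounding for the sub-1GB range; the real function additionally validates against maxMapSize and grows in 1GB-aligned steps past 1GB:

```go
package main

import "fmt"

// mmapRound sketches how mmapSize normalizes the data size (paraphrased
// from bolt's db.go): sizes up to 1GB are rounded up to the next power of
// two starting at 32KB, so every result is 32KB or a multiple of 32KB.
func mmapRound(size int) int {
	for i := uint(15); i <= 30; i++ { // 1<<15 = 32KB ... 1<<30 = 1GB
		if size <= 1<<i {
			return 1 << i
		}
	}
	return size // above 1GB bolt grows in 1GB increments instead (not shown)
}

func main() {
	fmt.Println(mmapRound(6 * 4096))  // ~6 pages of 4KB -> 32768
	fmt.Println(mmapRound(40 * 1024)) // 40KB -> 65536
}
```

This is why the 6-page database above lands on 32KB: 6 × 4KB = 24KB, which rounds up to the 32KB floor.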

Contributor

  1. I think you can iteratively write data of a size on the order of AllocSize/100.
    I would guess that around the 80th write you would observe the file exceed 2*AllocSize.
    If we write that way, you should never observe a file of size x with AllocSize < x < 2*AllocSize.

  2. Your math seems to work, assuming we are on 4KB pages (and there are OSes with 64KB pages).
    So the minimal version would be to enforce a 4096-byte page size and put the math you made here in a comment...
    But if 1. is not extremely slow, I would recommend trying 1.
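A rough standalone simulation of point 1, with the grow rule paraphrased from db.grow rather than exercised through a real database: writing chunks of AllocSize/100 never produces a file size strictly between AllocSize and 2*AllocSize, because the first write past AllocSize over-allocates a whole extra chunk.

```go
package main

import "fmt"

// grow paraphrases db.grow's sizing rule (illustrative simulation only):
// never shrink; below AllocSize allocate exactly what is needed; past
// AllocSize over-allocate a full extra chunk.
func grow(filesz, datasz, allocSize int) int {
	if datasz <= filesz {
		return filesz
	}
	if datasz <= allocSize {
		return datasz
	}
	return datasz + allocSize
}

func main() {
	const allocSize = 16 * 1024 * 1024
	filesz, datasz := 0, 0
	for i := 0; i < 120; i++ { // each write adds ~AllocSize/100 of data
		datasz += allocSize / 100
		filesz = grow(filesz, datasz, allocSize)
		if filesz > allocSize && filesz < 2*allocSize {
			fmt.Println("saw a size in the forbidden range:", filesz)
			return
		}
	}
	fmt.Println("no size strictly between AllocSize and 2*AllocSize")
}
```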

Contributor Author

I adopted 1. for readability. The test time on my laptop is around 10 seconds, which I suppose is acceptable.

PS: regarding 2., even for an odd page size like 64KB, I think it still works. Since mmapSize normalizes the datasz to 32KB or a multiple of 32KB, for a 64KB page size and 6 pages the datasz would be 512KB = 32KB×2^4 > 64KB×6, and the initial datasz is 64KB×4 = 256KB, so it still obeys newSize >= initSize×2. Forgive my rigidness : )

Contributor

@ptabor ptabor left a comment

Thank you. Please fix DCO (--signoff)

Signed-off-by: tian <skylypig@gmail.com>
@uvletter
Contributor Author

Thank you. Please fix DCO (--signoff)

Thanks, done

@ptabor ptabor merged commit 774edab into etcd-io:master Jan 19, 2023