Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!: trace file seeking #1340

Merged
merged 10 commits into from
Jun 3, 2024
Merged

fix!: trace file seeking #1340

merged 10 commits into from
Jun 3, 2024

Conversation

evan-forbes
Copy link
Member

@evan-forbes evan-forbes commented May 6, 2024

Description

This PR backports the fix to the writing and reading of traced files. This enables them to get push correctly every time instead of occasionally get cut from the seek being set to the beginning of the file.

it also changes the api by unexporting a function that doesn't need to be exported or included in the interface.

these changes are included in the upcoming #1295

@evan-forbes evan-forbes requested a review from a team as a code owner May 6, 2024 07:08
@evan-forbes evan-forbes requested review from ramin and staheri14 and removed request for a team May 6, 2024 07:08
@evan-forbes evan-forbes added WS: Big Blonks 🔭 Improving consensus critical gossiping protocols fix labels May 6, 2024
@evan-forbes evan-forbes self-assigned this May 6, 2024
@evan-forbes evan-forbes requested a review from rach-id May 6, 2024 07:09
@evan-forbes evan-forbes changed the title fix: fix!: trace file seeking May 6, 2024
Copy link
Contributor

@staheri14 staheri14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks good. However, I have some comments regarding the use of mut and the atomic boolean reading for managing file access.

Comment on lines +53 to +54
f.mut.Lock()
defer f.mut.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just rely on the mut to control access to the file? The reading and mut seem to be locked/set to true and unlocked/set to false simultaneously in all cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'll take a look and see if we can simply this. The initial add for two different mutexes was so that we can ignore writes on line 43 w/o actually holding the mutex. Perhaps we can only use the reading atomic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a few things and unfortunately I think we'll have to keep both.

we don't want to only use the mutex, as that will block writes when reading, which could halt the node. If we only check if the mutex is locked then we risk ignoring writes even though we aren't reading. If we only use the reading atomic bool then we hit data races.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will block writes when reading

I thought that is intended to be this way based on the godoc for bufferedFile

// bufferedFile is a file that is being written to and read from. It is thread
// safe, however, **when reading from the file, writes will be ignored**.

Maybe that comment needs some revisions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will block writes when reading

I thought that is intended to be this way based on the godoc for bufferedFile

yeah we can definitely rephrase that. I think I'm some author bias unfortunately and need some help making it clearer, naively this part seems to indicate that writes are ignored.

when reading from the file, writes will be ignored.

perhaps a modification to the part on thread safety?

return nil
}

func (f *bufferedFile) stopReading() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[suggestion] I'd rename it to something that indicates the seek is reset, and not paused e.g., resetCursor or resetSeek.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point, one of the actions that is being done is that the cursor is being set to the end of the file (the last place we wrote to), however from the user's perspective the cursor of where we write isn't actually moving. What we are doing tho is setting atomic bool to false that indicates that the file is currently being read from

func (f *bufferedFile) File() (*os.File, error) {
err := f.Flush()
// File returns the underlying file with the seek point reset. The caller should
// not close the file. The caller must call the returned function when they are
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller must call the returned function when they are done reading from the file.

Can you provide more details on this? the purpose of the returned func is not clear (why should it be called when the reading is done, it does not look like that it closes the file)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should just be called to set the cursor to the end of the file and so that we stop ignoring writes.

pkg/trace/local_tracer.go Show resolved Hide resolved
pkg/trace/tracer.go Show resolved Hide resolved
)

// bufferedFile is a file that is being written to and read from. It is thread
// safe, however, when reading from the file, writes will be ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description suggests that writes and reads cannot be done simultaneously, hence my interpretation is that having one mutex to control access to the file is sufficient. There is no need for the reading bool.

Also, there is no possibility of concurrent reads based on the current implementation of startReading() (it always locks on mut hence disallowing concurrent reads). Hence, relying on mut should be fine.

Can you shed light on a case where two of reading and mut are needed? thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evan-forbes evan-forbes requested a review from staheri14 May 9, 2024 08:11
rach-id
rach-id previously approved these changes May 9, 2024
Copy link
Member

@rach-id rach-id left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utAck

Co-authored-by: Sanaz Taheri <35961250+staheri14@users.noreply.github.com>
@evan-forbes evan-forbes enabled auto-merge (squash) May 16, 2024 14:41
Comment on lines +43 to +45
if f.reading.Load() {
return 0, nil
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-iterating an oddity for reviewers to hopefully help get this across the line. This seems silly to have an atomic and a mutex, however if we want to have the ability to ignore writes while also avoiding all races, then we need both the atomic and the mutex

@evan-forbes
Copy link
Member Author

recently added more reviewers to try and get this across the line. This is strictly a backport to main from v0.34.x-celestia, as this fix was already included there

Copy link
Collaborator

@rootulp rootulp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed. Approving b/c this was already reviewed + merged to v0.34.x-celestia

type bufferedFile struct {
mut *sync.RWMutex
// reading protects the file from being written to while it is being read
// from. This is needed beyond in addition to the mutex so that writes can
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// from. This is needed beyond in addition to the mutex so that writes can
// from. This is needed in addition to the mutex so that writes can

Copy link
Contributor

@staheri14 staheri14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utAck

@evan-forbes evan-forbes merged commit cc37a71 into main Jun 3, 2024
17 of 18 checks passed
@evan-forbes evan-forbes deleted the evan/backport-seek-fix branch June 3, 2024 15:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix WS: Big Blonks 🔭 Improving consensus critical gossiping protocols
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants