Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(group_common_events): filter expected raft_topology error messages #7452

Merged
merged 1 commit into from
May 28, 2024

Conversation

aleksbykov
Copy link
Contributor

@aleksbykov aleksbykov commented May 20, 2024

Raft topology features bring new error messages, which could appear in logs when any topology operation aborted/failed.
Add common_group_events context manager which will Change event severity of expected error log messages from error to warning when topology operations is failed

Fixes: #7426, #7425

Testing

  • JOb #18 - job failed due to spot termination, but events have changed sererites
  • Job #17 - Job is passed

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add New configuration option and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@aleksbykov aleksbykov force-pushed the filter-msg-of-failed-decommission branch from 0901c2b to c420ec1 Compare May 22, 2024 04:44
@aleksbykov aleksbykov requested review from fruch and soyacz May 22, 2024 05:06
@aleksbykov aleksbykov added backport/none Backport is not required Ready for review labels May 22, 2024
@aleksbykov aleksbykov marked this pull request as ready for review May 22, 2024 05:07
@fruch
Copy link
Contributor

fruch commented May 22, 2024

@aleksbykov do we have anywhere confirmation/agreement from devs about ignoring those ?

To make sure we are not overlooking actual issues ?

@aleksbykov
Copy link
Contributor Author

aleksbykov commented May 22, 2024

@aleksbykov do we have anywhere confirmation/agreement from devs about ignoring those ?

To make sure we are not overlooking actual issues ?

Yes, these errors are expected during when operation has been aborted. Similar we have and for dtest. Dev approved them.
These new context manager is added only for *StreamingErr nemesis

@aleksbykov aleksbykov added backport/6.0 and removed backport/none Backport is not required labels May 23, 2024
@mykaul
Copy link
Contributor

mykaul commented May 26, 2024

Don't we wish to capture the narrow scope of the error? Example:

send_raft_topology_cmd(stream_ranges) failed with exception (node state is rebuilding):

I wish to catch an exception with this specific reason - 'node state is rebuilding' - not just any failure of send raft_topology_cmd exception, no?

@aleksbykov aleksbykov force-pushed the filter-msg-of-failed-decommission branch 3 times, most recently from 5fbada9 to 43b9a6e Compare May 27, 2024 15:14
@aleksbykov
Copy link
Contributor Author

@fruch , @soyacz can you take a look?
updated:

  • set errors to be more explicit
  • add new log error to have warning severity

Additional commit fix discovered error with logger.debug on specific case

@aleksbykov
Copy link
Contributor Author

Don't we wish to capture the narrow scope of the error? Example:

send_raft_topology_cmd(stream_ranges) failed with exception (node state is rebuilding):

I wish to catch an exception with this specific reason - 'node state is rebuilding' - not just any failure of send raft_topology_cmd exception, no?

@mykaul , i wanted to have wider regexp to filter out possible errors during operations. But you right, i rewrote error regexp to be more specific, so not to loose possible issue

Copy link
Contributor

@soyacz soyacz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to fix commit message (too long line)
otherwise, when you confirm it works as expected (after fixes),
LGTM

@fruch
Copy link
Contributor

fruch commented May 27, 2024

@soyacz the extra fix here for the remoter logger should come on it's own and get backport together with the original PR introduced it

Copy link
Contributor

@fruch fruch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
But first address the failures in pre commit checks

@aleksbykov aleksbykov force-pushed the filter-msg-of-failed-decommission branch from 43b9a6e to a9b22f3 Compare May 28, 2024 05:30
@soyacz
Copy link
Contributor

soyacz commented May 28, 2024

@soyacz the extra fix here for the remoter logger should come on it's own and get backport together with the original PR introduced it

True.
I cherry-picked it and created separate PR: #7493 (but I amended a bit commit message to avoid precommit check error)

@aleksbykov aleksbykov force-pushed the filter-msg-of-failed-decommission branch from a9b22f3 to 462f1b2 Compare May 28, 2024 06:42
@aleksbykov
Copy link
Contributor Author

@aleksbykov
Copy link
Contributor Author

@fruch @soyacz . i fixed commit log message length, but linting failed for
pydantic.error_wrappers.ValidationError: 1 validation error for SctKafkaConfiguration, which was not affected by this PR.

Add common group_events context manager which will Change severity
of expected error log messages to warning when topology operations
is failed aborted by request

Fixes: scylladb#7426, scylladb#7425
@aleksbykov aleksbykov force-pushed the filter-msg-of-failed-decommission branch from 462f1b2 to 52f5e66 Compare May 28, 2024 10:30
@fruch fruch merged commit c27be0e into scylladb:master May 28, 2024
3 of 5 checks passed
@roydahan
Copy link
Contributor

roydahan commented Jun 2, 2024

@fruch are we backporting manually or with mergify?

@roydahan roydahan added the backport/6.0-done Commit backported to 5.3 label Jun 2, 2024
@roydahan
Copy link
Contributor

roydahan commented Jun 2, 2024

backported manually for now.

@fruch
Copy link
Contributor

fruch commented Jun 2, 2024

@fruch are we backporting manually or with mergify?

we can do any of the above, mergify is just a bit more safe, and documented process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants