Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test enterprise nightly #3843

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

Michal-Leszczynski
Copy link
Collaborator

@Michal-Leszczynski Michal-Leszczynski commented May 9, 2024

This PR introduces new restore test framework (testHelper) and (with its help) implements a functional test for restoring a user ( TestRestoreTablesUserIntegration). It also rewrites TestRestoreTablesNoReplicationIntegration.

It also:
Fixes #3833

Up to this point we didn't specifically set tombstone_gc mode in tests and expected it to be 'timeout'. Since tablet tables need to have tombstone_gc 'repair', we should change the default expectation to it.
Note that this 'assumption' should be removed in the future in favor of something more robust.
@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/test-enterprise-nightly branch 2 times, most recently from 7c64299 to f702253 Compare May 22, 2024 14:03
Ent Scylla versions contain additional table "dicts" which cannot be found in OSS Scylla,
so it should be excluded when comparing GetTarget output with golden files.
@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/test-enterprise-nightly branch 9 times, most recently from 89e2fb8 to 801b9e2 Compare May 27, 2024 10:55
@Michal-Leszczynski Michal-Leszczynski marked this pull request as ready for review May 27, 2024 13:29
@Michal-Leszczynski
Copy link
Collaborator Author

Michal-Leszczynski commented May 27, 2024

@karol-kokoszka as expected (see #3869), new restore user test is failing for Scylla nightly and will be failing for Scylla 6.0 in general. The PR is ready for review!

Copy link
Collaborator

@karol-kokoszka karol-kokoszka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall 👍 , just few questions.

@@ -430,7 +430,8 @@ func TestGetTargetIntegration(t *testing.T) {
if diff := cmp.Diff(golden, v,
cmpopts.SortSlices(func(a, b string) bool { return a < b }),
cmpopts.IgnoreUnexported(backup.Target{}),
cmpopts.IgnoreSliceElements(func(u backup.Unit) bool { return u.Keyspace == "system_replicated_keys" })); diff != "" {
cmpopts.IgnoreSliceElements(func(u backup.Unit) bool { return u.Keyspace == "system_replicated_keys" }),
cmpopts.IgnoreSliceElements(func(t string) bool { return t == "dicts" })); diff != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with ignoring it, but the best would be to just distinguish what version of Scylla is it and exclude the assertion from OSS only.

e.g.: line 397 returns CQL session where you can query SELECT release_version FROM system.local;

This is not the hard requirement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change that. In general I am not a big fan of the golden file tests - they seem more annoying than useful when working with different Scylla versions. Maybe it would be good to get rid of them in the future, or perhaps it would be better to separate tests for different Scylla versions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karol-kokoszka on the other hand, does it make any sense?
We still have only 1 basic golden file and some tables are present only in the older versions, some only in the newer and ENT versions. Ignoring "dicts" only in newer versions has the same result as ignoring it in general - the golden file does not contain it. The golden file would need to contain a union of all possible tables, but this would need to be done manually as there is no Scylla version containing all of them. As mentioned above, I don't think that this is a useful test and it should be refactored in the future, but maybe let's skip this in terms of this PR.

rootSession gocqlx.Session
}

func newCluster(t *testing.T, hosts []string) clusterHelper {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is general comment about the service_restore_integration_test.go.
The source file is pretty big already. Do you think it's possible to split it to smaller pieces following some logic ?
The first that comes to my mind is to keep only tests in service_restore_integration_test.go and move utility methods (everything what doesn't start with Test) to other files keeping the context of the file in filename.
e.g. backup utlity methods move to service_restore_backuputils_integration_test.go etc.
WDYT ?
It's just hard to find the "meat" in this tests as they are mixed with a lot of utils.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea!

@@ -430,7 +430,7 @@ func TestGetTargetIntegration(t *testing.T) {
if diff := cmp.Diff(golden, v,
cmpopts.SortSlices(func(a, b string) bool { return a < b }),
cmpopts.IgnoreUnexported(backup.Target{}),
cmpopts.IgnoreSliceElements(func(u backup.Unit) bool { return u.Keyspace == "system_replicated_keys" }),
cmpopts.IgnoreSliceElements(func(u backup.Unit) bool { return u.Keyspace == "system_replicated_keys" || u.Keyspace == "system_auth" }),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to skip it in assertion. Maybe checking the running version of Scylla server and behaving accordingly is the better choice ?

Comment on lines 1991 to 1996

t.Run("repair ignore hosts", func(t *testing.T) {
h := newRepairTestHelper(t, session, defaultConfig())
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment refers to commit message:

This test stops and starts Scylla service on a node, and it could happen that this node is not fully responsive when the next test is already running. Running this test at the end removes the races between service start and next test execution.

Do you know why the node may be not fully responsive ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this test is executed, is starts the node again and sleeps for 5s. It might happen that the node won't be able to take part in a repair after such short startup time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I will try to improve waiting mechanism instead of moving it to the back of the test list.

@Michal-Leszczynski Michal-Leszczynski force-pushed the ml/test-enterprise-nightly branch 2 times, most recently from aec76b3 to 986d708 Compare May 29, 2024 11:34
…on_test.go

This commit refactors restore_test helper functions and extracts them in helper_integration_test.go. They will be used for a improved test framework in the future commits. It also gets rid of validateTable, as it was introduced as a workaround for issues with COUNT(*), but is no longer needed.
TestHelper is a reimplementation of restoreTestHelper which wasn't well-designed. It removes a lot of boilerplate from test setup and also allows for more flexibility.
TestRestoreTablesUserIntegration is implemented with new test framework (testHelper). It checks if we can log in as a restored user and use their permissions.
TestRestoreTablesNoReplicationIntegration is rewritten so that it's prettier, and it works well with new default tombstone_gc mode repair.
Starting with Scylla 6.0, system_auth is moved to system/system_auth_v2 and is handled by raft.
New clusters won't have the old system_auth tables, so the tests should be adjusted accordingly.
Repair ignore hosts test stops and starts Scylla service on a node, and it could happen that this node is not fully responsive when the next test is already running.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include Scylla Enterprise nightly latest image to CI tests
2 participants