Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: reading last snapshot in etcdutl migrate command #18010

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

fykaa
Copy link
Contributor

@fykaa fykaa commented May 15, 2024

Fixes a part of #17227

In the etcdutl/migrate_command.go file, I've rectified a minor oversight in the logic for reading the last snapshot during the WAL migration process. Previously, an empty walpb.Snapshot{} was passed as a reference, that would result in reading of the WAL from the beginning.

I will raise a follow-up PR that should also cover the addition of test cases, completing the solution for this issue

@k8s-ci-robot
Copy link

Hi @fykaa. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

To improve correctness of etcdutl migrate command

Signed-off-by: fykaa <faeka6@gmail.com>
@jmhbnz jmhbnz removed github_actions Pull requests that update GitHub Actions code area/robustness-testing area/observability labels May 19, 2024
@fykaa fykaa marked this pull request as ready for review May 20, 2024 12:43
@siyuanfoundation
Copy link
Contributor

Thanks @fykaa for the pr. Can you fix the tests?

@fykaa
Copy link
Contributor Author

fykaa commented May 24, 2024

Hi @siyuanfoundation, yes, I am on it. Fixing some test bugs that occured for me.

@karuppiah7890
Copy link
Contributor

/retest

@k8s-ci-robot
Copy link

@karuppiah7890: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@MadhavJivrajani
Copy link
Contributor

/ok-to-test

@k8s-ci-robot
Copy link

@fykaa: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-integration-1-cpu-amd64 b6967db link false /test pull-etcd-integration-1-cpu-amd64
pull-etcd-e2e-amd64 b6967db link false /test pull-etcd-e2e-amd64

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

etcdutl/etcdutl/migrate_command.go Outdated Show resolved Hide resolved
etcdutl/etcdutl/migrate_command.go Outdated Show resolved Hide resolved
if err != nil {
return nil, fmt.Errorf(`failed to find last snapshot: %v`, err)
}
w, err := wal.OpenForRead(c.lg, walPath, walpb.Snapshot{Index: lastSnapshot})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e2e failure logs: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/18010/pull-etcd-e2e-amd64/1794986793938980864

The error seems to be coming from:

return nil, fmt.Errorf(`failed to read wal: %v`, err)

Error excerpt:

        	Error Trace:	/home/prow/go/src/github.com/etcd-io/etcd/tests/e2e/utl_migrate_test.go:162
        	Error:      	Error "[/home/prow/go/src/github.com/etcd-io/etcd/bin/etcdutl migrate --data-dir /tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0 --target-version 3.5] match not found.  Set EXPECT_DEBUG for more info Errs: [unexpected exit code [128] after running [/home/prow/go/src/github.com/etcd-io/etcd/bin/etcdutl migrate --data-dir /tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0 --target-version 3.5]], last lines:\n2024-05-27T07:18:46Z\tinfo\tbbolt\tbackend/backend.go:197\tOpening db file (/tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: , ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000130238}\r\n2024-05-27T07:18:46Z\tinfo\tbbolt\tbbolt@v1.4.0-alpha.1/db.go:321\tOpening bbolt db (/tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0/member/snap/db) successfully\r\nError: failed to read wal: wal: snapshot mismatch\r\n (expected \"storage version up-to-date\\t{\\\"storage-version\\\": \\\"3.5\\\"}\", got []). Try EXPECT_DEBUG=TRUE" does not contain "storage version up-to-date\t{\"storage-version\": \"3.5\"}"

The error shown:

wal: snapshot mismatch

is returned only when indices match but Terms don't afaict, this is the error type:

ErrSnapshotMismatch = errors.New("wal: snapshot mismatch")

and that comes from here because we end up calling ReadAll as part of ReadWALVersion:

if snap.Term != w.start.Term {
state.Reset()
return nil, state, nil, ErrSnapshotMismatch
}

w.start here is ultimately the snapshot object that we pass into OpenForRead:

start: snap,

I think we should be passing in the Term here as well as part of walpb.Snapshot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

6 participants