Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate and fix race conditions #129

Closed
radeksimko opened this issue Mar 1, 2021 · 8 comments
Closed

Investigate and fix race conditions #129

radeksimko opened this issue Mar 1, 2021 · 8 comments
Labels
bug Something isn't working

Comments

@radeksimko
Copy link
Member

radeksimko commented Mar 1, 2021

Version: v0.13.0 (22121c5627c8729080f6de7e829a6a2b4f772703)

I also managed to reproduce some races under 0.12.0, but it was hard to tell whether these are the same due to some other (non-race) test failures adding to the noise.

There is a few tests which fail under go test -race:

--- FAIL: TestApply (180.40s)
    --- FAIL: TestApply/basic-0.11.14 (29.45s)
--- FAIL: TestTFVersionMismatch (3.29s)
    --- FAIL: TestTFVersionMismatch/tf99-0.11.14 (0.95s)
--- FAIL: TestContext_sleepTimeoutExpired (190.26s)
    --- FAIL: TestContext_sleepTimeoutExpired/sleep-0.12.30 (62.82s)
--- FAIL: TestValidate (7.81s)
    --- FAIL: TestValidate/invalid-0.12.30 (0.53s)

Full output is available in this gist: https://gist.github.com/radeksimko/9f9cd6a451204c12cc3e312dd0c34836 or you can reproduce it yourself via go test ./... -race.

@radeksimko radeksimko added the bug Something isn't working label Mar 1, 2021
@kmoe
Copy link
Member

kmoe commented Mar 2, 2021

Thanks for opening this, @radeksimko. Is this causing problems in the LS?

@radeksimko
Copy link
Member Author

@kmoe I'm not aware of it causing problems to the users, at least nobody filed an issue. I was able to reproduce it in one test in LS though.

I assume that the reason it's (probably) not affecting users is because they'd have to be running many commands at the same time - which some of them probably do, but we limited that significantly per hashicorp/terraform-ls#391 - and the execution would need to overlap in a particular (split-second) moment.

I mean - race conditions are just hard to reproduce in general. 🤷🏻‍♂️

@kmoe
Copy link
Member

kmoe commented Mar 2, 2021

There are valid terraform-exec use cases involving parallel execution outside the LS, so this is the highest priority bugfix for tfexec right now, but active LS issues would raise it to "critical".

There shouldn't be any other test failures with 0.12.0, or rather, 0.12.30 (a support policy we should be clearer about). Filing and fixing these will be done as part of investigating the race failures.

@radeksimko
Copy link
Member Author

There shouldn't be any other test failures with 0.12.0, or rather, 0.12.30

As far as I can remember these were related to some of the CLI flag changes (in master/main), so I think that is expected as these tests rely on an external source, which itself is a moving target.

I'm not sure how/whether that affects the support policy. I assume we intend to support all versions of Terraform (on best-effort basis), but do we also intend to offer such support in all versions of terraform-exec? That seems like a promise hard to uphold.

@radeksimko
Copy link
Member Author

Just ran into another race condition in some LS tests:

https://github.com/hashicorp/terraform-ls/pull/448/checks?check_run_id=2225193121#step:10:31

The trace suggests it's a different one, but I haven't looked into it too deeply.

@radeksimko
Copy link
Member Author

radeksimko commented Mar 30, 2021

Actually this one is pretty much 100% (3 out of 3 attempts) reproducible with go test -race (on LS side at least) ^ which means it's likely to be attributed to some of the recent dependency updates. 👀

@radeksimko
Copy link
Member Author

The last mentioned race condition will is fixed in go-getter 1.5.3 via hashicorp/go-getter#315 but the other race conditions are AFAIK still present and likely unrelated.

@radeksimko
Copy link
Member Author

This was addressed by the amazing @lornasong in #299 🎉

go test ./... -race -count=5 -timeout=30m
?   	github.com/hashicorp/terraform-exec/internal/version	[no test files]
ok  	github.com/hashicorp/terraform-exec/tfexec	17.355s
ok  	github.com/hashicorp/terraform-exec/tfexec/internal/e2etest	907.986s
?   	github.com/hashicorp/terraform-exec/tfexec/internal/testutil	[no test files]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants