-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Engine.HealthCheck tests network connectivity #431
Conversation
0fee042
to
a808453
Compare
Codecov Report
@@ Coverage Diff @@
## master #431 +/- ##
=======================================
Coverage 99.28% 99.29%
=======================================
Files 53 55 +2
Lines 1117 1131 +14
=======================================
+ Hits 1109 1123 +14
Misses 8 8 |
a808453
to
0082046
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see my tech talk is already helping us test things 😄
Okay, whew. Thanks for the fantastic review. I just pushed up a commit that I believe addresses all these comments. Please "resolve conversation" everywhere you agree, and let me know if there's anything else. Upon approval, I'll squash this commit back into the main one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more quick comments! My only point of (mild) concern is the inets
thing, the rest aren't that important.
lib/engine/health.ex
Outdated
network_check_mod = Keyword.get(opts, :network_check_mod, Engine.NetworkCheck.Hackney) | ||
|
||
restart_fn = | ||
Keyword.get(opts, :restart_fn) || Application.get_env(:realtime_signs, :restart_fn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some reason this one is different from the others?
Keyword.get(opts, :restart_fn, Application.get_env(:realtime_signs, :restart_fn))
lib/engine/health.ex
Outdated
{:reply, [], timer_ref} | ||
@spec log_restart() :: :ok | ||
def log_restart do | ||
Logger.error("log_restart") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come to think of it, do we need this here, since we already log the restarting_application
message right before the only place we call the restart_fn
? This could instead be a no-op function (removing the need to even define anything extra on this module; the default config value could be fn -> :ok end
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Playing around with it, this is I think what I ran into before. You can't have a function literal in a config .exs file. So I have to put a "no-op" function somewhere so I can use the capture syntax. I couldn't think of a good arity-0 function in the standard library to specify instead (ideally returning :ok
). Maybe I can issue a PR to Elixir Lang to add System.ok()
....
I'll remove the logging since, as you say, it's superfluous, and I guess change its name to be restart_noop
to maybe be a bit clearer? And I'll add a comment, too. It doesn't have to be here, but I figured I might as well put it in this module since that's the reason for its existence.
mix.exs
Outdated
@@ -30,6 +30,7 @@ defmodule RealtimeSigns.Mixfile do | |||
def application do | |||
[ | |||
extra_applications: [ | |||
:inets, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will start up inets
also in non-test environments, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good catch. Moved it to test_helper.exs
.
|
||
defmodule MockGoodNetwork do | ||
def unquote(:do)(_data), do: {:proceed, [response: {200, 'OK'}]} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could define these inside HackneyTest
, thus not putting them in the global namespace (and maybe having to deal with a collision later if we call something else MockGoodNetwork
).
test/engine/health_test.exs
Outdated
|
||
{:ok, health_pid} = | ||
Engine.Health.start_link( | ||
name: :health_test3, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to have this be the same value in every test, since tests within a given module are serialized (even with async
).
Alternatively, by default the GenServer could be unnamed, and the app supervisor could explicitly give it one if needed — we've used this approach on other apps (see ApiWeb
in the API, and specifically its spec for RequestTrack
). Actually, is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, you're right. It doesn't need a name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌩️ 🔁 ✅ 🎉
78461ce
to
f4b3808
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks. Good.
Summary of changes
Asana Ticket: 💳 Have realtime_signs restart if HTTP requests are failing
This adds another check to the 1/min
Engine.Health
to test network connectivity, to mitigate the issue we saw last week where the app "went silent" and stopped sending POSTs to the signs, to signs-ui, and to Splunk. A bounce fixed that issue.The app with this PR does a HEAD request to google, and after 5 consecutive failures does a
System.restart()
.I tested
System.restart()
on opstech3 and it works reliably, cleanly shutting down the app and unloading the code, and then starting it back up again. The log train is maintained. Stopping and starting a service will prompt WinSW to clear the logs, but a restart from within the system does not.I introduced
mox
to aid in testing without having to reach out to the network as much.I tested this on opstech3 using API dev-blue, and then spun down the ECS instances to see it restart, and then spun them back up and it went on fine.
Reviewer Checklist