Our "DevOps" leaves much to be desired; I'm hoping to kickstart some progress here.
One of the main types of production incidents we deal with is our scraper going down, which is occasionally an indicator of a bigger problem. Any of the following can cause the scraper to die:

- A scraper bug, or a deployment bug (e.g. wrong env variables when deploying)
- The NUS API data format changing without us being notified, or malformed data entered by an NUS department
- The server running out of disk space, or malformed log / data files
We have gotten quite good at manually diagnosing and then fixing these problems (though we should work on automating some fixes too!). However, we are not good at actually detecting when our scraper goes down, so our overall response time is greatly bottlenecked by alerting / monitoring.
The good thing is that there is a very clear and obvious signal for when the scraper is up: for any API call on api.nusmods.com/v2 for the current AY / semester, the datetime in the `last-modified` response header should be ~1 hour ago.
An extremely basic monitoring service could:
1. `curl -I https://api.nusmods.com/v2/2023-2024/moduleList.json`
2. Alert us if the datetime in `last-modified` is greater than 2 hours ago

I think we can make this a DO serverless function and incur 0 extra costs.