case-notes-to-probation

A Spring Boot app to listen on an AWS queue and send case notes to probation.

To build:

./gradlew build

Health

/health/ping: responds {"status":"UP"} to all requests. Dependent systems should use this to check connectivity to this service, rather than calling the /health endpoint.
/health: provides information about the application's health and its dependencies. This should only be used by this service's own health monitoring (e.g. PagerDuty) and not by other systems that wish to find out its state.
/info: provides information about the version of the deployed application.
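
As a rough illustration of how a dependent system might use /health/ping, the sketch below defines a small helper that succeeds only when the response body reports UP. The helper name is_up is hypothetical, and the base URL in the usage comment is an assumption borrowed from the local healthcheck address given later in this README.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the JSON body read from stdin
# reports status UP. Tolerates optional whitespace around the colon.
# Intended for the flat /health/ping response, not the nested /health one.
is_up() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"UP"'
}

# Usage (assumed base URL, adjust for your environment):
#   curl -s http://localhost:8082/health/ping | is_up && echo "reachable"
```

The helper only inspects the body, so the HTTP call stays in the caller's hands and can be swapped for whatever client the dependent system already uses.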

Pre Release Testing

Case notes to probation is best tested through the DPS front end. To manually smoke test / regression test:

Navigate to DPS and search for a prisoner
Add an OMIC case note to the prisoner
Add a keyworker case note to the prisoner
Wait about 5 minutes, then check Application Insights to confirm that the case note has been sent to probation:

requests
| where cloud_RoleName == "community-api"
| where name == "PUT /secure/nomisCaseNotes/{nomisId}/{caseNotesId}"

For offenders that don't yet exist in Delius, the call returns a 404, which is then ignored.

Running against localstack

Localstack has been introduced for some integration tests and it is also possible to run the application against localstack.

In the root of the localstack project, run the following command to clear down and then bring up localstack:

sudo rm -rf /tmp/localstack && docker-compose down && docker-compose up

Start the Spring Boot app with profile='localstack'
You can now use the aws CLI to send messages to the queue
The queue's health status should appear at the local healthcheck: http://localhost:8082/health
Note that you will also need local copies of the OAuth server, Case notes API and Delius API running to do anything useful.
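
A minimal sketch of sending a message with the aws CLI follows. Several things here are assumptions not stated in this README: the localstack endpoint (http://localhost:4566), the helper names list_queues and send_message, and the queue URL and message file, which you supply yourself. The expected message format is not documented here, so the body is read from a file rather than invented.

```shell
#!/bin/sh
# Assumption: localstack is reachable on this endpoint. Override via the
# LOCALSTACK_ENDPOINT environment variable if your setup differs.
LOCALSTACK_ENDPOINT="${LOCALSTACK_ENDPOINT:-http://localhost:4566}"

# List the queues localstack knows about, to find the queue URL.
list_queues() {
  aws --endpoint-url="$LOCALSTACK_ENDPOINT" sqs list-queues
}

# Send the contents of a file as a single message body.
# $1 = queue URL (as reported by list_queues)
# $2 = path to a file holding the message body
send_message() {
  aws --endpoint-url="$LOCALSTACK_ENDPOINT" sqs send-message \
    --queue-url "$1" \
    --message-body "file://$2"
}
```

The aws CLI typically still needs a region and some credentials configured, even when pointed at localstack; dummy values are usually enough.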

Running the tests

With localstack now up and running (see previous section), run

./gradlew test

Investigating Dead Letter Queue (DLQ) messages

When we fail to process a case note due to an unexpected error, an exception is thrown and the case note is moved to the DLQ.

If the failure was due to a recoverable error - e.g. network issues - then the DLQ message can and should be retried.

However, if the error is not recoverable - e.g. some new error scenario we weren't expecting - then we need to investigate the error and either:

fix the bug that is causing the error OR
handle and log the error so that the exception is no longer thrown and the message does not end up on the DLQ

Steps for investigating DLQ messages

Import the swagger collection into Postman - link to API docs at the top of this README.
Obtain an access token with ROLE_CASE_NOTE_QUEUE_ADMIN role - #dps_tech_team will be able to help with that
Call the /queue-admin/retry-all-dlq endpoint to transfer all DLQ entries back onto the main queue - this should get rid of any messages with recoverable errors
Check that the messages have gone from the DLQ by going to https://case-notes-to-probation.prison.service.justice.gov.uk/health
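
If you prefer the command line to Postman, the retry call can be sketched with curl as below. The HTTP method (PUT) and the bearer-token header are assumptions; this README does not state them, so confirm against the API docs first. The function name retry_all_dlq and the RETRY_METHOD override are hypothetical.

```shell
#!/bin/sh
# Assumptions (not confirmed by this README): the endpoint accepts PUT and
# a standard "Authorization: Bearer" header. Set RETRY_METHOD to override.
# $1 = base URL of the service
# $2 = access token carrying ROLE_CASE_NOTE_QUEUE_ADMIN
retry_all_dlq() {
  curl -sS -X "${RETRY_METHOD:-PUT}" \
    -H "Authorization: Bearer $2" \
    "$1/queue-admin/retry-all-dlq"
}

# Usage:
#   retry_all_dlq https://case-notes-to-probation.prison.service.justice.gov.uk "$TOKEN"
```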

For messages that don't then disappear from the DLQ:

cd into the scripts directory and run the copy-dlq.sh script which copies the contents of the DLQ locally and summarises in summary.csv
run an AppInsights Logs query looking for exceptions shortly after the timestamp found in the csv
if there was an error calling a DPS service, check the logs for that service and possibly check the data in DPS
if there was an error calling a Delius service, check the Delius AWS logs and possibly check the data in Delius
identify mitigation for the error - fix bug or ignore error
once this code change is in production transfer the DLQ messages onto the main queue again and all should now be handled without exceptions

Alerts

Inactivity alert

We've had issues in the past where a pod stopped reading from the queue but nobody noticed. Eventually all 4 pods stopped reading and we stopped sending case notes to probation. Nobody noticed for a couple of weeks.

To warn us if this happens again, we've created an alert in Application Insights that fires if any of the pods stops producing telemetry events. The alert is called Case Notes to Probation - office hours inactivity alert. Note that the alert only fires during office hours, as low volumes outside office hours trigger false positives.

If the alert fires, click on the View Search link, which should run the query that failed in Application Insights. Run the command kubectl -n case-notes-to-probation-prod get pods and compare the running pods with the pods in the query results. Restart any pod that doesn't appear in the query results with the command kubectl -n case-notes-to-probation-prod delete pod <insert pod name here>.
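
The comparison step can be done by eye, but a small helper makes it less error-prone. The sketch below assumes you have saved the running pod names and the pod names from the query results into two text files, one name per line; the function name silent_pods and the file names are hypothetical.

```shell
#!/bin/sh
# Print pod names listed in file $1 (running pods) that are absent from
# file $2 (pods seen in the alert query results), one name per line.
silent_pods() {
  if [ -s "$2" ]; then
    # -x whole-line match, -F fixed strings, -v invert, -f patterns from file.
    # grep exits 1 when no pod is missing; treat that as success.
    grep -vxFf "$2" "$1" || true
  else
    # No pods appear in the query results at all: every running pod is silent.
    cat "$1"
  fi
}

# Usage:
#   kubectl -n case-notes-to-probation-prod get pods -o name | sed 's|^pod/||' > running.txt
#   # copy the pod names from the query results into seen.txt, then:
#   silent_pods running.txt seen.txt
```

Any name it prints is a candidate for the delete pod command above.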
