releaseWF hits errors at update-marc step #4663

andrewjbtw · 2023-12-20T01:12:11Z

Over the last couple of weeks, on several days I have seen multiple druids stuck in the releaseWF with an error at the update-marc step:

update-marc : problem with update-marc (BackgroundJob: 30760096): {:errors=>[{:title=>"Update MARC error", :detail=>"Unexpected response: 400 {"message":"Unexpected error in InitProducerIdResponse; The request timed out.","type":"-3","code":"INTERNAL_SERVER_ERROR"}"}]}

When I reset the errors (by using the "reset" link from the workflow grid), most of them resolve on the retry. However, if there are 10 or more errors and they're all retried at the same time, a couple of druids usually fail a second time. I then reset those and they succeed on the next try. It seems like something may have changed on the FOLIO side, since the release process doesn't appear to have changed on the SDR side and this error wasn't happening earlier.

Additional information
I think that error may end up in Honeybadger as: https://app.honeybadger.io/projects/50568/faults/100168652. There's a noticeable increase after December 1. Note that the issue in this ticket is a different error from sul-dlss/argo#4174, which had to do with the MARC record itself. When that error happens, retrying doesn't succeed because the MARC record itself needs to be updated.
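To make the distinction between the two kinds of failure concrete, here is a minimal Python sketch that pulls the status code and JSON payload out of the error detail quoted above and flags the timeout as retryable. The function names and the matching rule are assumptions for illustration only; they are not taken from the SDR or FOLIO code, and the real services may report other transient errors differently.

```python
import json
import re

# Detail string copied from the error reported in this issue.
DETAIL = ('Unexpected response: 400 {"message":"Unexpected error in '
          'InitProducerIdResponse; The request timed out.","type":"-3",'
          '"code":"INTERNAL_SERVER_ERROR"}')


def parse_detail(detail):
    """Split 'Unexpected response: <status> <json>' into (status, payload dict).

    Returns (None, {}) when the string does not have that shape.
    """
    match = re.match(r"Unexpected response: (\d{3}) (\{.*\})\s*$", detail, re.DOTALL)
    if match is None:
        return None, {}
    try:
        payload = json.loads(match.group(2))
    except json.JSONDecodeError:
        payload = {}
    return int(match.group(1)), payload


def is_transient(detail):
    """Hypothetical rule: only the InitProducerIdResponse timeout is retryable."""
    _status, payload = parse_detail(detail)
    message = payload.get("message", "")
    return "InitProducerIdResponse" in message and "timed out" in message
```

Anything that does not match, such as an error caused by the MARC record itself, would be left for manual review rather than retried.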

shelleydoljack · 2024-01-18T22:09:24Z

Is this error message the response from folio? {"message":"Unexpected error in InitProducerIdResponse; The request timed out.","type":"-3","code":"INTERNAL_SERVER_ERROR"} I searched the folio-org repos for "InitProducerIdResponse" and didn't find anything, so it is difficult to tell what's going on. The only thing I could say is "sometimes folio is slow, especially the mod-quickmarc endpoints". 🤷

lwrubel · 2024-01-19T12:34:16Z

@shelleydoljack Unexpected error in InitProducerIdResponse is a Kafka exception, if that helps in looking at logs.

andrewjbtw · 2024-01-23T22:37:49Z

Since retrying the step from Argo seems to eventually work, I wonder if we could automate retries? That's assuming we could do it safely without overloading FOLIO with requests. I've been finding items with this issue and manually retrying approximately every week since this started.
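One way the automated retry could look is sketched below in Python: exponential backoff with random jitter, so that druids which failed together are not all resubmitted to FOLIO at the same moment. Every name here (`TransientError`, `retry_with_backoff`, the attempt count and delay values) is hypothetical and chosen for illustration; this is not how the workflow system currently behaves, and the right limits would depend on how much load FOLIO can tolerate.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. the timeout above)."""


def backoff_delays(max_attempts, base=2.0, cap=60.0):
    """Delay ceilings before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def retry_with_backoff(step, max_attempts=4, base=2.0, cap=60.0,
                       sleep=time.sleep, rng=random.random):
    """Call step() until it succeeds or max_attempts is reached.

    Each wait is a random fraction of the current ceiling ("full jitter"),
    which spreads out retries from items that failed at the same time.
    """
    delays = backoff_delays(max_attempts, base, cap)
    for attempt in range(max_attempts):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error for a manual reset
            sleep(rng() * delays[attempt])
```

With these defaults a step is tried at most four times, and errors that are not marked transient propagate immediately, so a record that genuinely needs fixing still shows up in the workflow grid.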

andrewjbtw added the bug label Dec 20, 2023
