Over the last couple of weeks, on several days I have seen multiple druids stuck in the releaseWF with an error at the update-marc step:
update-marc : problem with update-marc (BackgroundJob: 30760096): {:errors=>[{:title=>"Update MARC error", :detail=>"Unexpected response: 400 {"message":"Unexpected error in InitProducerIdResponse; The request timed out.","type":"-3","code":"INTERNAL_SERVER_ERROR"}"}]}
When I reset the errors (by using the "reset" link from the workflow grid), most of them resolve on the retry. However, if there are 10 or more errors and they're all retried at the same time, then a couple of druids usually fail a second time. I then reset those and they succeed on the next try. It seems like something may have changed on the FOLIO side since it doesn't look like the release process has changed on the SDR side and this error wasn't happening earlier.
Additional information
I think that error may end up in HB as https://app.honeybadger.io/projects/50568/faults/100168652 (there's a noticeable increase after December 1). Note that the issue in this ticket is a different error from sul-dlss/argo#4174, which had to do with the MARC record itself. When that error happens, retrying doesn't succeed because the MARC record itself needs to be updated.
Is this error message the response from FOLIO? {"message":"Unexpected error in InitProducerIdResponse; The request timed out.","type":"-3","code":"INTERNAL_SERVER_ERROR"} I searched the folio-org repos for "InitProducerIdResponse" and didn't find anything, so it is difficult to tell what's going on. The only thing I can say is "sometimes FOLIO is slow, especially the mod-quickmarc endpoints". 🤷
Since retrying the step from Argo seems to eventually work, I wonder if we could automate retries? That's assuming we could do it safely without overloading FOLIO with requests. I've been finding items with this issue and manually retrying approximately every week since this started.
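To make the automated-retry idea concrete, here is a rough sketch of retrying with exponential backoff and jitter, so that a batch of failed druids does not hit FOLIO all at once. This is not the actual SDR code: `TransientFolioError`, `with_backoff`, and the attempt and delay values are all hypothetical names and numbers chosen for illustration, and it assumes the timeout can be told apart from the permanent MARC-record error in sul-dlss/argo#4174.

```ruby
# Hypothetical error class standing in for a transient FOLIO failure
# (e.g. the "request timed out" 400 response). Not part of any real library.
class TransientFolioError < StandardError; end

# Runs the block, retrying on TransientFolioError with exponential backoff.
# Each wait is the capped base delay plus up to 50% random jitter, so that
# many druids failing together do not all retry at the same moment.
# `sleeper` is injectable so the waits can be stubbed out in tests.
def with_backoff(max_attempts: 4, base_delay: 2.0, max_delay: 60.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientFolioError
    # Give up and re-raise once the attempt budget is spent.
    raise if attempt >= max_attempts

    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(delay + rand * delay * 0.5)
    retry
  end
end
```

The update-marc call would go inside the block, raising `TransientFolioError` only for the timeout response; any other error would propagate immediately and still need manual attention.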