Error trying to harvest ERA via OAI #3435
Spoke with @nmacgreg because he was working on making firewall rules permanent. The IPs from the EBSCO message are not being blocked. |
I have a hunch that this may be caused by the seemingly irregular 3:00 a.m. Redis reboots, which would bring this service down; I have yet to verify this, however. |
To add another thought to the above. Below is an excerpt from the HAProxy log for IP 140.234.252.9 on April 1. Notice the HTTP 500 error in the first entry. The second is a request with the same URI; it succeeded with an HTTP 200 but took 5191 ms. Does this indicate that the OAI client sent a request, waited 5000 ms, timed out, and then sent a second request, which caused the HTTP 500 error? I'm assuming the log timestamps are recorded at the end of the HTTP session (not the start), so the second request in the log actually started before the first. Testing response time:
Why do two requests to the same URI result in different HTTP status codes (the shorter one is HTTP 500 while the longer is HTTP 200)?
Description of the HAProxy HTTP log format: https://sematext.com/blog/haproxy-logs/#http-format |
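Worth noting from the HAProxy docs: the bracketed date in each HTTP log line is the accept time (session start), while the syslog prefix timestamp is written when the line is emitted (session end). Sorting entries by the accept field, rather than the prefix, answers whether the second request really started before the first finished. A small sketch of pulling those fields out with awk; the log line below is synthetic (timer values modelled on the April 1 entries), and field positions assume the standard syslog prefix shown:

```shell
# Synthetic HAProxy HTTP log line (not a real entry from the ERA load balancer).
line='Apr  1 03:12:05 era-lb haproxy[1234]: 140.234.252.9:54321 [01/Apr/2024:03:11:59.932] era~ era_backend/era-app-prd-2 0/0/1/5189/5191 200 48012 - - ---- 1/1/0/0/0 0/0 "GET /oai/?verb=ListRecords&metadataPrefix=oai_dc HTTP/1.1"'

accept_ts=$(echo "$line" | awk '{print $7}')   # accept time: session START
timers=$(echo "$line" | awk '{print $10}')     # Tq/Tw/Tc/Tr/Tt, all in ms
status=$(echo "$line" | awk '{print $11}')     # HTTP status code
total_ms=$(echo "$timers" | cut -d'/' -f5)     # Tt: total session duration

echo "start=$accept_ts status=$status total_ms=$total_ms"
```

So a line logged at 03:12:05 with Tt=5191 describes a request that began around 03:11:59.9, which is why two adjacent log lines can appear "out of order" relative to their actual start times.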
RE: Connor's speculation about this being related to the reboot of Redis:
I have 2 questions:
|
Jeff was provocative on Slack & brought this to my attention.
Note: to perform this testing, you must mangle your /etc/hosts to make era.library.ualberta.ca "point to" the host address for either era-app-prd-1 or era-app-prd-2, and it's frightfully important that you undo this mangle after testing! Use this command for testing:
The HTTPD logs on the prod server era-app-prd-2 do record the 500 response, but don't give any useful details. OTOH, the production.log is pretty chatty about the 500 response, and hands off to Rollbar. Maybe this will give you a place to start?
Some questions:
- Can you recreate this in Dev, for instance? Or has anybody tried in Staging?
- Is it just this one URL, or are other URLs failing?
- What is the source of the data used to inform the response to this query (Redis, PostgreSQL, or maybe an external web resource), and can you test that backend to find the source of the flakiness?
I was unable to immediately find an explanation for this query working 100% of the time on one prod server but intermittently failing on another. Our business processes are carefully designed to prevent differences between servers. I'm open to the idea of either building a new prod VM, or performing the kind of rebuild operation we'd use to recover from a bad update (e.g. delete the jupiter package, clean primary directories, use the playbook to reinstall) on era-app-prd-2. Find me on Slack if you want to pursue this; I'm stepping away. |
A test from yesterday: I tried an OAI request on each of the 3 prod Jupiter servers. The request worked on only 2 of 3, with the third returning a 500 error.
I'm changing my /etc/hosts file to isolate each prod server. The following also succeeds on 2 of 3 prod servers.
Fast-forward to today: I tried reproducing yesterday's test and received different results (all 3 prod servers worked for both test URIs). What can change day-to-day and server-to-server?
Todo:
Fyi, I think
curl --resolve "era.library.ualberta.ca:443:${IP}" "https://era.library.ualberta.ca/oai/?verb=ListRecords&set=90b3539f-2198-4058-9786-5e3ccd9e671a:7966f60a-4353-4286-941b-dbd33cd74867&metadataPrefix=oai_dc"
curl --resolve "era.library.ualberta.ca:443:${IP}" "https://era.library.ualberta.ca/oai/?verb=ListRecords&metadataPrefix=oai_dc"
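A loop form of the commands above, for checking each backend in turn without touching /etc/hosts (curl --resolve pins the DNS answer for one invocation only). The IP addresses are placeholders for the real era-app-prd-* hosts, and the loop echoes each command rather than executing it, so it can be reviewed before running:

```shell
# Dry-run sketch: print one curl invocation per prod backend. Replace the
# placeholder IPs with the actual era-app-prd-1/2/3 addresses, then drop the
# leading "echo" to execute. -w '%{http_code}\n' reports only the status code.
URI='https://era.library.ualberta.ca/oai/?verb=ListRecords&metadataPrefix=oai_dc'
for IP in 10.0.0.1 10.0.0.2 10.0.0.3; do   # placeholders, NOT real addresses
  echo curl -s -o /dev/null -w '%{http_code}\n' \
    --resolve "era.library.ualberta.ca:443:${IP}" "$URI"
done
```

This avoids the "frightfully important that you undo this mangle" problem entirely, since nothing persistent is changed.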
|
I had a chance to do some digging into this, as I am fairly familiar with Oaisys. Redis is leveraged in Oaisys as a key-value store to handle resumption tokens (a defined OAI-PMH parameter). I believe Redis is on one server; with the discovery of the server discrepancy, the Redis reboots were likely a red herring.
I spent some time in the Jupiter/Oaisys/ActsAsRdfable code-bases and have narrowed some things down. I believe this is the piece of code throwing the error: https://github.com/ualbertalib/acts_as_rdfable/blob/e7ae5f76157139bae3804230a0d789affee1a9af/lib/acts_as_rdfable/acts_as_rdfable_core.rb#L26. I did notice that the first occurrence we have of this error in Rollbar was Feb 9th, 2024 (https://app.rollbar.com/a/ualbertalib/fix/item/jupiter/1833), which more or less lines up with the release containing the change linked above (Jupiter version 2.7.0, released Jan 20, 2024), which adds to my suspicion; that release also had a major Rails upgrade, however.
One thing that is likely unrelated but came to mind is an old ticket in Shortcut (previously Clubhouse, which was used for project management at the time), which I have included a screenshot of below. Stepping away from this rabbit hole for now. |
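For context on the resumption-token mechanism described above: in OAI-PMH, a continuation request carries only the verb and the resumptionToken (the spec makes resumptionToken an exclusive argument, so set and metadataPrefix are not repeated). That means if the token's server-side state in Redis disappears between pages, the follow-up request fails even though the first page succeeded. A sketch of what such a continuation URL looks like, with a placeholder token value:

```shell
# OAI-PMH continuation request: verb + resumptionToken only. The token value
# is a placeholder; a real one is issued by the server in the previous page.
BASE='https://era.library.ualberta.ca/oai/'
TOKEN='PLACEHOLDER_TOKEN'
CONT="${BASE}?verb=ListRecords&resumptionToken=${TOKEN}"
echo "$CONT"
```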
I suspect Connor's message is on the right track. I've been unable to find another explanation, nor to reproduce the error reliably. Expanding a bit on what I think Connor is saying, the line in question is:
raise InvalidArgumentError unless self.class.supported_metadata_serialization_formats.include?(format)
Thoughts:
|
I've updated the previous comments with my existing thoughts. My Library sprint ends this week so I'm returning the issue to the queue in the event another person has cycles to move this forward. |
Describe the bug
EBSCO (who harvests ERA for inclusion in EDS, our discovery service) has recently (beginning mid-March) run into an issue when trying to harvest ERA. This is what they have described:
"We've recently encountered an issue with the OAI data harvests for your Institutional Repository, "ERA: University of Alberta Institutional Repository (ir00008a)". Harvest has been aborted by an error "ERROR: Could not harvest from https://era.library.ualberta.ca/oai: Error while harvesting using setSpec '90b3539f-2198-4058-9786-5e3ccd9e671a:7966f60a-4353-4286-941b-dbd33cd74867'. Will not continue.
Some reasons this might be happening:
--If your server is offline for a technical issue?
--If your server is shut down on a regular schedule?
--If we should still use the following base URL for our harvest attempts https://era.library.ualberta.ca/oai
In addition, is it possible that our IP address is blocked by a firewall? In order to access and harvest your IR records via OAI-PMH, we need the following IP addresses added to your network exceptions list:
140.234.252.9
140.234.253.9
140.234.255.9"
If it's of any help, I have been able to use the basic browser approach to get records from this set without an issue; see https://era.library.ualberta.ca/oai/?verb=ListRecords&set=90b3539f-2198-4058-9786-5e3ccd9e671a:7966f60a-4353-4286-941b-dbd33cd74867&metadataPrefix=oai_dc
Not sure if this was a one-time issue or if something more substantial is happening.