Decouple from Solr #4459

Open
mjgiarlo opened this issue Mar 23, 2023 · 0 comments · May be fixed by #4578

This is an "SDR evolution" ticket, the intent of which is to reduce dependencies within DSA. (HT to @jcoyne for the idea!)

As far as we know, Solr is used within DSA for:

Since DSA already sits atop the source of truth for Cocina (Postgres), and it's queryable, DSA can get this information directly from Postgres without needing to consult Solr.
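
To illustrate what querying Postgres directly might look like, here is a minimal Ruby sketch. It builds a parameterized JSONB containment query for one example lookup: finding the members of a collection. Cocina structural metadata records collection membership in an `isMemberOf` array. The table name `dros`, the columns `external_identifier` and `structural`, and the method name `collection_members_query` are assumptions for illustration, not DSA's actual schema or code.

```ruby
require "json"

# Hypothetical sketch: assumes a `dros` table whose `structural` JSONB column
# holds each object's Cocina structural metadata. Not DSA's real schema.
def collection_members_query(collection_druid)
  # The containment operator (@>) matches rows whose structural.isMemberOf
  # array includes the given collection druid.
  containment = JSON.generate({ "isMemberOf" => [collection_druid] })
  sql = "SELECT external_identifier FROM dros WHERE structural @> $1::jsonb"
  [sql, [containment]] # SQL text plus bind parameters
end
```

Passing the JSON as a bind parameter (for example via the `pg` gem's `PG::Connection#exec_params`) avoids interpolating druids into SQL. Without an index on the `structural` column, Postgres evaluates `@>` by scanning the table. A GIN index on that column is one way to support containment lookups.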

@mjgiarlo mjgiarlo self-assigned this Aug 30, 2023
@mjgiarlo mjgiarlo linked a pull request Aug 30, 2023 that will close this issue
mjgiarlo added a commit that referenced this issue Aug 30, 2023
Fixes #4459

This is a spike commit towards an SDR Evolution the team has been batting around for a while now, namely severing DSA's dependency on Solr. The spike largely replaces Solr queries with direct DB queries, and for most use cases this works just fine. The key word here is "most..."

* The Solr queries have been replaced with DB queries that reach into JSONB columns, which results in table scans. I tested all of these queries in stage with large-ish, but not prod-huge, data sets (~25K records), and most of them performed fine. That said, we might want to test with prod-like data and do some benchmarking to determine whether we want to index more of the JSONB data.
* A notable performance outlier is `MemberService.for`, which needs to make a separate Workflow API call for *each* member of a virtual object. For a virtual object with a few thousand members, this is impressively slow, taking over a minute to complete.
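
If the per-member Workflow API calls are issued one after another, total latency grows linearly with the number of members. One possible mitigation is to fan the lookups out across a small, bounded pool of threads. This assumes the Workflow API tolerates a handful of concurrent requests. The following is a hypothetical sketch: `statuses_for` and the `fetch_status` block are illustrative names, not part of DSA or its Workflow client.

```ruby
# Hypothetical sketch: `fetch_status` stands in for the per-member Workflow API
# call. Nothing here is DSA's actual client code.
def statuses_for(member_druids, concurrency: 8, &fetch_status)
  queue = Queue.new
  member_druids.each { |druid| queue << druid }
  results = {}
  lock = Mutex.new

  # Start at most `concurrency` worker threads, each draining the shared queue.
  workers = Array.new([concurrency, member_druids.size].min) do
    Thread.new do
      loop do
        druid = begin
          queue.pop(true) # non-blocking pop; raises ThreadError once the queue is empty
        rescue ThreadError
          break
        end
        status = fetch_status.call(druid)
        lock.synchronize { results[druid] = status }
      end
    end
  end
  workers.each(&:join) # re-raises any exception from a worker thread

  # Return results keyed in the same order as the input druids.
  member_druids.to_h { |druid| [druid, results[druid]] }
end
```

Bounding the pool keeps wall-clock time closer to N / concurrency calls rather than N while capping the load placed on the Workflow service. A batch endpoint, if one existed, would avoid the per-member calls altogether.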

Another question we'd need to answer to take this work forward is what to do about `bin/generate-druid-list`, which allows a user to issue Solr queries directly, and `lib/tasks/missing_druids.rake`, which compares what's in the DSA DB and what's in Solr to determine if any objects need (re-)indexing. Are these still useful? If so, could they live elsewhere or could we solve these problems in a different way? If the answer is no, we may not want to proceed with this decoupling.
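
The comparison described for `lib/tasks/missing_druids.rake` amounts to a set difference, at least when looking for druids that are in the DSA DB but not in Solr. Here is a minimal sketch, assuming both druid lists have already been fetched. How they are fetched is left out, and `druids_needing_index` is a hypothetical name rather than the task's actual code.

```ruby
require "set"

# Hypothetical helper, not the actual rake task: returns druids present in the
# DSA database but absent from Solr, i.e. candidates for (re-)indexing.
def druids_needing_index(db_druids, solr_druids)
  indexed = solr_druids.to_set # constant-time membership checks
  db_druids.uniq.reject { |druid| indexed.include?(druid) }
end
```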

**NOTE:** Since this is a spike meant to generate discussion, I have not yet bothered with changing the tests (or with linting). That will naturally come later if we decide the idea and implementation have merit.