-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DSEGOG-305 Config & Documentation for Data Re-ingestion #119
base: main
Are you sure you want to change the base?
Conversation
…script - This has been done because the move to MongoDB v7 with Database Services has resulted in a complex connection string so using that becomes easier than conerting the string to indivdual arugments, some of which don't seem to easily convert into CLI options
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #119 +/- ##
=======================================
Coverage 94.99% 94.99%
=======================================
Files 50 50
Lines 2837 2837
Branches 297 297
=======================================
Hits 2695 2695
Misses 105 105
Partials 37 37 ☔ View full report in Codecov by Sentry. |
@@ -7,8 +7,7 @@ ssh: | |||
enabled: false | |||
ssh_connection_url: 127.0.0.1 | |||
database: | |||
hostname: localhost | |||
port: 27017 | |||
connection_uri: mongodb://localhost:27017/opsgateway | |||
name: opsgateway |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can see, we're only still using this in one place:
db = client[Config.config.database.name] |
(and two places for print statements, which I'm less concerned about)
f" {Config.config.database.name}", |
f" {Config.config.database.name}", |
In most places since we're using mongoimport
as a subprocess I gather the fact that connection_uri
ends /opsgateway
means we don't need to pass it (the DB name) directly. So I'm sort of wondering if we need to still have Config.config.database.name
at all or if for that one case we still need it we could split it from the last /
in the uri (as apparently pymongo.errors.InvalidName: database names cannot contain the character '/'
)?
Alternatively, given that in the main config we have:
operationsgateway-api/operationsgateway_api/config.yml.example
Lines 23 to 25 in 1733055
mongodb: | |
mongodb_url: mongodb://localhost:27017 | |
database_name: opsgateway |
We could make this more consistent and keep the
name
and don't have it trailing the port.
The motivation of this change was around the connection string becoming to complex, so maybe the current approach is the most appropriate (without knowing the real string it's hard to judge). In reality this is all minor, but I suppose my concern would be that you change the name
but not the connection_uri
and now have one command that tries to use a different DB to the other (or vice versa). Ideally I think we'd have a single source of truth for opsgateway
, regardless of whether it's via connection_uri
or name
- but ideally not both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, one source of truth would be ideal. I think I kept Config.config.database.name
around because I couldn't (at the time) get mongoimport
to work with the connection URI - I'm not sure if that's the reality, I just couldn't get it working while working on this issue. Regardless, everywhere that Config.config.database.name
is used, I'm pretty sure I can refactor to use pymongo
instead, thus negating the need for that config option and fully moving to the URI.
Since developing this script (and classes in ingest/
to work alongside it), I've had the chance to use the script in anger. I haven't used the SSH functionality at all and to be honest, it's a bit of a pain to maintain - the code is similar across LocalCommandRunner
and SSHHandler
too. The original intention was to provide the ability to connect to the API of the dev server and be able to import the experiments etc., all in a single script. In reality, I have a separate cloud VM where I configure the script to point at the database and S3 bucket used by the dev server; I've taken this approach to benefit from the speed of a beefy cloud VM - the ingestion would take
I think I'm going to branch off this branch and fully switch to the connection URI and refactor/remove the separate LocalCommandRunner
and SSHHandler
classes. Not sure when I'll do this, hopefully in the next week or so. I'll fully digest your comments on #120 first (I've quickly scanned through them) to ensure the changes are done in a sensible way.
This is a small PR that makes a couple of minor changes for the re-ingestion of data on the new database that Database Services host. I've made a couple of updates to the documentation (mainly an update to the dev server docs to explain the current situation with the data on the machine). For the ingestion script, I've moved from the configuration using a hostname and port for the database to a connection string. I've made this change because the new connection string from Database Services is quite complex and therefore the string is a better fit than the hostname and port.