Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go haywire on configuration #1714

Closed
wants to merge 5 commits into from

Conversation

flyinghyrax
Copy link
Contributor

I ended up spending an absurd amount of time on this, bouncing back and forth between abstracting away ConfigParser as much as possible, and integrating with ConfigParser as much as possible. At one point I had a pretty cool config section validation mechanism going based on attrs and type annotations (it lives tagged over here, I couldn't bear to outright delete it). What I've settled on feels somewhere between the two. Goals included:

  • Do not change the interfaces for plugin authors at all
  • Consolidate handling of default values for core configuration
  • Support type-annotated access to core configuration

Change Summary

This "final" version makes these major changes:

  • Fixes handling of the diff-file option to not create a diff file if the option isn't set.
  • Fixes string interpolation for the diff-file option using the {{ section_option }} syntax it supports.
  • Changes BandersnatchConfig to not use the singleton pattern:
    • I aimed to do this because the configuration object was already so close to not needing a global singleton anyway: the object was already passed from main to subcommands as an argument, and both storage and filter plugin constructors accepted it as an optional argument.
    • The loader routines for storage and filter plugins now require a config object, and the init functions for plugins now require a config object.
    • The public interface for plugin subclasses should be unchanged - authors still do setup outside the constructor in a initialization method and can access the complete configuration via the ConfigParser API through an attribute.
    • Pros: no more managing Singleton state in unit tests, explicit dependency injection where configuration is required.
    • Cons: ...this cascaded into updating what felt like every single unit test.
  • Change BandersnatchConfig to be a subclass of ConfigParser:
    • Most uses of BandersnatchConfig were just accessing the .config attribute anyway, and I found this simplified some things.
  • Override optionxform in BandersnatchConfig to allow either - or _ in option names:
    • Previously some options were defined using _ and some using -; this way makes it a user style preference, the ConfigParser will ingest either.
  • Split the packaged default.conf into two files: default.conf and example.conf:
    • example.conf is the original default.conf renamed. It retains all the in-file comments and is the file written to disk by the main method if the config file path from sys.argv doesn't exist.
    • default.conf contains the default values for every option in the [mirror] section, sans the directory option.
  • Load [mirror] configuration default values from default.conf:
    • Config loading first loads the default values, then loads the user-provided config file. The user config file only requires mirror.directory, all other options in the user's mirror configuration replace the default values if specified.
    • No default/fallback value needs to be specified when getting values via the ConfigParser API, so defaults should not diverge across subcommands.
    • Default values are present whether the configuration is accessed through the ConfigParser API or the the typed interface.
    • The only major downside to this approach I encountered was the inability to distinguish whether options with empty defaults are not specified, because ConfigParser doesn't have the notion of an option being "defined but not set". This affects options like log-config and diff-file where the default value is "nothing", but we still want to consistently allow safely reading these options without having to specify a default at the usage site. (As a concrete example, ConfigParser.has_option no longer works to distinguish whether log-config is set, because it will have a default value of "".)
  • Create a TypedDict class for the mirror section of the config file (bandersnatch.configuration.mirror_options):
    • Using TypedDict seems minimally invasive since options can still be access through the Mapping interface supported by configparser's SectionProxy, and I had done enough damage to the BandersnatchMirror initializer already.
    • Options that represent paths are converted to PurePath - an "abstract" path representation that can be passed to a storage plugin's PATH_BACKEND.
    • Options with corresponding Enum types are converted to that type.
    • str, int, float, and bool use the underlying configparser "getters".
    • The validation routine uses Exception subtypes to generate consistent error messages for missing, empty, or invalid options; the error message always includes the section and option names.
    • str values that can't be empty are checked
    • int and float values are range checked (non-negative, max values for workers and verifiers)
    • BandersnatchConfig gets a method to run the validation/conversion function and caches the result, so the validated options can be accessed through that method multiple times without re-running the validation
  • Between removing Singleton, changing how defaults get loaded, and adding MirrorOptions, most of the unit tests were changed somehow. Many are mechanical transformations in test setup steps. The configuration tests were rewritten from unittest to pytest since I needed to update or replace the majority of those tests regardless.

Tasks

Some things still left to do...

  • Rebase on main
  • Update delete and verify subcommands fetching of mirror options
  • Needs more inline comments on rationale and flow
  • Update example.conf to include barebones plugin configuration
  • Verify test coverage of new configuration submodules
  • Update the documentation
  • Update/fix the changelog

Related Issues

This replaces the implementation for custom configparser option
references in the diff-file option. Moves detection and handling for
the custom syntax into a dedicated submodule. Uses a regular expression
to preserve text before and after the reference.
Restore original behavior where a diff file is only written if
the 'mirror.diff-file' config option is set to a non-empty value.
- Pass config to filter plugin constructors
- Pass config to storage plugin constructors
- Change BandersnatchMirror to require pre-loaded
  storage and filters

...this changed (nearly?) all the unit tests.
@cooperlees
Copy link
Contributor

Many thanks for this, but, wow, this is a big PR. I wonder if we could divide it up into smaller PRs getting to your ideal end state via a few smaller steps.

I've learnt over the years doing many smaller diffs/PRs has many wins. Mainly:

  • It's easier for the author to keep focus
    • e.g. Not knocking you here, but look how long you've spent here ... I feel lots of smaller PRs with us merging more often would have allowed you to actually get to the end state faster
  • It's WAY easier on the reviewer
  • It's easier to roll back some smaller diffs (sometimes that is not possible - But always awesome to make it so)

With this PR I now have a lot of context and goals to absorb and workout while trying to make sure none of it will break existing usages etc. etc. ...

I will solider through this one in the next few days, but in the future, lets try and come up with a way to get more PRs and land them quickly and move forward enhancing our way to the glorious end state.

@cooperlees
Copy link
Contributor

The CI looks like maybe a new config param snuck in for S3 while you've been working on this?

@flyinghyrax
Copy link
Contributor Author

You’re 100% right about the PR size! Well advised.

There are several pieces to this I can split into individual PRs. (The first few commits, for instance, are pretty standalone). Everything needs to be rebased and I have remaining cleanup anyway, so I will close this PR and split it up into successive ones that are hopefully easier to review.

re: CI - I think you’re right, that’d be the S3 related PR merged recently 🙂

@flyinghyrax
Copy link
Contributor Author

Closed to be split into smaller pieces, starting with #1715

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Many options do not have a configured fallback value even if the documentation says so
2 participants