Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a specifier for alerts that are linked to a library #9539

Open
PeterHamfelt opened this issue Apr 4, 2024 · 36 comments
Open

Add a specifier for alerts that are linked to a library #9539

PeterHamfelt opened this issue Apr 4, 2024 · 36 comments
Labels
Enhancement ✨ Improvement to a component Needs specification 🔐 Accepted as a potential improvement, and needs to specify edge cases, message names, etc.

Comments

@PeterHamfelt
Copy link
Member

Current problem

I'm trying to find a solution to specify an alert which is specific to a given library and version of that library.
An example is the pylint-ml plugin which has checkers associated with the Pandas library.
pylint-dev/pylint-ml#23

Desired solution

When creating a checker, I would like to be able to add

  • which library the code smell/alert is associated to
  • which version of the lib ?below/exact/above? it is linked to

The question I have is how can we check which lib version the user is using or does this need to be configured manually?
I am open to suggestions :)

Additional context

No response

@PeterHamfelt PeterHamfelt added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Apr 4, 2024
@Pierre-Sassoulas Pierre-Sassoulas added Enhancement ✨ Improvement to a component Needs specification 🔐 Accepted as a potential improvement, and needs to specify edge cases, message names, etc. and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Apr 4, 2024
@mbyrnepr2
Copy link
Member

Hi @PeterHamfelt perhaps this could be relevant to your question -

The question I have is how can we check which lib version the user is using or does this need to be configured manually?
I am open to suggestions :)

>>> from importlib.metadata import version
>>> version("pandas")
'2.2.1'
>>> 

https://docs.python.org/3/library/importlib.metadata.html

@Pierre-Sassoulas
Copy link
Member

Thank you Marc ! I think we also need to specify how we handle the message declaration of the lib version in the checker (this is already done for python version we can build on it).

Example of use for the current minversion:

from pylint.checkers import BaseChecker

class AsyncChecker(BaseChecker):
    msgs = {
        "E1700": (
            "Yield inside async function",
            "yield-inside-async-function",
            "Used when an `yield` or `yield from` statement is "
            "found inside an async function.",
            {"minversion": (3, 5)},
        ),
    }

... But we must also declare what lib(s) is (are) affected. So possible data structure:

from pylint.checkers import BaseChecker

class PandasNumpyChecker(BaseChecker):
    msgs = {
        "E0042": (
            "Some pandas/numpy version specific message",
            "pandas-numpy-message",
            "Some pandas/numpy version specific message rational",
            {"minversion": (3, 5), "libversion": {
                "pandas" : {"minversion": "1.0.1", "maxversion": "2.0.1"}, # version could be tuple
                "numpy":  {"minversion": "1.0.1}, #, but that's not what importlib.metadata.version returns
            }},
        ),
    }
  • What we do when the lib is not installed (warn the user, or do nothing, I can see both being useful.) For example in pylint-django is django is not installed, then something went very wrong and a fatal error message could be raised !
from pylint.checkers import BaseChecker

class DjangoChecker(BaseChecker):
   # ....
   msgs = {
       #...
       "libversion": {
           "django" : {"onmissing": "fatal"}, # if None nothing is done, if "fatal" a fatal error message is raised.
       }
   }

Let me know what you think about this spec proposal ;)

@mbyrnepr2
Copy link
Member

Looks good @Pierre-Sassoulas 👌

@PeterHamfelt
Copy link
Member Author

Hi @mbyrnepr2, @Pierre-Sassoulas ,

I apologize for the delayed response. I've reviewed the suggested setup for specifying library versions and I'm genuinely impressed. The structure you proposed:

"libversion": {
"pandas": {"minversion": "1.0.1", "maxversion": "2.0.1"},
"numpy": {"minversion": "1.0.1"}
}

This approach, especially considering versions as a range (with minimum and maximum bounds), seems both robust and flexible. I believe this could be particularly advantageous for pylint-ml. Given that each checker might depend on specific library versions, and these dependencies are subject to evolve over time, integrating this implementation would significantly enhance the analyzer's adaptability and accuracy.

I'm looking forward to discussing this further and exploring how we can incorporate this feature.

@DanielNoord
Copy link
Collaborator

How do we handle non semver versions in dependencies?

@Pierre-Sassoulas
Copy link
Member

Good point, we must use string in the data structure not tuple and then handle the complexity internally. Other versioning schemes I know are:

  • Date with (month and days ?) + character not in alpha/beta/rc/etc. (hash?) 2022.3.23f1 (unity). This looks almost like proper semver..
  • Rounded up date 24.1a1) (black alpha versions), but the actual version have the months (https://github.com/psf/black/releases/tag/23.12.0)
  • Improper semver 2.3.1, then 2.4, then 2.4.1(from antique colleague with a perl background, hi Benoit 👋 <3)

We can't use use alphabetical ordering on the string because we need to detect that 9.9.9 is less than 10.0.0, and 2022.9.23f1 is less than 2022.10.23f1 and 24.1a1 is less than 24.9.1 which is less than 24.10.1. I guess it's split on the dot and then try to cast to int. If the versioning scheme does not have separation with dot, then we try to sort it alphabetically. If sorting it alphabetically does not result in something chronological then we don't support it ? Do such schemes exists and do they deserve pylint's checkers 😄 ? (This part is probably going to be fun.)

@DanielNoord
Copy link
Collaborator

I think we should utilize something like https://github.com/pypa/packaging.

@Pierre-Sassoulas
Copy link
Member

On the one hand, yeah definitely. On the other hand I'd really hate to add a new dependency. Maybe we can live with supporting only semver or semver adjacent.

@mbyrnepr2
Copy link
Member

mbyrnepr2 commented Apr 9, 2024

Maybe this doesn't address your concern Pierre but maybe it could be added as an extra dependency since users of Pylint itself wouldn't need it; whereas the plug-in dependency on Pylint would be pylint[packaging].
( Similar to Celery in the way that if you need a redis task queue you would install as celery[redis].)

@PeterHamfelt
Copy link
Member Author

On the one hand, yeah definitely. On the other hand I'd really hate to add a new dependency. Maybe we can live with supporting only semver or semver adjacent.

If pylint does not support lib-version dependency, what would be the recommended approach to handle this locally for pylint plugins?

@Pierre-Sassoulas
Copy link
Member

it could be added as an extra dependency

If we add it as an extra, we'll still have to deal with all the issue associated with a new dependency (for dill we had to track an issue for months, then Daniel had to create our own version of dill so we can release pylint in time for python 3.12). My main concern is the maintenance cost and the bloat. The complexity seems high even if we only take what we need and vendor it in pylint : https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/version.py#L117-L146 / https://github.com/pypa/packaging/blob/32deafe8668a2130a3366b98154914d188f3718e/src/packaging/version.py#L200-L202

What do you mean by "lib-version dependency" @PeterHamfelt ?

@PeterHamfelt
Copy link
Member Author

I understand your concern @Pierre-Sassoulas.
As this issue mainly adheres to pylint-ml, I'm interested in implementing simple version checks for dependencies, such as checking the pandas version which follows the "x.x.x" format (https://pandas.pydata.org/docs/whatsnew/index.html). Initially, at least.

Since such checks are not currently supported by pylint, what would be the recommended generic approach for incorporating this type of functionality into pylint plugins?

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Apr 13, 2024

I'm interested in implementing simple version checks for dependencies, such as checking the pandas version which follows the "x.x.x" format

That would be semver, It's the most common versioning scheme, if we do the feature we're going to support this whatever we choose :)

Let's vote on what to do. Please let me know if you think of another solution, I'll add it to the vote :

  • support semver only (i.e. : split on . cast to int, use the int tuple for ordering version, fail if anything unexpected happens), without any new dependency 👀
  • support semver and "close to semver" (i.e. : split on ., try to cast to int, use int tuple or string tuple or full string for ordering versions), without any new dependency 😄
  • vendor what pypa/packaging is doing to support everything but without dependency ❤️
  • Add pypa/packaging as a dependency and support everything 🚀
  • semver only in pylint core by default, use dependency injection on the plugin side 👍

@PeterHamfelt
Copy link
Member Author

"Add pypa/packaging as a dependency and support everything 🚀"
In my opinion, in the best of worlds, this would always be the best choice. However, is it complex to implement? Worth it?

@Pierre-Sassoulas
Copy link
Member

@jacobtylerwalls do you have an opinion on the subject ?

@jacobtylerwalls
Copy link
Member

I'm for adding pypa/packaging. It looks stable, small footprint, and already supports python 3.13. dill is a much more sensitive subject area. I don't think we're taking on a big risk to add it.

@Pierre-Sassoulas
Copy link
Member

Sorry for not stating that earlier, probably due to lack of sleep, but two reasons why I'm reluctant to add a dependency if it's possible to avoid it, are:

Dependencies are vastly more disruptive in pylint than in any other projects. pylint must be installed alongside a project's own dependencies to properly analyses a project. So it means pylint dependencies can conflict with the project's (i.e. we want pypa/packaging > 1 in a project requiring pypy/packaging < 1 and pylint then can't be installed at all. (For example, because of a clash with typing_extensions (*) pylint was unusable on pytorch's projects for multiple months).

Also, one first step to increase the security of a package is to set the dependencies to exact versions (so that a new corrupted version of pypa/packaging might not retro-actively compromise all the old version of pylint). It means we'd want to be able to require pypa/packaging == 1.0.1 exactly and pylint would then clash with not only pypy/packaging < 1 but also with pypy/packaging > 1.01. Pylint could get financial support from external organisms if we do this kind of thing, but in the past I said "not reasonable" (considering the point above, pylint would often become uninstallable), and they said " 💰 ❌. 👋 ! ". Something to consider if we want to send a signal that we're interested by sponsoring from big actors.

So 😄 / ❤️ would be my personal choice.. anyway if the votes stays what it is with the new info, I'll disagree and commit :)

(*) in this case pytorch was requiring a specific old version, but this could happens more often if more maintainers choose to set dependencies to specific versions because of point 2

@mbyrnepr2
Copy link
Member

I suppose we could go for😄 or👀 initially and if future plugins need the added functionality of the dependency, it could be considered then (we could change our mind)?

@jacobtylerwalls
Copy link
Member

Can we delegate all of this decision making to the plugin?

@Pierre-Sassoulas
Copy link
Member

Pierre-Sassoulas commented Apr 15, 2024

Nice 🧠 ! Yes, permitting to inject a comparison function on the plugin side and provide semver by default in pylint "core" would be a really nice design.

from pylint.checkers import BaseChecker

def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: 
    return min_version < actual_version < max_version

class PandasSomeEsotericLibChecker(BaseChecker):
    msgs = {
        "E0042": (
            "Some pandas/numpy version specific message",
            "pandas-numpy-message",
            "Some pandas/numpy version specific message rational",
            {"minversion": (3, 5), "libversion": {
                "pandas" : {
                    "minversion": "1.0.1",
                    "maxversion": "2.0.1",
                    "onmissing": "fatal",
                    "versioncomparator": None
                 }, 
                "someesotericlib":  {
                    "minversion": "duck", 
                    "maxversion": "spoon",
                    "versioncomparator": _esoteric_lib_comparator
                },
            }},
        ),
    }

not sure about "versioncomparator", sounds more like a check to see if the installed lib is in range. so "versionvalidator" ?

@PeterHamfelt
Copy link
Member Author

Nice 🧠 ! Yes, permitting to inject a comparison function on the plugin side and provide semver by default in pylint "core" would be a really nice design.

from pylint.checkers import BaseChecker

def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: 
    return min_version < actual_version < max_version

class PandasSomeEsotericLibChecker(BaseChecker):
    msgs = {
        "E0042": (
            "Some pandas/numpy version specific message",
            "pandas-numpy-message",
            "Some pandas/numpy version specific message rational",
            {"minversion": (3, 5), "libversion": {
                "pandas" : {
                    "minversion": "1.0.1",
                    "maxversion": "2.0.1",
                    "onmissing": "fatal",
                    "versioncomparator": None
                 }, 
                "someesotericlib":  {
                    "minversion": "duck", 
                    "maxversion": "spoon",
                    "versioncomparator": _esoteric_lib_comparator
                },
            }},
        ),
    }

not sure about "versioncomparator", sounds more like a check to see if the installed lib is in range. so "versionvalidator" ?

This looks good to me 😃🐼

@PeterHamfelt
Copy link
Member Author

Do we have a decision?
Starting with a simpler version handling (😄 / ❤️) and in the future aim to add pypa/packaging as a dependency?
It would be great to have this in place when I start to add more and more checkers into pylint-ml 😃

@DanielNoord
Copy link
Collaborator

Can versioncomparator becomes something like Callable[[], bool] | None where the callable should just do everything necessary to do the version comparison in the callable?

@Pierre-Sassoulas
Copy link
Member

I'm not sure what you mean by that, could you upgrade the _esoteric_lib_comparator example for it to be a Callable[[], bool] | None ? I suppose we must provide target min/max and actual version to the injected function (i.e. def _esoteric_lib_comparator(actual_version: str, min_version: str, max_version)-> bool: ?

@DanielNoord
Copy link
Collaborator

Just make:

def _esoteric_lib_comparator() -> bool: ...

If we need to pass the version we need to have a way to get the version of a package for which we might do importlib or packaging magic. Let's just also offload that to the plugin?
Just make libversion an object that we can iterate over and call whatever is under versioncomparator?

@Pierre-Sassoulas
Copy link
Member

Ha, right, I understand. Are there a lot of other ways to recover the version ? Maybe we could provide another injection point for that if that's the case:

import antigravity
from pylint.checkers import BaseChecker

def _esoteric_lib_version_comparator(actual_version: str, min_version: str, max_version)-> bool: 
    return min_version < actual_version < max_version

def _esoteric_lib_version_recoverer() -> str: 
    return antigravity.push("elephant")

class PandasSomeEsotericLibChecker(BaseChecker):
               #...
                "someesotericlib":  {
                    "minversion": "duck", 
                    "maxversion": "spoon",
                    "versioncomparator": _esoteric_lib_version_comparator,
                    "versionrecoverer": _esoteric_lib_version_recoverer,
                },

That way you don't have to redefine the version recovering if you want to handle something else than semver. We'd have to implement the default way to do it in pylint so versionrecoverer is optional and convenient to use.

@jacobtylerwalls
Copy link
Member

If we need to pass the version we need to have a way to get the version of a package for which we might do importlib or packaging magic. Let's just also offload that to the plugin?

I think this is too paranoid. :D importlib is in the stdlib, and Mark already showed a snippet I trust to get the version. Let's keep the API for this as small as we can.

@DanielNoord
Copy link
Collaborator

I mean, the docs are full of warnings and caveats. I would personally prefer delegating all of this to plugins by making the API have no arguments at all. We can always add an argument later if plugins can't get the data they need themselves. Removing arguments later is much more of a hassle.

@jacobtylerwalls
Copy link
Member

Sorry, I'm not grokking the caveats. Do you have one in that doc you might point me to?

@DanielNoord
Copy link
Collaborator

"Specifically, it works with distributions with discoverable dist-info or egg-info directories, and metadata defined by the Core metadata specifications."
"Important

These are not necessarily equivalent to or correspond 1:1 with the top-level import package names that can be imported inside Python code."

I'm mostly worried about an influx of issues about "my obscure poorly packaged python library doesn't work with pylint" etc. I don't really see why we need to pass so much to the plugins why they could call importlib themselves as well. That also means that the plugin for a specific package is also responsible for determining whether that package is in accordance with all the packaging standards.

@jacobtylerwalls
Copy link
Member

Fair, but I think we can document that this interface is only for checkers to declare dependencies on pip-installable, importlib.metadata.version()'able packages.

I don't really see why we need to pass so much to the plugins why they could call importlib themselves as well. That also means that the plugin for a specific package is also responsible for determining whether that package is in accordance with all the packaging standards.

This was my first thought also. @Pierre-Sassoulas @PeterHamfelt any thoughts on whether we truly need anything in core pylint for this? Is it because we want pylint to throw an error when you load a plugin requiring a dependency you don't have? That might be nice to handle in core.

@Pierre-Sassoulas
Copy link
Member

I think we should provide a way for plugin maintainer to not have to re-implement the importlib.metadata's way of recovering the version. It's simpler for plugin creation and will reduce the duplication of code. Having a best effort default version recoverer in pylint core is a must have imo. If we really want to permit to recover a version another custom way, it should not complexify the general case's design and force plugin maintainer to consider version comparison if they only want to change version recovery. (i.e. two dependency injections "versioncomparator" with default semver / "versionrecoverer" with default importlib.metadata ; not making an API like "isvalidversion" that must handle both).

That being said, I don't know how many pylint plugins will be made for packages that do not even bother with proper metadata. My guess is that bad packaging is going to affect small packages, and small packages do not have specific pylint plugins. The plugins I know target pytest, numpy, pandas, tensorflow... huge (normalized) libs. So probably very few problematic use cases ? And we can implements versionrecoverer when someone asks for it, until then; "YAGNI".

@jacobtylerwalls
Copy link
Member

Oh yeah we don't want the plugin maintainer reimplementing nothing. I just had a bigger YAGNI question about this whole thing. Why can't the plugin just do:

self.pandas_version = version("pandas")

def my_check():
    if self.pandas_version < whatever:
        return  # or warnings.warn, or whatever

None of that requires changes in core pylint. Why do we need changes in core?

@Pierre-Sassoulas
Copy link
Member

Well, it would mean that every plugin needs to re-implement that check in each of their plugin, plus proper user interface to warn that the check is not available in this version of a lib or that the lib is not installed. We're providing a similar minversion / maxversion option for python interpreters when every plugin could simply guard their checks with if sys.version_info ....

@jacobtylerwalls
Copy link
Member

plus proper user interface to warn that the check is not available in this version of a lib or that the lib is not installed.

I guess that's what I'm getting at. If this is the goal, let's start discussing the specifics of that and make sure we have a volunteer with a draft PR. If we don't have one, then I'd say let's consider wontfixing. Just my 2¢

@PeterHamfelt
Copy link
Member Author

Sorry for the delayed response. I just read through the thread and thanks for discussing the issue!

Well, it would mean that every plugin needs to re-implement that check in each of their plugin, plus proper user interface to warn that the check is not available in this version of a lib or that the lib is not installed. We're providing a similar minversion / maxversion option for python interpreters when every plugin could simply guard their checks with if sys.version_info ....

Exactly, this is what I was asking for. Will it be up to each pylint-plugin to re-implement this solution or can we utilize a standard implemented in core?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component Needs specification 🔐 Accepted as a potential improvement, and needs to specify edge cases, message names, etc.
Projects
None yet
Development

No branches or pull requests

5 participants