Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BK: Add tox.ini with typing,lint (codespell); add workflows for codespell; update pyproject.toml #45

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

yarikoptic
Copy link
Member

To make template better tested and IMHO we should start aiming for type annotation of code in (new) extensions, although datalad itself is not yet annotated.

@jwodder - is it feasible to aim for downstream libraries (like datalad extensions) to be type annotated although only small portion of datalad itself is type annotated ATM?
Could you help finishing up this PR (type annotations for that small datalad_helloworld)?

@mih I also quite like pre-commit with black now as used in dandi-cli and other projects. May be we could/should introduce that to extensions to encourage code/imports consistency etc?

@jwodder
Copy link
Member

jwodder commented Sep 30, 2022

@yarikoptic

is it feasible to aim for downstream libraries (like datalad extensions) to be type annotated although only small portion of datalad itself is type annotated ATM?

The main problem I see is that, as long as datalad lacks a py.typed file, when mypy type-checks a library that uses datalad, any types & values from datalad will be treated as Any, limiting the amount of checking mypy can do. I'm not sure what would happen if datalad gained a py.typed file before it was completely type-annotated, but I doubt it'd be pretty.

pyproject.toml Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Sep 30, 2022

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.34% 🎉

Comparison is base (fac0fda) 92.68% compared to head (cd2a860) 93.02%.
Report is 19 commits behind head on main.

❗ Current head cd2a860 differs from pull request most recent head 60bfa81. Consider uploading reports for the commit 60bfa81 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #45      +/-   ##
==========================================
+ Coverage   92.68%   93.02%   +0.34%     
==========================================
  Files           5        5              
  Lines          41       43       +2     
==========================================
+ Hits           38       40       +2     
  Misses          3        3              
Files Changed Coverage Δ
datalad_helloworld/hello_cmd.py 88.88% <100.00%> (+0.88%) ⬆️
datalad_helloworld/tests/test_register.py 100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.

📢 Have feedback on the report? Share it here.

@yarikoptic yarikoptic changed the title BK: Add tox.ini with typing,lint (codespell); add workflows for them; update pyproject.toml BK: Add tox.ini with typing,lint (codespell); add workflows for codespell; update pyproject.toml Jan 30, 2023
@yarikoptic
Copy link
Member Author

remove typing workflow -- could not figure out how to ignore _version.py . Left possibility to run it though via typing since IMHO we should strive to do typing.

@yarikoptic
Copy link
Member Author

smth likely in the _datalad_buildsupport changes I added which now makes

subcommand --help doc output not building
make: Entering directory '/home/runner/work/datalad-extension-template/datalad-extension-template/docs'
python -m sphinx -b html -d build/doctrees  -W source build/html
Running Sphinx v6.1.3
usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: setup.py --help [cmd1 cmd2 ...]
   or: setup.py --help-commands
   or: setup.py cmd --help

error: option --cmdsuite not recognized
making output directory... done
[autosummary] generating autosummary for: index.rst
[autosummary] generating autosummary for: /home/runner/work/datalad-extension-template/datalad-extension-template/docs/source/generated/datalad.api.hello_cmd.rst
loading intersphinx inventory from https://docs.python.org/objects.inv...
intersphinx inventory has moved: https://docs.python.org/objects.inv -> https://docs.python.org/3/objects.inv
building [mo]: targets for 0 po files that are out of date
writing output... 
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 2 added, 0 changed, 0 removed
reading sources... [ 50%] generated/datalad.api.hello_cmd
reading sources... [100%] index


Warning, treated as error:
/home/runner/work/datalad-extension-template/datalad-extension-template/docs/source/index.rst:[24](https://github.com/datalad/datalad-extension-template/actions/runs/4040767962/jobs/6946719935#step:6:25):toctree contains reference to nonexisting document 'generated/man/datalad-hello-cmd'
make: *** [Makefile:55: html] Error 2
make: Leaving directory '/home/runner/work/datalad-extension-template/datalad-extension-template/docs'
Error: Process completed with exit code 2.

@yarikoptic
Copy link
Member Author

ah -- filed #51, will filter this branch to not bother about build support for now either.

@yarikoptic yarikoptic marked this pull request as ready for review January 31, 2023 16:29
@matrss
Copy link
Contributor

matrss commented Feb 3, 2023

remove typing workflow -- could not figure out how to ignore _version.py .

I ran into this issue as well. Just excluding _version.py is not enough, since the module is imported from other files which are type checked, which in turn means that the _version module will be checked. In addition to the exclude I needed to add an override which allows untyped calls and definitions in this module (and others, like _datalad_buildsupport.*), see:
https://github.com/matrss/datalad-getexec/blob/4847a6937f81f461ec4972aff963a7849f667f8b/pyproject.toml#L43-L50

@jwodder
Copy link
Member

jwodder commented Feb 3, 2023

The way I handled ignoring _version in tinuous was to add the following to the mypy config:

[mypy]
exclude = _version\.py

[mypy-tinuous._version]
follow_imports = skip

@matrss
Copy link
Contributor

matrss commented Feb 3, 2023

Yes, that should also work. I wasn't aware that you can specify follow_imports per module instead of just globally.

@yarikoptic
Copy link
Member Author

ha -- that last setting did the trick seems to me! thank you @jwodder. Pushing now back with typing CI enabled.

@yarikoptic
Copy link
Member Author

if no objections are voiced, I will merge it in 2-3 days. IMHO it would improve extensions

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I can say that I like the intent.

What I have issues with is the increasing complexity of the template that is furthered here. For those without expert knowledge it is unclear what is important or critical, what is a not-a-datalad-extension-without-it and what is more of a here-is-something-running-that-you-could-keep-if-liked.

All these little things become annoying quite quickly. If I need to maintain a spell checker configuration now, because it simply does not know something that I nevertheless need to write, the convenience is soured. I will leave brief comments on the diff for things that came to my mind in this regard.

@@ -1,7 +1,7 @@
from datalad.tests.utils_pytest import assert_result_count


def test_register():
def test_register() -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why this was done, but if it turns out to be required now to type-annotate any test function with -> None, I'd say this is counterproductive. There is very little (if any) value in this, and there are enough reasons already to not write tests. We should not make it even more complicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted that these two are not equivalent in regards to typing: no annotation means a return type of "Any". Since test_* functions in pytest should always return None this might seem superfluous, but if the test code is to be type-checked then these annotations are necessary. I also do not see how this would be another barrier to writing tests; if someone forgot to add the return type they will still write a test first and be nagged by the typing workflow to add a return type later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. However, my comment is focused on whether extension developers must write type annotated tests. As you point out, all of them would need this. But which practical problem is solved with this additional investment. I am specifically talking about test functions, not type-annotation in general.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO type annotation in the tests would help with what type annotation helps in general - ensuring that we have expected types/know what types any particular code is working with. To some degree tests type annotation increases the benefit from those tests. So the 'practical problem' being solved is the same as solved by tests themselves -- ensuring that code is working as expected (returns expected types etc).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat recent versions of pytest emit a warning on return of something other than None from test functions. In the future this will become an error: pytest-dev/pytest#7337. In that regard, type annotations seem a bit redundant, other than making this implicit assumption explicit in the code (which, honestly, is one of the main reasons to type annotate anyway).

There are some test functions in the datalad test suite on maint which still yield values, e.g.: https://github.com/datalad/datalad/blob/6c2a573e9cf75f62b4aa8b03fdd346dca8d29820/datalad/tests/test_tests_utils.py#L134.
These instances could have been found using type annotations and mypy (I think they are fixed on master anyway).

I don't really see a reason not to be consistent and just type annotate everything though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha -- cool find. Just FTR: we are ok/safe -- that file is for testing old nose based utils etc, that test is "nose style parametric test", and indeed the file was altogether already removed in master. pytest counterpart is https://github.com/datalad/datalad/blob/HEAD/datalad/tests/test_tests_utils_pytest.py .

These instances could have been found using type annotations and mypy (I think they are fixed on master anyway).

how to do that? i.e. that there is some test_ function which returns something not None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to do that? i.e. that there is some test_ function which returns something not None

If the yielding function would have been annotated with a return type of None, then mypy would emit this error:

datalad/tests/test_tests_utils.py:133: error: The return type of a generator function should be "Generator" or one of its supertypes  [misc]

that is, the annotated return type and the real return type do not match.

@@ -0,0 +1,26 @@
name: Linters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to communicate what this workflow does, and where to look for answers when it fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name: Linters
# This workflow runs tox -e lint to trigger checks such as codespell to help ensuring lint free code and docs
name: Linters

?

# As type hinting is not yet fully and consistently done through out the project only
# some (randomly) selected files (listed in tox.ini) will be checked. Feel welcome
# to add extra files to the list there. See https://github.com/datalad/datalad/issues/6884
# for the overall goal/progress towards type hints coverage in DataLad.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear to me how type-annotation progress in datalad is a concern for extension authors.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jwodder can correct me, but my limited understanding is that without type annotation of datalad interfaces, we simply cannot achieve complete type annotation/checking within the extension which would use those interfaces.

pyproject.toml Outdated Show resolved Hide resolved
# https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
# datalad-specific setting didn't work so we set for all imports.
# TODO: remove whenever datalad is typed.
# [mypy-datalad.*]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unfamiliar with this setup, but it looks as if the "need to ignore all datalad imports" might be commented out below? It is not obvious to me how this is done.

@@ -0,0 +1,75 @@
[tox]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that this file would need substantial tailoring when adopted for a specific extension. I would appreciate some comments on what is relevant and what not. Right now this communicates to extension developers that they will have to learn about tox first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some in now a4d26c4 . Check out if that clarifies it a bit.

@jsheunis
Copy link
Member

I have zero experience with typing, and minor with linting. But FWIW I agree with the sentiment expressed by @mih, especially the ambiguity of

what is a not-a-datalad-extension-without-it and what is more of a here-is-something-running-the-you-could-keep-if-liked.

I am already quite bad at keeping everything green in the extension that I maintain, and having another red cross will be annoying. I don't mean to say that this PR is not useful, rather that the version that gets merged would ideally be the least complex and least restrictive variation from the perspective of extension developers.

@mslw
Copy link
Contributor

mslw commented Feb 17, 2023

On one hand, I like how the extension template encourages extension maintainers to use continuous integration by providing appveyor and github actions configuration. And I like how manpage generation "just works" thanks to the boilerplate code for building docs. On the other hand I also feel that it's helpful to keep the extension template relatively bare-bones, as there is a bit of complexity to be grasped already. Perhaps linting tools are something that can be left up to extension maintainers rather than included in the template (that's not to say they are not useful - I began using black and flake8 in recently started redcap extension, although just through manual triggers python-black-buffer, not precommits or anything).

In summary, I'm on the fence about this one, so just sharing my thoughts.

P.s. I guess left between the lines of my comment and discussion above is who the "extension maintainers" or "authors" really are - I think that the extension template targets both maintainers of extensions under DataLad organization, as well as external DataLad users who want to fine-tune their workflows. In the first case, the template is more of a normative document where it makes sense to establish conventions; in the latter, it's more important to reduce complexity and make entry more accessible.

@bpoldrack
Copy link
Member

In the first case, the template is more of a normative document where it makes sense to establish conventions; in the latter, it's more important to reduce complexity and make entry more accessible.

Agree. I think the problem here is to mix those two roles of this repo. And I think the template was never meant to become that normative source for the datalad-organization universe. Hence, I lean towards @mih's point of view and to not continue to make this the most expensive hello world imaginable.

Does raise the question, though: Do we have or want a place for such things, where our other repos (in fact not only extensions) could pull such workflows and their updates from?

@matrss
Copy link
Contributor

matrss commented Feb 17, 2023

I think that the extension template targets both maintainers of extensions under DataLad organization, as well as external DataLad users who want to fine-tune their workflows. In the first case, the template is more of a normative document where it makes sense to establish conventions; in the latter, it's more important to reduce complexity and make entry more accessible.

This might get somewhat off-topic, but since I have recently build a small extension and am not part of the DataLad Team I want to give some input from my point of view re: complexity.

There are some hurdles when setting up a new extension which go beyond just "instantiating" the template:

  • obviously, updating metadata
  • setting up readthedocs
  • setting up appveyor
  • codeclimate?

I have setup readthedocs, but skipped the last two since I have never used appveyor before and didn't yet find a good reason to investigate further. Instead I have just setup a tox test suite in a GitHub action. Regarding codeclimate, I took a look at their website and couldn't see what it would provide to me, so I didn't bother either.

With that in mind, I think it would be a good idea to make the template work out-of-the-box as much as possible, which in practice would mean to reduce external services as much as possible. For example, the docbuild workflow could publish to a GitHub Page and make readthedocs redundant. The appveyor pipeline could be converted to a GitHub action if feasible and then require no further setup. Some more automated way of updating the metadata could be nice, but there is #42 for that.

Considering this, I do not see much complexity added here. I think it is useful to have some CI workflows that check best-practices via linting and type-checking, as long as they do not introduce even more setup steps. Maybe these could be self-contained GitHub actions instead of introducing another tool (tox) though.

@mih
Copy link
Member

mih commented Feb 17, 2023

@matrss Thank you for expressing your view. I think your comments are also point out the discrepancy between the normative and minimal approach that @mslw

…/fixed copy of buildsupport here

=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
As @mih mentioned -- downgrading is rarely a good thing. I guess it was just a blind copy-paste
and then packaging was removed anyways... so now reverting back so diff must be minimized
@yarikoptic
Copy link
Member Author

Thank you @matrss @mih @mslw @jsheunis ! @matrss: Indeed this PR does not introduce any need for extra steps on external services to enable any new added here QA feature, so I expect it only to robustify extensions developed with this set of "better practices", even though indeed it might be an additional red cross to address -- usually much easier if not put into the closet of "will do later".
Indeed there is a balance to strike between being minimal normative "howto/what an extension is" and being a pragmatic template to achieve the best result with minimal effort. But I think for the minimal normative view - we have stepped away from it long ago, and this template became more of the latter. But as a result of current shape of it I see lots of duplication (which I hate) of effort (as e.g. @mslw mentioned - enabling flake8 etc) since good practices are desired by everyone really, but then done different ways since done separately, and that makes it inefficient, and harder to fix/contribute across extensions. So it then boils down to us deciding on some common best practices and how to code for them.
IMHO type setting, guaranteed absence of typos, and consistent testing is something to strive for, and that is what this PR proposes. If we do not want any of those features -- let me know which and I remove its support in this PR.

RE @matrss

Maybe these could be self-contained GitHub actions instead of introducing another tool (tox) though.

Well, they are (lint and typing), they just use tox to streamline dependencies installation and possible invocation centralization, so could be done exactly the same way in CI and locally. Having said that, I do feel odd in me hiding codespell within tox's lint, and often I just add codespell workflow which uses codespell action. I can do it that other way too if we decide to proceed here at all.

@yarikoptic
Copy link
Member Author

re @mih

If I need to maintain a spell checker configuration now, because it simply does not know something that I nevertheless need to write, the convenience is soured.

In my experience with codespell through the past weeks where I submitted PRs to a number of projects, in some touching over a hundred of files (see e.g CDLUC3/dmptool#428 and cvmfs/cvmfs#3183) I am genuinely surprised on how well codespell has performed (kudos to many contributors who developed it)! With added config file it is trivial to ignore that rare to hit "false positive" and hopefully soon inline annotation would make it more specific. To that extent -- we had codespell CI enabled in datalad core since Sun Apr 25 18:12:03 2021 , and I don't remember much complaints about it severing the convenience. But I think we did not hit much of typos recently, did we?

@yarikoptic
Copy link
Member Author

ok, let me ask

  • do we want type annotations and checking for their correctness in the extensions?
  • do we want to tolerate typos in our code/docs?

I think knowing answers would help me since I kinda lost track on what you are fighting against here dear @datalad/developers . If not in the template -- everyone would need eventually to introduce proposed checks one way or another into their code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants