Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for R / renv as a language #1799

Merged
merged 1 commit into from Mar 7, 2021
Merged

Conversation

lorenzwalthert
Copy link
Contributor

@lorenzwalthert lorenzwalthert commented Feb 15, 2021

This PR aims to add 2nd level support for R to pre-commit via the virtual environment manager renv, motivated by isolated dependency management and (hopefully) compatibility with pre-commit.ci for 2nd level languages as outlined in lorenzwalthert/precommit#215. The implementation follows the recently contributed support for conda (#1232) for source code and tests as well as the implementation of the python virtualenv language.

Further considerations:

  • Initial tests only run on Linux. I have to add other platforms as we go if this PR is generally approved.
  • I'd appreciate feedback to the questions in the source code comments, in particular on whether or not to remove in_env() as it seems not needed or whether it should be kept for consistency with other languages. Also, let me know if you want certain comments that explain stuff to be removed. Was not sure how much.
  • If this gets through, we probably should add documentation for pre-commit.com.
  • We might want to support file types for R (https://pre-commit.com/#filtering-files-with-types).

Before this PR, I was not familiar with the pre-commit code base nor a very avid python user, hence sorry if there are obvious flaws in this PR.

# Alternatively, we require the renv
# package to be installed already, then we can
# omit that.
Rscript -e 'dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE)'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work if a non-root user has never installed any R package (quite unlikely at that stage but...). Maybe catch the installation error and fail (more) informatively?

# expose entry to the user for maximal control
return helpers.run_xargs(
hook, hook.cmd, file_args,
color=color, cwd=hook.prefix.prefix_dir,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current expectation for hook authors is that hooks will be executed from the root of the repository, so cwd=... won't be workable here

I think that necessarily changes how the environment setup is working now -- the ~normalish way of setting this up is by pointing some set of environment variables at the ENVIRONMENT_DIR (and subdirs)

note also that things must be installed entirely inside the ENVIRONMENT_DIR or they potentially won't be persisted by caching

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current expectation for hook authors is that hooks will be executed from the root of the repository, so cwd=... won't be workable here.

of course 🤦.

the ~normalish way of setting this up is by pointing some set of environment variables at the ENVIRONMENT_DIR (and subdirs).

That's what I saw. Problem is (and that's a peculiarity of R) the most operations happen in an R session, not the terminal session. renv can only be activated properly from inside R. For that reason, my current suggestion is to place a renv::activate() call in a file, then point the R_PROFILE_USER environment variable to that file so the code will run at R startup. The idea is now to allow two versions, inline code expressions and file in entry::

Rscript -e expr # like Rscript -e "1+1" 
Rscript path/to/file 

An inject the options --no-save, --no-restore, --no-site-file and --no-environ after Rscript (to basically have --vanilla start, but run this one renv::activate() statement) and expand the path/to/file similar to language: script since we have to run the hooks in the working directory of the repo. The files and the hook args are then passed after the above, i.e.

Rscript --no-save --no-restore --no-site-file --no-environ /abs/path/ [files] [hook args]
Rscript --no-save --no-restore --no-site-file --no-environ -e expr [files] [hook args]

note also that things must be installed entirely inside the ENVIRONMENT_DIR or they potentially won't be persisted by caching.

This should be solved now. Just placing the whole env in a subdirectory now.

@@ -0,0 +1,18 @@
Package: gli.clu
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DESCRIPTION file is basically the equivalent of setup.py, renv.lock is the equivalent of the conda environment.yml. Not sure if one or both should be listed in LOCAL_RESOURCES as this was introduced here:
https://github.com/pre-commit/pre-commit/pull/1232/files#diff-a52ef0cf627ce783529c259e63d4ea82b77520936a2455cbf4f90ad2fb36d27dR176

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah LOCAL_RESOURCES is to allow an additional_dependencies-only repo: local hook -- it seems like additional_dependencies is supported for R so we should probably make an empty one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation. I added a template for renv.lock in cf88089. However, there is an R version hard-coded in it which will get outdated. As far as I can tell from a quick experiment, the docs and issues in https://github.com/rstudio/renv, there is no indication that the R version recorded in the lock file will be used at renv::restore(), which we use in renv.install_environment(). Even removing seems fine (I left it there anyways). But that might still be a bit brittle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we add a unit test for a local hook with additional dependencies to check if things work as expected? I don't think that's currently covered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it would probably be good to have one unit test demoing a local hook via additional_dependencies

@@ -279,14 +280,123 @@ def test_node_hook_with_npm_userconfig_set(tempdir_factory, store, tmpdir):
test_run_a_node_hook(tempdir_factory, store)


def _test_renv_parsing(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know if these are too many tests (because it slows down testing). I think we can get 100% coverage with fewer tests too (but then not cover as many combinations).

Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks great, really happy you took this on and I'm excited to add this support!

@@ -16,6 +16,7 @@
from pre_commit.languages import perl
from pre_commit.languages import pygrep
from pre_commit.languages import python
from pre_commit.languages import renv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should maybe just be called r?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought about that also. The thing is that language: conda can both host R and Python hooks and renv can manage python dependencies, so I thought I go for a more specific name. On the other hand, language: python uses venv (and is still called python, not venv). Since working with virtual environments is still not as common in R compared to Python and not every R user knows renv, using r as a name might also make it easier for people to find what they are looking for. So I am happy to switch to r, please let me know what you prefer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think r makes the most sense, conda is kinda the special case right now and I think if people google "pre-commit r" they'll be much more likely to hit this than if it were named renv

we can always add other names if new ways of installing isolated r environments happen in the future and/or shuffle them around (there used to be a python_venv language which was merged into python as virtualenv makes venv-style environments anyway nowadays)

with in_env(hook.prefix, hook.language_version):

return helpers.run_xargs(
hook, cmd_from_hook(hook), file_args, color=color,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually prefer to read a file from the bottom upwards -- also helper functions that live only in this module should be underscore-prefixed to make it more clear what is/isn't part of the api (the api specification is in the languages/all.py file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can adapt the order. I can also prefix the helper functions not part of the API. But why is in_env and get_env_patch not prefixed in other modules although not listed in languages/all.py?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmmm yeah good point, those should probably be underscore-prefixed everywhere -- probably just legacy and copy paste 😅

# TODO what if expr is x;2? not captured by shlex.split()
"""
if entry[0] != 'Rscript':
raise SyntaxError('entry must start with `Rscript`.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SyntaxError is for python syntax being incorrect -- these should be ValueError probably

' '.join([
'The only valid syntax is `Rscript -e {expr}`',
'or `Rscript path/to/hook/script`',
]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be written as an implicitly joined string:

raise TypeError(
    'first line here '
    'second line here'
)

]),
)

None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line does nothing and can be removed, note that python implicitly returns None for "void" functions and doesn't support implicit-return-without-return-keyword

@@ -0,0 +1,18 @@
Package: gli.clu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah LOCAL_RESOURCES is to allow an additional_dependencies-only repo: local hook -- it seems like additional_dependencies is supported for R so we should probably make an empty one?

@@ -19,6 +19,7 @@
from pre_commit.languages import helpers
from pre_commit.languages import node
from pre_commit.languages import python
from pre_commit.languages import renv
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tests specifically for the renv helper functions should live in tests/languages/renv_test.py (or r_test.py after renaming)



def test_renv_parsing_expr_non_Rscirpt(tempdir_factory, store):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(throughout the review, nbd I can fix this myself later -- I don't put blank lines at the beginning or the ending of blocks)


def test_renv_parsing_expr_non_Rscirpt(tempdir_factory, store):

with pytest.raises(SyntaxError, match='entry must start with'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like the match= arg and prefer to test the exact error message, you can follow this pattern:

with pytest.raises(TypeHere) as excinfo:
    ...
msg, = excinfo.value.args
assert msg == ...

def test_run_a_ruby_hook(tempdir_factory, store):
_test_hook_repo(
tempdir_factory, store, 'ruby_hooks_repo',
'ruby_hook', [os.devnull], b'Hello world from a ruby hook\n',
)


@xfailif_windows # pragma: win32 no cover
@ xfailif_windows # pragma: win32 no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decorators shouldn't have spaces after them -- not sure what happened here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wondered, but one of the pre-commit hook changed it, maybe autopep? I even rebased my PR master and tried to get rid of this. Can revert but then not sure if the pre-commit.ci passes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah my guess is there was a syntax error earlier in the file and then autopep8 freaked -- I've been meaning to fix that in autopep8 but it's tricky (basically sometimes it will try and rewrite a file even if it has a syntax error 😱)

Copy link
Member

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asottile asottile merged commit 19bffaa into pre-commit:master Mar 7, 2021
@asottile
Copy link
Member

asottile commented Mar 7, 2021

added a stub for this in the docs -- pre-commit/pre-commit.com#473

This was referenced Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants