Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically detect default branch name #179

Merged
merged 3 commits into from Jun 8, 2021

Conversation

sean-morris
Copy link
Contributor

  • Added requirements.txt file
  • Added venv/ folder to .gitignore
  • The application now resolves for the default branch of a repository
    if the branch name is not given. It will also return an error if the
    branch name given does not exist in the repository.

Resolves : #159

- Added requirements.txt file
- Added venv/ folder to .gitignore
- The application now resolves for the default branch of a repository
if the branch name is not given. It will also return an error if the
branch name given does not exist in the repository.

Resolves : jupyterhub#159
@welcome
Copy link

welcome bot commented May 28, 2021

Thanks for submitting your first pull request! You are awesome! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also a intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

Copy link
Contributor

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I left some comments around code style, and will try to find some time to test locally in the next few days.

For tests, can you add a few more tests that check for branch names being other than master ('main', 'random-something', etc)? Both for original cloning, and for subsequent pulls?

Thanks!

This checks to make sure the branch we are told to access
exists in the repo
"""
p_heads = subprocess.run(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle possible errors here. We can either pass check=True (or use subprocess.check_output), which will raise an exception if the process exits with a non zero code, or check the status code output directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This throws a subprocess.CalledProcessError; I am going to log the exception and then raise a ValueError that I imagine should be passed to the UI in some interprettable sentence by the user. (I was struggling with this peice; it is unclear to me if the handling of errors on the UI side is working. I have a couple examples where it did not seem to be which I thought should be a different issue/PR if indeed there is problem and I am not screwing something up!)

capture_output=True,
text=True,
)
p_tags = subprocess.run(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the p_ prefix stand for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p = process = subprocess (!) -- I got rid of the "p_"

This will resolve the default branch of the repo in
the case where the branch given does not exist
"""
p = subprocess.run(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about error handling here, and about the p_ prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

requirements.txt Outdated
@@ -0,0 +1,3 @@
traitlets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we specify notebook in install_requires in setup.py, which pulls in traitlets as a transitive dependency, so it shouldn't be necessary here. Since pytest isn't required when running the code, it is often listed in a dev-requirements.txt instead, and we add installing that to docs/contributing.md. I think the -e . is often not set up in dev-requirements.txt either, and it's explicitly just specified in docs/contributing.md - not sure exactly why, but that is the general consensus I've seen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some packages behave differently when they're installed in editable mode, so it's better not to force an installation method so a developer has the freedom to test the package installed in production mode (particularly important for CI), or to install in editable mode for development and accept there may be differences.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am with you here. good deal

@@ -70,11 +70,60 @@ def __init__(self, git_url, branch_name, repo_dir, **kwargs):
assert git_url and branch_name

self.git_url = git_url
self.branch_name = branch_name

if branch_name == "None":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if branch_name == "None":
if branch_name is None:

Since None is a special object in Python, rather than a string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't the NoneType type that gets pased back through the handlers. I drove myself in a couple circles trying to figure out why "is None" wasn't working and realized the handlers were passing Strings. Maybe I am missing something here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug around some more. SyncHandler services the api endpoint, which is called by the JS in static/index.js. I see the following request being made by JS: http://localhost:8888/git-pull/api?repo=https://github.com/yuvipanda/requirements&branch=None&targetpath=requirements. See that branch is explicitly set to None - so it's something in the JS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JS actually gets this from Python, in

utils.get_body_data('branch'),
.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here we put it in the HTML, in Python:

data-branch="{{ branch | urlencode }}"
.

You can see that depth is treated as optional, and only put in there if it is not None. However, branch is not considered as optional, and put in there all the time! So we need to add a similar conditional to branch, and only add it there if the branch argument is present.

var syncUrlParams = {
will also need to be amended to make branch operate in a similar fashion to depth, and
branch = self.get_argument('branch')
as well. There might be a couple other places that assume branch is always present, we'll have to track those down too.

@yuvipanda
Copy link
Contributor

I just tested these two URLs, one with 'master' and one with 'main':

  1. http://localhost:8888/git-sync?repo=https://github.com/jupyterhub/jupyterhub -> main
  2. http://localhost:8888/git-sync?repo=https://github.com/yuvipanda/requirements -> master

Both initial clone and further pulls seem to work! \o/

@yuvipanda
Copy link
Contributor

the test failure is a small issue with python code style I think

@yuvipanda
Copy link
Contributor

@sean-morris In #181, I've added a dev-requirements.txt, documented it, and documented how to run tests / linting / docs locally. I hope that is useful. We should remove the dev-requirements.txt from here to simplify merging though.

@yuvipanda
Copy link
Contributor

@sean-morris ah, so that PR I opened also has the same failing test - https://github.com/jupyterhub/nbgitpuller/runs/2728756643?check_suite_focus=true. Not sure what's happening there.

@sean-morris
Copy link
Contributor Author

sean-morris commented Jun 2, 2021

@sean-morris ah, so that PR I opened also has the same failing test - https://github.com/jupyterhub/nbgitpuller/runs/2728756643?check_suite_focus=true. Not sure what's happening there.

@yuvipanda
My theory is that the url, with the url query string containing branch=master, is returning a 404 error -- but I could be wrong here. I have not seen anything online that indicates others struggling with a similar issue.

- removed requirements.txt - misguided moment!
- added exception handling to subprocess.run
- added tests for exception handling
- added tests for checking that branch does exist
Copy link
Contributor

@yuvipanda yuvipanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug into why we're getting a string value of None for branch, and left some comments on what I found + a possible fix. I think we should sort that out before merging.

@@ -70,11 +70,60 @@ def __init__(self, git_url, branch_name, repo_dir, **kwargs):
assert git_url and branch_name

self.git_url = git_url
self.branch_name = branch_name

if branch_name == "None":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dug around some more. SyncHandler services the api endpoint, which is called by the JS in static/index.js. I see the following request being made by JS: http://localhost:8888/git-pull/api?repo=https://github.com/yuvipanda/requirements&branch=None&targetpath=requirements. See that branch is explicitly set to None - so it's something in the JS.

@@ -70,11 +70,60 @@ def __init__(self, git_url, branch_name, repo_dir, **kwargs):
assert git_url and branch_name

self.git_url = git_url
self.branch_name = branch_name

if branch_name == "None":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JS actually gets this from Python, in

utils.get_body_data('branch'),
.

@@ -70,11 +70,60 @@ def __init__(self, git_url, branch_name, repo_dir, **kwargs):
assert git_url and branch_name

self.git_url = git_url
self.branch_name = branch_name

if branch_name == "None":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here we put it in the HTML, in Python:

data-branch="{{ branch | urlencode }}"
.

You can see that depth is treated as optional, and only put in there if it is not None. However, branch is not considered as optional, and put in there all the time! So we need to add a similar conditional to branch, and only add it there if the branch argument is present.

var syncUrlParams = {
will also need to be amended to make branch operate in a similar fashion to depth, and
branch = self.get_argument('branch')
as well. There might be a couple other places that assume branch is always present, we'll have to track those down too.

The following changes ensure branch is only passed when it has been
defined:
- branch becomes keyword parameter to GitPuller
- branch only addd to query string when not undefined
- branch only encoded into url when exists
@yuvipanda yuvipanda merged commit 8d65dfc into jupyterhub:master Jun 8, 2021
@yuvipanda
Copy link
Contributor

\o/ Thank you very much, @sean-morris!

@consideRatio consideRatio changed the title Handle default or non-existing branch name Automatically detect default branch name Jun 9, 2021
yuvipanda added a commit to yuvipanda/datahub that referenced this pull request Jun 9, 2021
yuvipanda added a commit to yuvipanda/pilot-hubs that referenced this pull request Jun 9, 2021
yuvipanda added a commit to berkeley-dsep-infra/stat159-image that referenced this pull request Mar 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adapt to 'main' being default GitHub branch, not 'master'
5 participants