Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a /gitdiff API #482

Merged
merged 14 commits into from Aug 2, 2019
Merged

Add a /gitdiff API #482

merged 14 commits into from Aug 2, 2019

Conversation

jaipreet-s
Copy link
Contributor

@jaipreet-s jaipreet-s commented Jun 18, 2019

Overview

This change adds a server endpoint to diff two git refs for a single file. This is flexible to perform diffs across various points in the Git lifecycle such as diff'ing changes in the Working Tree, Index, or two arbitrary commits. There are two reserved refs WORKING and INDEX to designate the Working Tree and the git Index

A separate API was added to avoid using the "git:" prefix to designate git diffs and add many optional parameters in the existing /api/diff. This is backwards compatible, the existing /api/diff will work as expected.

Addresses #480

curl -X POST \
  http://localhost:8888/nbdime/api/diff \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
    "base": "git:Untitled1.ipynb"
}'

Other details

  • Submitting early to get feedback. I'll add tests if the overall approach looks fine

Testing scenarios

Various cases for the git diff API

(Diff all changes in Working Tree)

curl -X POST \
  http://localhost:8888/nbdime/api/gitdiff \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
	"ref_prev": "HEAD",
	"ref_curr": "WORKING",
	"file_name": "Untitled1.ipynb"
}'

(Diff all changes in staging)

curl -X POST \
  http://localhost:8888/nbdime/api/gitdiff \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
	"ref_prev": "HEAD",
	"ref_curr": "INDEX",
	"file_name": "Untitled1.ipynb"
}'

(Diff changes between to Git refs in the local or remote history)

curl -X POST \
  http://localhost:8888/nbdime/api/gitdiff \
  -H 'Content-Type: application/json' \
  -H 'cache-control: no-cache' \
  -d '{
	"ref_prev": "origin/HEAD",
	"ref_curr": "HEAD",
	"file_name": "Untitled1.ipynb"
}'

Jaipreet Singh added 3 commits June 17, 2019 15:03
This change adds a server endpoint to diff two git refs for a single file. This is flexible to perform diffs across various points in the Git lifecycle such as diff'ing changes in the Working Tree, Index, or two arbitrary commits. There are two reserverd refs WORKING and INDEX to designate the Working Tree and the git Index
@vidartf
Copy link
Collaborator

vidartf commented Jun 18, 2019

Thanks for this. Two small points for now:

  • Would it make more sense to use objects for special refs (ref_prev: {special: 'INDEX/WORKING'}) to clearly differentiate, and still allow for tags to be called INDEX or WORKING (very minor corner case, but maybe more extensible).
  • Would it be more explicit to rename file_name to file_path? Does it accept any path (relative and absolute)?

@jaipreet-s
Copy link
Contributor Author

  • Would it make more sense to use objects for special refs (ref_prev: {special: 'INDEX/WORKING'}) to clearly differentiate, and still allow for tags to be called INDEX or WORKING (very minor corner case, but maybe more extensible).

I like it. How do you feel about just making it a string with a reserved prefix such as special:INDEX so that the API type is still a simple string for both regular git refs and "special" refs

@vidartf
Copy link
Collaborator

vidartf commented Jun 21, 2019

@jaipreet-s That just renames the corner case. I figured since most of the improvement of this API over the one in my branch was by getting rid of "magic" prefixes, we should go all the way.

Additionally,

* Improve error handling to return a JSON response so that the frontend can process it gracefully.
@jaipreet-s
Copy link
Contributor Author

jaipreet-s commented Jul 12, 2019

@vidartf : All in for going all the way! My latest commit adds the special case handling for the current_ref. I haven’t added this for the prev_ref yet since it didn’t quite make sense. For e.g.,

  • {curr_ref={“special”: “WORKING”}, prev_ref={“git”: “HEAD”}
    • This makes sense since we’re trying to see the changes in the Working Tree over the HEAD ref
  • {curr_ref={“git”: “HEAD”}, prev_ref={“git”: “WORKING”}
    • This didn’t quite make sense since it feels backwards.

So for prev_ref, the only accepted ref is “git” for now, but this is forwards compatible since we can just add a “special” field if the need arises.

@jaipreet-s
Copy link
Contributor Author

jaipreet-s commented Jul 12, 2019

Would it be more explicit to rename file_name to file_path? Does it accept any path (relative and absolute)?

Yes this should be file_path. This accepts both relative paths to the Jupyter server, or absolute file paths anywhere on the file system. I've renamed it to file_path in my latest commit.

Jaipreet Singh added 2 commits July 16, 2019 22:45
Again, this is backwards compatible with the existing /diff API which uses the "git:" prefixes.
@jaipreet-s
Copy link
Contributor Author

The latest CI build failure is because of the JS tests, the Python tests are good. https://travis-ci.org/jupyter/nbdime/jobs/559806046

@jaipreet-s
Copy link
Contributor Author

@vidartf : Do you have a chance to take a look at this? Understood you were busy with the JL1.0 work and vacation so this did not get much traction. Thanks!

@vidartf
Copy link
Collaborator

vidartf commented Jul 30, 2019

@jaipreet-s Thanks for the ping. I'm having a look, will push some code for discussion.

Avoid extra param and other changes to internal functions, whie still supporting the new cleaner API.
@vidartf
Copy link
Collaborator

vidartf commented Jul 30, 2019

@jaipreet-s I cleaned it up a bit. A quick overview:

  • Avoids using "special ref" extra argument in internal functions.
  • Moved get_git_notebooks to a new base class to keep the code in nb_server_extension and off the more generic base.
  • Renamed ref names to local/remote instead of prev/curr to keep it more in line with the rest of nbdime (although maybe we should use base/remote).
  • Allows local to be "index" such that index can be diffed to working tree ("changes that can be staged, but which are currently not").
  • Expanded the test to include some more combinations.

@vidartf
Copy link
Collaborator

vidartf commented Jul 30, 2019

Please review and let me know any comments (probably best to comment directly on the commit?)

Copy link
Contributor Author

@jaipreet-s jaipreet-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vidartf : This is great! A couple of things below.

"""Get a stream to the notebook, for a given diff entry's path and blob

Returns None if path is not a Notebook file, and EXPLICIT_MISSING_FILE
if path is missing."""
if path:
if not path.endswith('.ipynb'):
return None
if special_ref_name == 'WORKING':
if blob is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an issue here while performing the diff between two Git refs, where a file was added. The Base always points to the "Working Tree" version, whereas it should be empty, so when the client computes the diff on top of the base it shows an incorrect result. I attempted to fix this in the commit ef13feb. I'll try and have a stab at fixing this over your commits See 7bb0928.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this issue. In the docs you link in the commit (https://gitpython.readthedocs.io/en/stable/reference.html#git.diff.Diff), it says that for added/deleted files the path will be None. We check if the path is None at the start of the function. According to the docs, if the path is not None and the blob is, then we are diffing against the working tree.

Would you mind sharing a minimal reproducible sample? I don't really understand the screenshot..

Copy link
Contributor Author

@jaipreet-s jaipreet-s Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vidartf

Repro setup:

  • Created this repo for the repo case https://github.com/jaipreet-s/diff-harness/commits/master
  • I have some changes in my Working Tree, print('Simple Diff Committed => print('Simple Diff in Working Tree)
  • We're trying to diff between commits eaf1741 and b82e191 in the repo so the API input is [1].

Repro output:

  • In the current state of the API (at commit 7bb0928) the output is as expected with base file empty here.
  • In the state at 94100ab, the "base" file is the content in the Working Tree here

So the issue is about the "base" output being returned incorrectly if we're trying to diff a commit in the history where a file has been added/

[1]

{
	"ref_local": {
		"git": "eaf1741"
		
	},
	"ref_remote": {
		"git": "b82e191"
	},
	"file_path": "SimpleDiff.ipynb"
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs, if the path is not None and the blob is, then we are diffing against the working tree.

I've documented the values for path, blob, and ref_name for 3 cases https://gist.github.com/jaipreet-s/396da4a2c13f476dec393e480b4e8b20#file-ref-blob-path-values

  • eaf1741 vs b82e191: blob is None in eaf1741 since the file was first added in b82e191. ref_name is the Git ref in both cases.
  • HEAD vs WORKING: blob is 6ee0 for HEAD. For Working, ref_name as well as blob is None.
  • HEAD vs INDEX: blob is 6ee0 for HEAD. For Index, ref_name is Diffable.Index and blob is 58d9d9

So, the issue is coming from the fact that blob can be None both in the Working Tree and if file is added/removed in a Git ref. But only the Working Tree will have the ref_name as None.

That is why in 7bb0928 I've checked if ref_name is the WorkingTree explicitly and only then use the content from the Working Tree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see now on the GitPython repo there is an issue about this gitpython-developers/GitPython#749 .

nbdime/tests/test_server_extension.py Show resolved Hide resolved
Jaipreet Singh and others added 6 commits July 31, 2019 11:18
If a file was added or deleted, GitPython will indicate the blob as None. In the current case, we would always read the file from disk so the file contents would be same as the Working Tree. This commit modifies that handling to handle the Working Tree case separately, and the None blob case separately.

Tested on the both the nbdime and Git lab extensions.
@vidartf
Copy link
Collaborator

vidartf commented Aug 2, 2019

Add some minor tweaks and cleanups, merging! Thanks for this!

@vidartf vidartf merged commit f96ac65 into jupyter:master Aug 2, 2019
@jaipreet-s
Copy link
Contributor Author

Add some minor tweaks and cleanups, merging! Thanks for this!

Thanks @vidartf. Is there a plan to release a minor/pre-release version with this API so that clients (e.g. jupyterlab-git) can declare a dependency on it? (Otherwise the alternate would be a direct GitHub dependency)

@vidartf
Copy link
Collaborator

vidartf commented Aug 3, 2019

I was originally thinking that I would try to get the visual stuff in as well, but since this only touches the Python side, I guess a minor release would be quite simple. Aiming for a release on Monday then.

@jaipreet-s
Copy link
Contributor Author

@vidartf - I tried to release a version of jupyterlab-git with a Git dependency on nbdime, but PyPI doesn't allow uploading a package with a URL dependency. So, we'll need a new minor version of nbdime to release the visual diff changes for jupyterlab-git

@vidartf
Copy link
Collaborator

vidartf commented Aug 6, 2019

1.1.0 is out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants