New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a /gitdiff
API
#482
Add a /gitdiff
API
#482
Conversation
This change adds a server endpoint to diff two git refs for a single file. This is flexible to perform diffs across various points in the Git lifecycle such as diff'ing changes in the Working Tree, Index, or two arbitrary commits. There are two reserverd refs WORKING and INDEX to designate the Working Tree and the git Index
Thanks for this. Two small points for now:
|
I like it. How do you feel about just making it a string with a reserved prefix such as |
@jaipreet-s That just renames the corner case. I figured since most of the improvement of this API over the one in my branch was by getting rid of "magic" prefixes, we should go all the way. |
Additionally, * Improve error handling to return a JSON response so that the frontend can process it gracefully.
@vidartf : All in for going all the way! My latest commit adds the special case handling for the
So for |
Yes this should be |
Again, this is backwards compatible with the existing /diff API which uses the "git:" prefixes.
The latest CI build failure is because of the JS tests, the Python tests are good. https://travis-ci.org/jupyter/nbdime/jobs/559806046 |
@vidartf : Do you have a chance to take a look at this? Understood you were busy with the JL1.0 work and vacation so this did not get much traction. Thanks! |
@jaipreet-s Thanks for the ping. I'm having a look, will push some code for discussion. |
Avoid extra param and other changes to internal functions, whie still supporting the new cleaner API.
@jaipreet-s I cleaned it up a bit. A quick overview:
|
Please review and let me know any comments (probably best to comment directly on the commit?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vidartf : This is great! A couple of things below.
nbdime/gitfiles.py
Outdated
"""Get a stream to the notebook, for a given diff entry's path and blob | ||
|
||
Returns None if path is not a Notebook file, and EXPLICIT_MISSING_FILE | ||
if path is missing.""" | ||
if path: | ||
if not path.endswith('.ipynb'): | ||
return None | ||
if special_ref_name == 'WORKING': | ||
if blob is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's an issue here while performing the diff between two Git refs, where a file was added. The Base always points to the "Working Tree" version, whereas it should be empty, so when the client computes the diff on top of the base it shows an incorrect result. I attempted to fix this in the commit ef13feb. I'll try and have a stab at fixing this over your commits See 7bb0928.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand this issue. In the docs you link in the commit (https://gitpython.readthedocs.io/en/stable/reference.html#git.diff.Diff), it says that for added/deleted files the path
will be None
. We check if the path
is None
at the start of the function. According to the docs, if the path is not None
and the blob
is, then we are diffing against the working tree.
Would you mind sharing a minimal reproducible sample? I don't really understand the screenshot..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repro setup:
- Created this repo for the repo case https://github.com/jaipreet-s/diff-harness/commits/master
- I have some changes in my Working Tree,
print('Simple Diff Committed
=>print('Simple Diff in Working Tree)
- We're trying to diff between commits eaf1741 and b82e191 in the repo so the API input is [1].
Repro output:
- In the current state of the API (at commit 7bb0928) the output is as expected with base file empty here.
- In the state at 94100ab, the "base" file is the content in the Working Tree here
So the issue is about the "base" output being returned incorrectly if we're trying to diff a commit in the history where a file has been added/
[1]
{
"ref_local": {
"git": "eaf1741"
},
"ref_remote": {
"git": "b82e191"
},
"file_path": "SimpleDiff.ipynb"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs, if the path is not None and the blob is, then we are diffing against the working tree.
I've documented the values for path, blob, and ref_name for 3 cases https://gist.github.com/jaipreet-s/396da4a2c13f476dec393e480b4e8b20#file-ref-blob-path-values
- eaf1741 vs b82e191: blob is
None
ineaf1741
since the file was first added inb82e191
. ref_name is the Git ref in both cases. - HEAD vs WORKING: blob is
6ee0
for HEAD. For Working, ref_name as well as blob is None. - HEAD vs INDEX: blob is 6ee0 for HEAD. For Index, ref_name is
Diffable.Index
and blob is58d9d9
So, the issue is coming from the fact that blob
can be None
both in the Working Tree and if file is added/removed in a Git ref. But only the Working Tree will have the ref_name
as None.
That is why in 7bb0928 I've checked if ref_name
is the WorkingTree
explicitly and only then use the content from the Working Tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see now on the GitPython repo there is an issue about this gitpython-developers/GitPython#749 .
If a file was added or deleted, GitPython will indicate the blob as None. In the current case, we would always read the file from disk so the file contents would be same as the Working Tree. This commit modifies that handling to handle the Working Tree case separately, and the None blob case separately. Tested on the both the nbdime and Git lab extensions.
Add some minor tweaks and cleanups, merging! Thanks for this! |
Thanks @vidartf. Is there a plan to release a minor/pre-release version with this API so that clients (e.g. jupyterlab-git) can declare a dependency on it? (Otherwise the alternate would be a direct GitHub dependency) |
I was originally thinking that I would try to get the visual stuff in as well, but since this only touches the Python side, I guess a minor release would be quite simple. Aiming for a release on Monday then. |
@vidartf - I tried to release a version of jupyterlab-git with a Git dependency on nbdime, but PyPI doesn't allow uploading a package with a URL dependency. So, we'll need a new minor version of nbdime to release the visual diff changes for jupyterlab-git |
1.1.0 is out |
Overview
This change adds a server endpoint to diff two git refs for a single file. This is flexible to perform diffs across various points in the Git lifecycle such as diff'ing changes in the Working Tree, Index, or two arbitrary commits. There are two reserved refs WORKING and INDEX to designate the Working Tree and the git Index
A separate API was added to avoid using the "git:" prefix to designate git diffs and add many optional parameters in the existing
/api/diff
. This is backwards compatible, the existing/api/diff
will work as expected.Addresses #480
Other details
Testing scenarios
Various cases for the
git diff
API(Diff all changes in Working Tree)
(Diff all changes in staging)
(Diff changes between to Git refs in the local or remote history)