Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a /gitdiff API #482

Merged
merged 14 commits into from Aug 2, 2019
57 changes: 27 additions & 30 deletions nbdime/gitfiles.py
Expand Up @@ -10,13 +10,20 @@
os.environ['GIT_PYTHON_REFRESH'] = 'quiet'
from git import (
Repo, InvalidGitRepositoryError, BadName, NoSuchPathError,
GitCommandNotFound,
GitCommandNotFound, Diffable
)

from nbdime.vcs.git.filter_integration import apply_possible_filter
from .utils import EXPLICIT_MISSING_FILE, pushd


# Git ref representing the working tree
GitRefWorkingTree = None

# Git ref representing the index
GitRefIndex = Diffable.Index


class BlobWrapper(io.StringIO):
"""StringIO with a name attribute"""
name = ''
Expand Down Expand Up @@ -93,15 +100,15 @@ def is_path_in_repo(path):
return False


def _get_diff_entry_stream(path, blob, ref_name, repo_dir, special_ref_name):
def _get_diff_entry_stream(path, blob, ref_name, repo_dir):
"""Get a stream to the notebook, for a given diff entry's path and blob

Returns None if path is not a Notebook file, and EXPLICIT_MISSING_FILE
if path is missing."""
if path:
if not path.endswith('.ipynb'):
return None
if special_ref_name == 'WORKING':
if blob is None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an issue here while performing the diff between two Git refs, where a file was added. The Base always points to the "Working Tree" version, whereas it should be empty, so when the client computes the diff on top of the base it shows an incorrect result. I attempted to fix this in the commit ef13feb. I'll try and have a stab at fixing this over your commits See 7bb0928.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this issue. In the docs you link in the commit (https://gitpython.readthedocs.io/en/stable/reference.html#git.diff.Diff), it says that for added/deleted files the path will be None. We check if the path is None at the start of the function. According to the docs, if the path is not None and the blob is, then we are diffing against the working tree.

Would you mind sharing a minimal reproducible sample? I don't really understand the screenshot..

Copy link
Contributor Author

@jaipreet-s jaipreet-s Aug 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vidartf

Repro setup:

  • Created this repo for the repo case https://github.com/jaipreet-s/diff-harness/commits/master
  • I have some changes in my Working Tree, print('Simple Diff Committed => print('Simple Diff in Working Tree)
  • We're trying to diff between commits eaf1741 and b82e191 in the repo so the API input is [1].

Repro output:

  • In the current state of the API (at commit 7bb0928) the output is as expected with base file empty here.
  • In the state at 94100ab, the "base" file is the content in the Working Tree here

So the issue is about the "base" output being returned incorrectly if we're trying to diff a commit in the history where a file has been added/

[1]

{
	"ref_local": {
		"git": "eaf1741"
		
	},
	"ref_remote": {
		"git": "b82e191"
	},
	"file_path": "SimpleDiff.ipynb"
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs, if the path is not None and the blob is, then we are diffing against the working tree.

I've documented the values for path, blob, and ref_name for 3 cases https://gist.github.com/jaipreet-s/396da4a2c13f476dec393e480b4e8b20#file-ref-blob-path-values

  • eaf1741 vs b82e191: blob is None in eaf1741 since the file was first added in b82e191. ref_name is the Git ref in both cases.
  • HEAD vs WORKING: blob is 6ee0 for HEAD. For Working, ref_name as well as blob is None.
  • HEAD vs INDEX: blob is 6ee0 for HEAD. For Index, ref_name is Diffable.Index and blob is 58d9d9

So, the issue is coming from the fact that blob can be None both in the Working Tree and if file is added/removed in a Git ref. But only the Working Tree will have the ref_name as None.

That is why in 7bb0928 I've checked if ref_name is the WorkingTree explicitly and only then use the content from the Working Tree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see now on the GitPython repo there is an issue about this gitpython-developers/GitPython#749 .

# Diffing against working copy, use file on disk!
with pushd(repo_dir):
# Ensure we filter if appropriate:
Expand All @@ -115,10 +122,6 @@ def _get_diff_entry_stream(path, blob, ref_name, repo_dir, special_ref_name):
return io.open(path, encoding='utf-8')
except IOError:
return EXPLICIT_MISSING_FILE
elif blob is None:
# GitPython uses a None blob to indicate if a file was deleted or added.
# https://gitpython.readthedocs.io/en/stable/reference.html#git.diff.Diff
return EXPLICIT_MISSING_FILE
else:
# There were strange issues with passing blob data_streams around,
# so we solve this by reading into a StringIO buffer.
Expand All @@ -130,18 +133,14 @@ def _get_diff_entry_stream(path, blob, ref_name, repo_dir, special_ref_name):
return EXPLICIT_MISSING_FILE


def changed_notebooks(ref_base, ref_remote, paths=None, repo_dir=None, special_remote=None):
def changed_notebooks(ref_base, ref_remote, paths=None, repo_dir=None):
"""Iterator over all notebooks in path that has changed between the two git refs

References are all valid values according to git-rev-parse.

- If ref_remote is None or special_remote is None, the difference is taken between
ref_base and the working directory. (Default Case for Backwards Compatibility)
- If special_remote is WORKING, the difference is taken between ref_base and the working directory
- If special_remote is INDEX, the difference is taken between ref_base and the index.
References are all valid values according to git-rev-parse, or one of
the special sentinel values GitRefWorkingTree or GitRefIndex.

Iterator value is a base/remote pair of streams to Notebooks
(or possibly EXPLICIT_MISSING_FILE for added/removed files).
or EXPLICIT_MISSING_FILE for added/removed files.
"""
repo, popped = get_repo(repo_dir or os.curdir)
if repo_dir is None:
Expand All @@ -152,20 +151,18 @@ def changed_notebooks(ref_base, ref_remote, paths=None, repo_dir=None, special_r
# All paths need to be prepended by popped
paths = [os.path.join(*(popped + (p,))) for p in paths]

if (not ref_remote) and (not special_remote):
if ref_remote is None:
# Default case for backwards compatibility
special_remote = 'WORKING'

# Get tree for base and do the actual diff
tree_base = repo.commit(ref_base).tree

if special_remote == 'WORKING':
# If a None value is provided, GitPython diffs against the Working Tree
# https://gitpython.readthedocs.io/en/stable/reference.html#module-git.diff
diff = tree_base.diff(None, paths)
elif special_remote == 'INDEX':
# If no value is provided, GitPython diffs against the Index
diff = tree_base.diff(paths=paths)
ref_remote = GitRefWorkingTree

# Get tree/index for base
if ref_base == GitRefIndex:
tree_base = repo.index
else:
tree_base = repo.commit(ref_base).tree

if ref_remote in (GitRefWorkingTree, GitRefIndex):
diff = tree_base.diff(ref_remote, paths)
else:
# Get remote tree and diff against base:
tree_remote = repo.commit(ref_remote).tree
Expand All @@ -174,11 +171,11 @@ def changed_notebooks(ref_base, ref_remote, paths=None, repo_dir=None, special_r
# Return the base/remote pair of Notebook file streams
for entry in diff:
fa = _get_diff_entry_stream(
entry.a_path, entry.a_blob, ref_base, repo_dir, None)
entry.a_path, entry.a_blob, ref_base, repo_dir)
if fa is None:
continue
fb = _get_diff_entry_stream(
entry.b_path, entry.b_blob, ref_remote, repo_dir, special_remote)
entry.b_path, entry.b_blob, ref_remote, repo_dir)
if fb is None:
continue
yield (fa, fb)
54 changes: 36 additions & 18 deletions nbdime/tests/test_server_extension.py
Expand Up @@ -96,26 +96,44 @@ def test_diff_api(git_repo2, server_extension_app):
assert data['diff']
assert len(data.keys()) == 2

@pytest.mark.timeout(timeout=WEB_TEST_TIMEOUT)
def test_git_diff_api(git_repo2, server_extension_app):
@pytest.mark.timeout(timeout=WEB_TEST_TIMEOUT*6)
def test_git_diff_api(git_repo2, server_extension_app, filespath):
local_path = os.path.relpath(git_repo2, server_extension_app['path'])
url = 'http://127.0.0.1:%i/nbdime/api/gitdiff' % server_extension_app['port']
r = requests.post(
url, headers=auth_header,
data=json.dumps({
'ref_prev': {
'git': 'HEAD'
},
'ref_curr': {
'special': 'WORKING'
},
'file_path': pjoin(local_path, 'diff.ipynb')
}))
r.raise_for_status()
data = r.json()
nbformat.validate(data['base'])
assert data['diff']
assert len(data.keys()) == 2

# Add a difference betweeen index and working tree:
shutil.copy(
pjoin(filespath, 'foo--1.ipynb'),
pjoin(git_repo2, 'sub', 'subfile.ipynb')
)

def _make_ref(key):
if key.lower() in ('working', 'index'):
return {'special': key}
return {'git': key}

# Test various diffs:
for args in (
('HEAD', 'WORKING', 'diff.ipynb'),
('HEAD', 'INDEX', 'diff.ipynb'),
('INDEX', 'HEAD', 'diff.ipynb'),
('INDEX', 'WORKING', 'sub/subfile.ipynb'),
('index', 'working', 'sub/subfile.ipynb'),
('iNdeX', 'WorKING', 'sub/subfile.ipynb'),
):
print(args)
r = requests.post(
url, headers=auth_header,
data=json.dumps({
'ref_local': _make_ref(args[0]),
vidartf marked this conversation as resolved.
Show resolved Hide resolved
'ref_remote': _make_ref(args[1]),
'file_path': pjoin(local_path, args[2])
}))
r.raise_for_status()
data = r.json()
nbformat.validate(data['base'])
assert data['diff']
assert len(data.keys()) == 2


@pytest.mark.timeout(timeout=WEB_TEST_TIMEOUT)
Expand Down
156 changes: 104 additions & 52 deletions nbdime/webapp/nb_server_extension.py
Expand Up @@ -19,6 +19,7 @@
from ..gitfiles import (
changed_notebooks, is_path_in_repo, find_repo_root,
InvalidGitRepositoryError, BadName, GitCommandNotFound,
GitRefWorkingTree, GitRefIndex
)
from ..ignorables import diff_ignorables
from ..utils import read_notebook
Expand All @@ -33,6 +34,12 @@
)


special_refs = {
'working': GitRefWorkingTree,
'index': GitRefIndex,
}


class AuthMainDifftoolHandler(MainDifftoolHandler):
@authenticated
def get(self):
Expand Down Expand Up @@ -69,7 +76,55 @@ def get(self):
))


class ExtensionApiDiffHandler(ApiDiffHandler):
class BaseGitDiffHandler(ApiDiffHandler):

def get_git_notebooks(self, file_path, ref_base='HEAD', ref_remote=None):
"""
Gets the content of the before and after state of the notebook based on the given Git refs.

:param file_path: The path to the file being diffed
:param ref_base: the Git ref for the "local" or the "previous" state
:param ref_remote: the Git ref for the "remote" or the "current" state
:return: (base_nb, remote_nb)
"""
# Sometimes the root dir of the files is not cwd
nb_root = getattr(self.contents_manager, 'root_dir', None)
# Resolve base argument to a file system path
file_path = os.path.realpath(to_os_path(file_path, nb_root))

# Ensure path/root_dir that can be sent to git:
try:
git_root = find_repo_root(file_path)
except InvalidGitRepositoryError as e:
self.log.exception(e)
raise HTTPError(422, 'Invalid notebook: %s' % file_path)
file_path = os.path.relpath(file_path, git_root)

# Get the base/remote notebooks:
try:
for fbase, fremote in changed_notebooks(ref_base, ref_remote, file_path, git_root):
base_nb = read_notebook(fbase, on_null='minimal')
remote_nb = read_notebook(fremote, on_null='minimal')
break # there should only ever be one set of files
else:
# The filename was either invalid or the file is unchanged
# Assume unchanged, and let read_notebook handle error
# reporting if invalid
base_nb = self.read_notebook(os.path.join(git_root, file_path))
remote_nb = base_nb
except (InvalidGitRepositoryError, BadName) as e:
self.log.exception(e)
raise HTTPError(422, 'Invalid notebook: %s' % file_path)
except GitCommandNotFound as e:
self.log.exception(e)
raise HTTPError(
500, 'Could not find git executable. '
'Please ensure git is available to the server process.')

return base_nb, remote_nb


class ExtensionApiDiffHandler(BaseGitDiffHandler):
"""Diff API handler that also handles diff to git HEAD"""

@gen.coroutine
Expand Down Expand Up @@ -101,6 +156,7 @@ def _get_checkpoint_notebooks(self, base):
@authenticated
@gen.coroutine
def post(self):
# TODO: Add deprecation warning (for git/checkpoint only?)
# Assuming a request on the form "{'argname':arg}"
body = json.loads(escape.to_unicode(self.request.body))
base = body['base']
Expand Down Expand Up @@ -134,49 +190,52 @@ def curdir(self):
return root_dir


class GitDiffHandler(ApiDiffHandler):
class GitDiffHandler(BaseGitDiffHandler):
"""Diff API handler that handles diffs for two git refs"""

def _validate_request(self, body):
# Validate ref_prev
try:
ref_prev = body['ref_prev']
@classmethod
def parse_ref(cls, data):
return data.get('git', None) or special_refs[data['special'].lower()]

# Only git is supported in ref_prev
if 'git' not in ref_prev:
self.log.exception('Required key git not provided in ref_prev')
raise HTTPError(400, 'Required key git not provided in ref_prev.')

if 'special' in ref_prev:
self.log.exception('special is not supported in ref_prev')
raise HTTPError(400, ':qspecial is not supported in ref_prev')

except KeyError:
self.log.exception('Required key ref_prev not provided in the request')
raise HTTPError(400, 'Required key ref_prev not provided in the request')

# Validate ref_curr
try:
ref_curr = body['ref_curr']

# Either of special or git is supported in ref_curr
if 'special' in ref_curr and 'git' in ref_curr:
self.log.exception('Only one of special and git should be present in ref_curr.')
raise HTTPError(400, 'Only one of special and git should be present in ref_curr.')
def _validate_request(self, body):
def _fail(msg):
self.log.exception(msg)
raise HTTPError(400, msg)

# Validate refs
for refname in ('ref_local', 'ref_remote'):

# Validate ref_curr
try:
ref = body[refname]
except KeyError:
_fail('Required key %s not provided in the request' % (refname))

# Either of special or git is supported in ref
if 'special' in ref and 'git' in ref:
_fail('Only one of special and git should be present in git '
'reference.')

if not ('special' in ref or 'git' in ref):
_fail('At least one of special and git should be present in git '
'reference.')

if 'special' in ref:
special = ref['special'].lower()
if refname == 'ref_local':
if special != 'index':
_fail('Only "index" is allowed for the "special" value '
'on ref_local, got %r.' % (special,))
elif special not in ('index', 'working'):
_fail('Only "index" or "working is allowed for the "special" value '
'on ref_remote, got %r.' % (special,))

if not ('special' in ref_curr or 'git' in ref_curr):
self.log.exception('Atleast one of special and git should be present in ref_curr.')
raise HTTPError(400, 'Atleast one of special and git should be present in ref_curr.')
except KeyError:
self.log.exception('Required key ref_curr not provided in the request')
raise HTTPError(400, 'Required key ref_curr not provided in the request')

# Validate file_name
try:
body['file_path']
except KeyError:
self.log.exception('Required key file_name not provided in the request')
raise HTTPError(400, 'Required key file_name not provided in the request')
_fail('Required value file_path not provided in the request')


@authenticated
Expand All @@ -189,14 +248,14 @@ def post(self):
self._validate_request(body)

# Get file contents based on Git regs
ref_prev = body['ref_prev']
ref_curr = body['ref_curr']
ref_local = body['ref_local']
ref_remote = body['ref_remote']
file_path = body['file_path']
base_nb, remote_nb = self.get_git_notebooks(file_path,
ref_prev.get('git'),
ref_curr.get('git'),
ref_curr.get('special')
)
base_nb, remote_nb = self.get_git_notebooks(
file_path,
GitDiffHandler.parse_ref(ref_local),
GitDiffHandler.parse_ref(ref_remote),
)

# Perform actual diff and return data
thediff = diff_notebooks(base_nb, remote_nb)
Expand All @@ -206,18 +265,11 @@ def post(self):
'diff': thediff,
}
self.finish(data)
except HTTPError as e:
self.set_status(e.status_code)
self.finish({
'message': e.log_message,
'reason': e.reason
})
except HTTPError:
raise
except Exception:
self.log.exception('Error diffing documents:')
self.set_status(500)
self.finish({
'message': 'Error while attempting to diff documents'
})
raise HTTPError(500, 'Error while attempting to diff documents')


@property
Expand Down