plots: grouping: stop using dpath.util.search #7811
I would love it if someone could confirm the results.
Force-pushed 599eb19 to 3b22f9c.
@pared Let's add
@efiop yep, already working on that
I can confirm. Last release:
This:
Thanks @pared.

Performance issues will be back with us again now: Screen.Recording.2022-05-26.at.10.16.03.am.mov

cc @sroy3

Relates to iterative/vscode-dvc#1689 & iterative/vscode-dvc#1643
We can easily adjust the number of "buffer" rows to be rendered below and above the scroll line to prepare for faster scrolling.
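As a rough illustration of the "buffer" idea (often called overscan in list virtualization), here is a minimal sketch that computes which rows a virtualized view would render for a given scroll position. The function name and all parameters are hypothetical; this is not code from the VS Code extension.

```python
import math


def rows_to_render(scroll_top: int, viewport_height: int, row_height: int,
                   total_rows: int, buffer_rows: int) -> range:
    """Return indices of rows to render, padded by buffer_rows on each side."""
    # First row that intersects the viewport, minus the leading buffer
    first_visible = scroll_top // row_height
    start = max(0, first_visible - buffer_rows)
    # One past the last row that intersects the viewport, plus the trailing buffer
    end_visible = math.ceil((scroll_top + viewport_height) / row_height)
    end = min(total_rows, end_visible + buffer_rows)
    return range(start, end)


print(rows_to_render(1000, 300, 100, 50, 2))  # range(8, 15)
```

Raising `buffer_rows` renders more off-screen rows, trading extra render work for fewer blank rows while scrolling quickly.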
@karajan1001 @iterative/dvc Could someone review please? It's an important performance improvement for the VS Code release.
@dberenbaum Mostly waiting for iterative/dvc-bench#352 to be able to confirm and keep an eye on.
dvc/render/match.py (outdated)
```python
revisions = list(plots_data)

grouped: Dict[str, Dict] = defaultdict(dict)

for revision in revisions:
    for file in files:
        path = [revision, "data", file]
        content = _get(plots_data, path)
        if content:
            dpath.util.new(grouped[file], path, content)
return dict(grouped)
```
Assuming that the code above works, it looks like we could just replace it with:

```python
grouped = {}
for revision in plots_data.keys():
    data = plots_data[revision].get("data", {})
    for file in data.keys():
        content = data.get(file)
        if content:
            dpath.util.new(grouped, [file, revision, "data", file], content)
return grouped
```
and not need `get_files` (a redundant walk and sort; btw, it's not used anywhere, so why wasn't it `_get_files`? Same question about the other functions here, CC @daavoo), not need `_get`, and not need `revisions` (we were creating a list for no reason), and so reduce complexity?
Unrelated: `[file, revision, "data", file]` kinda hints that our format is pretty odd here.
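A self-contained sketch of the suggested regrouping, building the nested path with `dict.setdefault` instead of `dpath.util.new` so it runs without dpath. The name `group_by_file` and the input shape `{revision: {"data": {file: content}}}` are assumptions inferred from the paths used in the snippets above; this is not the code that was merged.

```python
from typing import Any, Dict


def group_by_file(plots_data: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Regroup {rev: {"data": {file: content}}} into
    {file: {rev: {"data": {file: content}}}}."""
    grouped: Dict[str, Dict[str, Any]] = {}
    for revision, rev_data in plots_data.items():
        for file, content in rev_data.get("data", {}).items():
            if not content:
                # Skip empty content, like the `if content:` check above
                continue
            # Build grouped[file][revision]["data"][file] one level at a time,
            # producing the same nesting as the [file, revision, "data", file] path
            per_revision = grouped.setdefault(file, {}).setdefault(revision, {})
            per_revision.setdefault("data", {})[file] = content
    return grouped


plots_data = {
    "workspace": {"data": {"a.json": {"x": 1}, "b.json": {}}},
    "HEAD": {"data": {"a.json": {"x": 0}}},
    "v1": {},  # a revision without plot data
}
print(group_by_file(plots_data))
# {'a.json': {'workspace': {'data': {'a.json': {'x': 1}}}, 'HEAD': {'data': {'a.json': {'x': 0}}}}}
```

Each revision and file is visited once with plain key access, so no path search over the whole tree is needed.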
> same question about other functions here CC @daavoo

I don't recall exactly. I remember refactoring during the `dvc-render` extraction, but I think the logic was already there.
I think it was used somewhere else. Dropping it now.
Force-pushed 3b22f9c to 8b8efb0.
Force-pushed 8b8efb0 to e881676.
```
@@ -28,9 +26,9 @@ def test_plots_order(tmp_dir, dvc):
         name="stage2",
     )

     assert get_files(dvc.plots.show()) == [
```
We should not have used that here, as it was supposed to test the order of `show` results.
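To illustrate why a walk-and-sort helper is the wrong tool for an ordering test, here is a minimal sketch with two hypothetical helpers: one that sorts file names (roughly what the review above says `get_files` did) and one that reads them in the order `show()` returned them. The `{revision: {"data": {file: ...}}}` shape is an assumption based on the paths used elsewhere in this PR.

```python
from typing import Any, Dict, List


def files_sorted(show_output: Dict[str, Any]) -> List[str]:
    """Hypothetical walk-and-sort helper: the original order is lost."""
    return sorted({f for rev in show_output.values() for f in rev.get("data", {})})


def files_in_order(show_output: Dict[str, Any], revision: str = "workspace") -> List[str]:
    """Hypothetical helper: plot files in the order show() returned them."""
    # Python dicts preserve insertion order, so this reflects show()'s ordering
    return list(show_output[revision].get("data", {}))


show_output = {"workspace": {"data": {"p2.json": {}, "p1.json": {}}}}
print(files_sorted(show_output))    # ['p1.json', 'p2.json']
print(files_in_order(show_output))  # ['p2.json', 'p1.json']
```

With `files_sorted`, an order assertion would pass even if `show()` returned plots in the wrong order; only the order-preserving read actually tests it.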
Thank you!
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible.
Seems like `dpath.util.search` is really slow. I tried modifying the code to stop using `search` and use `dpath.util.get` instead. There was a significant improvement (4x), but it still seemed unreasonably long, so I ended up implementing this method myself.

We are still using `dpath.util.new`, but its influence is orders of magnitude smaller than that of the "reading" methods.

Test repository:
Command: `time dvc plots diff`

Result for `main`: `14.88s user 0.22s system 99% cpu 15.182 total`

Result for this change: `1.33s user 0.22s system 95% cpu 1.627 total`
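A simplified model of why a glob-style search can cost far more than following a known path: the search visits every node in the tree and tests its path against the pattern, while a direct lookup touches only the keys on the path. This is not dpath's implementation; `walk`, `glob_search` and `get_path` are hypothetical helpers, and the node counts only illustrate the difference in work, not the timings above.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterator, Optional, Sequence, Tuple

Path = Tuple[str, ...]


def walk(obj: Any, prefix: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Yield every (path, value) pair in nested dicts/lists, depth first."""
    yield prefix, obj
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from walk(value, prefix + (str(key),))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from walk(value, prefix + (str(index),))


def glob_search(obj: Any, pattern: Sequence[str]) -> Tuple[Dict[Path, Any], int]:
    """Test every node's path against a glob pattern; also count nodes visited."""
    matches: Dict[Path, Any] = {}
    visited = 0
    for path, value in walk(obj):
        visited += 1
        if len(path) == len(pattern) and all(
            fnmatchcase(segment, glob) for segment, glob in zip(path, pattern)
        ):
            matches[path] = value
    return matches, visited


def get_path(obj: Any, path: Sequence[str]) -> Tuple[Optional[Any], int]:
    """Follow path key by key; return (None, visited) if a key is missing."""
    visited = 0
    node = obj
    for key in path:
        visited += 1
        if not isinstance(node, dict) or key not in node:
            return None, visited
        node = node[key]
    return node, visited


# Two revisions, each with one plot file holding 1000 datapoints
plots_data = {
    rev: {"data": {"metrics.json": {
        "data": [{"step": i, "loss": i / 10} for i in range(1000)]
    }}}
    for rev in ("workspace", "HEAD")
}

matches, searched = glob_search(plots_data, ("*", "data", "metrics.json"))
direct = 0
for rev in plots_data:
    _, visited = get_path(plots_data, (rev, "data", "metrics.json"))
    direct += visited
print(len(matches), searched, direct)  # 2 6009 6
```

Both approaches find the same two plot files, but the search walks every datapoint to do so, and that cost grows with the size of the plot data rather than with the number of files.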