Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve node.children handling #1969

Open
mozzie opened this issue Mar 6, 2024 · 9 comments
Open

Improve node.children handling #1969

mozzie opened this issue Mar 6, 2024 · 9 comments

Comments

@mozzie
Copy link
Contributor

mozzie commented Mar 6, 2024

I'm dealing with very large SVG pictures which I need to shrink. My setup is running SVGO in AWS Lambda through API Gateway, which puts my time limit to 29 seconds, which in several cases is not enough for the optimization.

I've started delving deeper into where the time goes, since in my mind the optimization process should be linear in several of the preset plugins. I have an example SVG with around 46000 lines/polylines/what have you (which eventually shrink to 27 paths) and even the simplest plugins seem to take 500 ms on average. This is caused by the following .includes check in xast.js:
if (parentNode.children.includes(node)) {
for (const child of node.children) {
visit(child, visitor, node);
}
}
Although I'm sure this check is there for a reason, removing it had no effect on my sample data result aside from removing 15 seconds out of the 19 it previously took. The way xast.visit function currently works it's not possible to, for example, use a Set to handle the validity of nodes. Handling the deletion of children in plugins is also fairly ineffective, sometimes using splice with indexOf and sometimes xast.detachNodeFromParent.

Introducing an index Set to each node, refactoring xast.js and preset plugins to support the index Set is easy, but this change would possibly break existing self-made plugins. My goal here is to start a small discussion about this, since there are easy performance improvements to be gained.

What does removing the if-clause break? Should the plugins be advised not to modify their siblings (so the check is unneccessary)? One possible solution might be to offer an interface to get .next() and previous() siblings for the plugins, and then tag the nodes deleted and only delete them after the forementioned for-loop has ended.

@KTibow
Copy link
Contributor

KTibow commented Mar 6, 2024

Notes from reading the code:

  • visit is always called with a root and never with parentNode.
  • The only time visit has parentNode is when it's calling itself on an element.
  • The only time that a node would be detached is when the enter call detached itself.
  • This should be easily fixable by returning visitSkip whenever we call detachNodeFromParent.

@mozzie
Copy link
Contributor Author

mozzie commented Mar 7, 2024

Enforcing the usage of visitSkip would allow the if-clause to be deleted. Just using it and leaving the if-clause in would not benefit in this situation. I believe that if-clause can be avoided with easy restrictions and there's good performance improvements to be gained.

Another issue I have is with detachNodeFromParent. Merging 46000 paths into a handful calls it a lot and the function goes through parentNode.children every time (although shrinking it in every call, but it's still a lot). My current solution is to copy mergePaths.js to my own plugin and modify the way it handles child deletion and path-merging. This more than halved the runtime of svgo alltogether.

@KTibow
Copy link
Contributor

KTibow commented Mar 7, 2024

Another issue I have is with detachNodeFromParent. Merging 46000 paths into a handful calls it a lot

mergePaths doesn't use detachNodeFromParent at all right now. Are you on an older version?

@mozzie
Copy link
Contributor Author

mozzie commented Mar 7, 2024

I opened this issue because of the problematic if-clause. It's O(n²) and there are no ways around the problem with custom plugins. Like you suggested, removing the check and using visitSkip any time the current node is deleted would solve the issue, although it would cause problems with existing custom plugins.

As for mergePaths, I see it doesn't use detachNodeFromParent anymore, but due to the excess updatePreviousPath calls, it's slow. This is a separate issue.

@mozzie
Copy link
Contributor Author

mozzie commented Mar 11, 2024

UPDATE: since there is no way to bypass this problem with the custom plugin codes itself, I did the following:

  1. copied svgo.js, xast.js and plugin.js to my project
  2. set my project to import { optimize } from ./custom-svgo/svgo.js
  3. fixed imports in the three files:
  • in svgo.js, imported invokePlugins from ./plugin.js
  • in plugin.js, imported visit from ./xast.js
  • in xast.js, imported visitSkip from the original package
  • other imports seemed to work when prefixed with svgo/lib/ (as in, import { builtin } from 'svgo/lib/builtin.js')
  1. removed the if-clause from xast.js
  2. only plugin I am using affected by the if-clause seemed to be removeEditorNSData, so I copied the source and had it return visitSkip when node was removed.

notes:

  • my monstrous SVG with 46,000 paths is now ~5 seconds compared to the previous 20
  • haven't noticed any broken SVG files so far, out of several thousands
  • this has to be the ugliest solution I've done since beginner courses in University
  • detaching a node is still done with an ineffective .filter (same with replacing node with it's children, which is done with .indexOf and .splice). I'm very sure there's a faster solution available where you return the node(or nodes) from enter/exit functions and then use that information to recreate the child-list in xast.js. I tried it quickly but couldn't wrap my head around it, and due to the urgency of this, I settled for the forementioned dumb solution.

@KTibow
Copy link
Contributor

KTibow commented Mar 11, 2024

The thing is that of course there isn't any breakage because it's deleted anyway. I bet this change could slow down some SVGs where the hidden elements have many many children.

@mozzie
Copy link
Contributor Author

mozzie commented Mar 11, 2024

I'm having hard time seeing how this would slow down anything? The elements with many children were the problem in my case, since each element of the list calls .includes. If children do not need to be visited by a certain plugin, the enter function should return visitSkip. Otherwise (even the hidden) child elements will get to the troublesome if-clause.

@KTibow
Copy link
Contributor

KTibow commented Mar 11, 2024

Well ideally it should always return visitSkip whenever a node is detached. I might be misinterpreting your comment, because what I got from it is that you only added a new visitSkip to one plugin.

@mozzie
Copy link
Contributor Author

mozzie commented Mar 11, 2024

Ah, I see. I also removed the if-clause from xast.js and manually added all the necessary files to my project up until that.
So TLDR: my fix does not have the if-clause anymore, and the only problem that I got taking it out was the one plugin, which I fixed by adding visitSkip.
I know I'm a bad explainer and having to translate everything to english adds an extra spice to it, but I'll try to be more precise in the future.

That being said, if you decide to fix this, I'll be glad to assist. I'd even love to get the detachNodeFromParent issues fixed too, but that requires a structural change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants