Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Bug #1946 #1947

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Fix Bug #1946 #1947

wants to merge 3 commits into from

Conversation

codermapuche
Copy link

Optimize a large number of vectors sequentially instead of parallel to avoid the simultaneously opened file limit

Optimize a large number of vectors sequentially instead of parallel to avoid the simultaneously opened file limit
@KTibow
Copy link
Contributor

KTibow commented Jan 28, 2024

I don't like how this code is formatted. Just make it asynchronous if you want to use Promises, it makes it easier to read.

@codermapuche
Copy link
Author

@KTibow Feel free to format it however you prefer, I just tried using the tool, it broke and I fixed it, I hope this code helps you improve it faster

@SethFalco
Copy link
Member

SethFalco commented Jan 29, 2024

I don't like how this code is formatted. Just make it asynchronous if you want to use Promises, it makes it easier to read.

Agreed, this is an odd way to approach it in JavaScript. I'm guessing it's just habits from other languages.

Feel free to format it however you prefer, I just tried using the tool, it broke and I fixed it

If you're not prepared to address feedback constructively, it's better to just open an issue instead of a pull request. Not to mention, there are literally failing tests. ^-^'


While this does technically resolve the problem, it's also problematic to drop the parallelization.

  • We should continue handling multiple files at once, either batching according to cores count, using an arbitrary number, or using a library like graceful-fs which handles it automatically via backoff.
  • If we do any solution other than use graceful-fs, we should still check for the file limit error and provide a helpful error message.

In any case, happy to address formatting later, but could you please resolve linting and tests? Feel free to read the CONTRIBUTING guide for more context.

);

if ( svgFilesDescriptions.length == 0 ) {
throw 'No SVG files have been found in "' + dir + '" directory.';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be an Error

throw 'No SVG files have been found in "' + dir + '" directory.';
}

let results = [ ],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run Prettier to fix some of the failing tests, it will fix the formatting on a couple lines including this one

while ( svgFilesDescriptions.length > 0 ) {
let _svgFilesDescriptions = svgFilesDescriptions.splice(0, cpus);

_svgFilesDescriptions = _svgFilesDescriptions.map(function(fileDescription) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this line errors or not, I would expect it to since with TS once a variable has a type you don't change it. Usually here you would create a new variable to store these results.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you are right, i dont use TS, i do this changes over the npm downloaded code

}

let results = [ ],
cpus = os.cpus().length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that since SVGO doesn't run in parallel it might not make sense to scale this by CPUs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventloop run in single thread in fact.
The use of cpu count is a number >= 1 and <= ulimit.
The better number is ulimit, but is only available under linux platform, i dont found a cross-platform way to determine max files open limit.

);
});

results = results.concat(await Promise.all(_svgFilesDescriptions));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slower than the existing version because each "chunk" of files takes as long as the longest file. I would suggest changing this to a worker-based method, let me know if you want me to explain

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is slower. This PR in fact run a throttle mechanism over chunks, this is the goal. Workers dont fix this, under linux the kernel has a limit over how many files can be open at same time, this limit is shared by all process of user, and another limit is shared for all process. The only way to prevent EMFILE error runing svgo over a huge number of svg is go slower and do the job in chunks.
The performance impact is minimun, runing over folder with 80k+ of svg works well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're misunderstanding what I said. Right now, it goes like "10 files open - > 6 files open - > 1 file open" if one takes longer than the rest, leading to it taking longer to start the next set of files. If you instead create 10 workers, each one will continously have 1 file open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants