
fix: remove spread opts and toString of integrity #71

Closed
wants to merge 4 commits

Conversation

@H4ad (Contributor) commented Apr 1, 2023

I was looking at the CPU profiler of pnpm and I saw this call:

https://github.com/pnpm/pnpm/blob/ef6c22e129dc3d76998cee33647b70a66d1f36bf/store/cafs/src/getFilePathInCafs.ts#L29-L30

I thought about what I could do to optimize it, and I ended up finding some good performance improvements.

Removing spread of opts

The first thing I noticed was the spread of defaultOpts in every method, sometimes performed twice unnecessarily.
So I removed all of those calls. Before this change:

ssri.parse(base64, { single: true }) x 2,119,460 ops/sec ±1.93% (90 runs sampled)
ssri.parse(base64, { single: true, strict: true }) x 1,376,919 ops/sec ±0.93% (86 runs sampled)
ssri.parse(parsed, { single: true }) x 685,384 ops/sec ±0.91% (95 runs sampled)
ssri.parse(parsed, { single: true, strict: true }) x 448,575 ops/sec ±0.87% (95 runs sampled)

After removing the spread of opts:

ssri.parse(base64, { single: true }) x 4,928,681 ops/sec ±2.46% (85 runs sampled)
ssri.parse(base64, { single: true, strict: true }) x 2,339,789 ops/sec ±0.83% (96 runs sampled)
ssri.parse(parsed, { single: true }) x 1,531,463 ops/sec ±1.10% (88 runs sampled)
ssri.parse(parsed, { single: true, strict: true }) x 805,785 ops/sec ±1.24% (87 runs sampled)
benchmark.js
const Benchmark = require('benchmark')
const ssri = require('./lib/index');
const suite = new Benchmark.Suite;
const fs = require('fs');
const crypto = require('crypto');

const TEST_DATA = fs.readFileSync(__filename)

function hash (data, algorithm) {
  return crypto.createHash(algorithm).update(data).digest('base64')
}

const sha = hash(TEST_DATA, 'sha512')
const integrity = `sha512-${sha}`;
const parsed = ssri.parse(integrity, { single: true });

suite
.add('ssri.parse(base64, { single: true })', function () {
  ssri.parse(integrity, { single: true })
})
.add('ssri.parse(base64, { single: true, strict: true })', function () {
  ssri.parse(integrity, { single: true, strict: true })
})
.add('ssri.parse(parsed, { single: true })', function () {
  ssri.parse(parsed, { single: true })
})
.add('ssri.parse(parsed, { single: true, strict: true })', function () {
  ssri.parse(parsed, { single: true, strict: true })
})
.on('cycle', function(event) {
  console.log(String(event.target))
})
.run({ 'async': false });
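
To make the first change more concrete, the pattern being removed looks roughly like this (a sketch of the general idea, not the actual ssri source; parseBefore/parseAfter and the default values shown are illustrative):

// Before: every entry point merged the defaults into a fresh object with a
// spread, allocating a new options object on each call.
const defaultOpts = { algorithms: ['sha512'], single: false, strict: false }

function parseBefore (sri, opts) {
  opts = { ...defaultOpts, ...opts } // new object allocated on every call
  // ... use opts.single, opts.strict, etc.
}

// After: read only the options that are needed, falling back to the default
// inline, so no intermediate object is created.
function parseAfter (sri, opts) {
  const single = (opts && opts.single) || false
  const strict = (opts && opts.strict) || false
  // ... use single and strict directly
}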

Faster toString of Integrity

I looked at a bunch of maps and filters and rewrote them to perform the same operation without so many loops.

With this optimization, we gain a little bit more performance:

ssri.parse(base64, { single: true }) x 5,046,410 ops/sec ±0.98% (93 runs sampled)
ssri.parse(base64, { single: true, strict: true }) x 2,306,927 ops/sec ±1.26% (94 runs sampled)
ssri.parse(parsed, { single: true }) x 2,597,882 ops/sec ±1.19% (92 runs sampled)
ssri.parse(parsed, { single: true, strict: true }) x 1,005,282 ops/sec ±0.79% (96 runs sampled)

But this change introduces a small breaking change: previously, when calling toString on an Integrity, the order of the hashes was determined by the order in which they were inserted/parsed.

Now, to optimize and avoid calling Object.keys in strict mode, I access the properties directly, so the order is always deterministic: sha512, sha384, then sha256. If I change the order of these accesses, the tests break.

If you think this is a problem, I can call Object.keys even in strict mode (about 40k ops/sec slower).
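
Roughly, the strict-mode toString becomes something like this (a simplified sketch, assuming an Integrity object keys arrays of Hash objects by algorithm name; not the exact code in this PR):

// Sketch: read the known algorithms in a fixed order instead of iterating
// Object.keys, so the output order is always sha512, sha384, sha256.
function strictToString (integrity, sep = ' ') {
  let result = ''

  for (const algo of ['sha512', 'sha384', 'sha256']) {
    const hashes = integrity[algo]
    if (!hashes) {
      continue
    }
    for (const hash of hashes) {
      result += (result ? sep : '') + hash.toString()
    }
  }

  return result
}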

Faster integrity check for streams

I also took a look at stream mode because pnpm also verifies file integrity using streams.

The initial version was already fast compared to main:

ssri.fromStream(stream, largeIntegrity) x 136 ops/sec ±3.17% (79 runs sampled)
ssri.fromStream(stream, tinyIntegrity) x 6,134 ops/sec ±2.32% (78 runs sampled)
ssri.checkStream(stream, largeIntegrity) x 150 ops/sec ±0.89% (77 runs sampled)
ssri.checkStream(stream, tinyIntegrity) x 8,121 ops/sec ±2.19% (78 runs sampled)

I also saw that checkStream doesn't support the single option, and almost all the verifications done by pnpm only check a single hash, so I saw an opportunity to push the performance a little further.

ssri.fromStream(stream, largeIntegrity) x 145 ops/sec ±1.86% (83 runs sampled)
ssri.fromStream(stream, tinyIntegrity) x 9,760 ops/sec ±2.97% (76 runs sampled)
ssri.checkStream(stream, largeIntegrity) x 150 ops/sec ±1.91% (77 runs sampled)
ssri.checkStream(stream, tinyIntegrity) x 9,024 ops/sec ±2.49% (76 runs sampled)

ssri.checkStream(stream, largeIntegrity, { single: true }) x 151 ops/sec ±1.10% (81 runs sampled)
ssri.checkStream(stream, tinyIntegrity, { single: true }) x 9,537 ops/sec ±1.64% (78 runs sampled)

But I did an experiment: if we skip all the checkStream code and jump straight to the final verification, we can achieve this performance:

ssri + createHash (largeIntegrity) x 343 ops/sec ±1.03% (82 runs sampled)
ssri + createHash (tinyIntegrity) x 17,360 ops/sec ±1.73% (79 runs sampled)

I put the code in the benchmark file below; the assumption is that if we only verify one hash, we can skip a lot of checks.
So I think it could be good for ssri to export a single-hash verification, what do you think?
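
For example, an exported single-hash helper could look roughly like this (entirely hypothetical: the name checkStreamSingle and its signature are my suggestion, not an existing ssri API; it assumes parsed comes from ssri.parse(integrity, { single: true })):

const crypto = require('crypto')

// Hypothetical helper: verify a stream against one parsed hash by feeding it
// into a single crypto.Hash, skipping the multi-algorithm bookkeeping that
// checkStream normally does.
function checkStreamSingle (stream, parsed) {
  return new Promise((resolve, reject) => {
    const hasher = crypto.createHash(parsed.algorithm)

    stream.on('data', chunk => hasher.update(chunk))
    stream.on('error', reject)
    stream.on('end', () => {
      if (hasher.digest('base64') === parsed.digest) {
        resolve(parsed)
      } else {
        reject(new Error('Integrity check failed'))
      }
    })
  })
}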

benchmark-stream.js
const Benchmark = require('benchmark');
// const wtf = require("wtfnode");
// wtf.init();
const ssri = require('./lib/index');
const suite = new Benchmark.Suite();
const fs = require('fs');
const crypto = require('crypto');
const { Readable } = require('stream');

const largeText = 'a'.repeat(64).repeat(100);
const largeTextSplitted = largeText.split('');

const tinyText = 'a'.repeat(64);
const tinyTextSplitted = tinyText.split('');

const getStream = (text) => Readable.from(text);

function hash(data, algorithm) {
  return crypto.createHash(algorithm).update(data).digest('base64');
}

const largeIntegrity = `sha512-${hash(largeText, 'sha512')}`;
const tinyIntegrity = `sha512-${hash(tinyText, 'sha512')}`;

suite
  .add('ssri.fromStream(stream, largeIntegrity)', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(largeTextSplitted);

      ssri.fromStream(stream, largeIntegrity).then(() => {
        deferred.resolve();
      });
    },
  })
  .add('ssri.fromStream(stream, tinyIntegrity)', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(tinyTextSplitted);

      ssri.fromStream(stream, tinyIntegrity).then(() => {
        deferred.resolve();
      });
    },
  })
  .add('ssri.checkStream(stream, largeIntegrity)', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(largeTextSplitted);

      ssri.checkStream(stream, largeIntegrity).then(() => {
        deferred.resolve();
      });
    },
  })
  .add('ssri.checkStream(stream, tinyIntegrity)', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(tinyTextSplitted);

      ssri.checkStream(stream, tinyIntegrity).then(() => {
        deferred.resolve();
      });
    },
  })
  .add('ssri.checkStream(stream, largeIntegrity, { single: true })', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(largeTextSplitted);

      ssri.checkStream(stream, largeIntegrity, { single: true }).then(() => {
        deferred.resolve();
      });
    },
  })
  .add('ssri.checkStream(stream, tinyIntegrity, { single: true })', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(tinyTextSplitted);

      ssri.checkStream(stream, tinyIntegrity, { single: true }).then(() => {
        deferred.resolve();
      });
    },
  })
  .add('ssri + createHash (largeIntegrity)', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(largeTextSplitted);
      const parsed = ssri.parse(largeIntegrity, { single: true });
      const hash = crypto.createHash(parsed.algorithm);

      stream.pipe(hash);
      stream.on('end', () => {
        const digest = hash.digest('base64');

        if (parsed.digest !== digest) {
          throw new Error('Integrity check failed');
        }
        deferred.resolve();
      });
    },
  })
  .add('ssri + createHash (tinyIntegrity)', {
    defer: true,
    fn: function (deferred) {
      const stream = getStream(tinyTextSplitted);
      const parsed = ssri.parse(tinyIntegrity, { single: true });
      const hash = crypto.createHash(parsed.algorithm);

      stream.pipe(hash);
      stream.on('end', () => {
        const digest = hash.digest('base64');

        if (parsed.digest !== digest) {
          throw new Error('Integrity check failed');
        }
        deferred.resolve();
      });
    },
  })
  .on('cycle', function (event) {
    console.log(String(event.target));
    // wtf.dump();
  })
  .run({ async: false });

Overall, with these optimizations, we get a performance improvement of more than 2x.

@H4ad H4ad requested a review from a team as a code owner April 1, 2023 05:11
@H4ad H4ad requested a review from wraithgar April 1, 2023 05:11
@wraithgar (Member)

$ npm view ssri engines
{ node: '^14.17.0 || ^16.13.0 || >=18.0.0' }

we can use opts?.algorithms et al
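
i.e. something along these lines (just a sketch of the suggestion; the body is illustrative, not the library's actual implementation):

// With the supported Node versions, each option can fall back to its default
// via optional chaining instead of spreading a defaults object.
function fromData (data, opts) {
  const algorithms = opts?.algorithms || ['sha512']
  const strict = opts?.strict || false
  // ... hash `data` with each algorithm, honoring `strict`
}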

@wraithgar wraithgar changed the title from "perf: remove spread opts and toString of integrity" to "fix: remove spread opts and toString of integrity" Apr 1, 2023
@@ -161,7 +162,7 @@ class Hash {
     if (!match) {
       return
     }
-    if (strict && !SPEC_ALGORITHMS.some(a => a === match[1])) {
+    if (strict && SPEC_ALGORITHMS[match[1]] !== true) {
Member

We can keep SPEC_ALGORITHMS as an array and use .includes

Contributor Author

Using includes is a little bit slower than a direct property lookup; do you have some reason to keep it as an array?

Member

It's a balance between performance and developer experience. I think "a little bit" slower is OK here, given that the vast majority of an npm install is disk and IO bound.

Contributor Author

I made a mistake, using includes is actually faster when we don't know the value in advance:

const Benchmark = require('benchmark');
const suite = new Benchmark.Suite();

const SPEC_ALGORITHMS = {
  sha256: true,
  sha384: true,
  sha512: true,
};

const SPEC_ALGORITHMS_ARRAY = Object.keys(SPEC_ALGORITHMS);
const randomAndUnknown = [...SPEC_ALGORITHMS_ARRAY, 'test'];

suite
  .add('includes', function () {
    const random = randomAndUnknown[Math.floor(Math.random() * randomAndUnknown.length)];

    const r = SPEC_ALGORITHMS_ARRAY.includes(random);
  })
  .add('index access', function () {
    const random = randomAndUnknown[Math.floor(Math.random() * randomAndUnknown.length)];

    const r = SPEC_ALGORITHMS[random] === true;
  })
  .on('cycle', function (event) {
    console.log(String(event.target));
  })
  .run({ async: false });

Perf:

includes x 83,638,547 ops/sec ±1.96% (92 runs sampled)
index access x 28,349,129 ops/sec ±2.07% (90 runs sampled)

My assumptions aren't always right; I forgot that property access on an object by a dynamic key is slower here.

I will switch back to includes.

@wraithgar (Member)

I really appreciate the time you put into this, but can we maybe break this up so that the changes aren't so huge in one PR? Specifically, I'm worried about the default opts handling. It'd be nice to isolate those changes from the other tweaks.

@H4ad (Contributor Author) commented Apr 1, 2023

@wraithgar What about 3 PRs:

  • One for the options handling
  • One for the faster toString
  • One for the stream changes

Would that be better?

@wraithgar (Member)

Yes I think 3 PRs would be ok.

@H4ad H4ad closed this Apr 1, 2023
@H4ad (Contributor Author) commented Apr 1, 2023

@wraithgar First one created at #72; once it's merged, I'll create the next one.
