Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: deduplicate assets with buffer source #4712

Merged
2 changes: 1 addition & 1 deletion docs/05-plugin-development.md
Expand Up @@ -732,7 +732,7 @@ If there are no dynamic imports, this will create exactly three chunks where the

Note that even though any module id can be used in `implicitlyLoadedAfterOneOf`, Rollup will throw an error if such an id cannot be uniquely associated with a chunk, e.g. because the `id` cannot be reached implicitly or explicitly from the existing static entry points, or because the file is completely tree-shaken. Using only entry points, either defined by the user or of previously emitted chunks, will always work, though.

If the `type` is _`asset`_, then this emits an arbitrary new file with the given `source` as content. It is possible to defer setting the `source` via [`this.setAssetSource(referenceId, source)`](guide/en/#thissetassetsource) to a later time to be able to reference a file during the build phase while setting the source separately for each output during the generate phase. Assets with a specified `fileName` will always generate separate files while other emitted assets may be deduplicated with existing assets if they have the same `string` source even if the `name` does not match. If the source is not a string but a typed array or `Buffer`, no deduplication will occur for performance reasons. If an asset without a `fileName` is not deduplicated, the [`output.assetFileNames`](guide/en/#outputassetfilenames) name pattern will be used.
If the `type` is _`asset`_, then this emits an arbitrary new file with the given `source` as content. It is possible to defer setting the `source` via [`this.setAssetSource(referenceId, source)`](guide/en/#thissetassetsource) to a later time to be able to reference a file during the build phase while setting the source separately for each output during the generate phase. Assets with a specified `fileName` will always generate separate files while other emitted assets may be deduplicated with existing assets if they have the same source even if the `name` does not match. If an asset without a `fileName` is not deduplicated, the [`output.assetFileNames`](guide/en/#outputassetfilenames) name pattern will be used.

#### `this.error`

Expand Down
70 changes: 37 additions & 33 deletions src/utils/FileEmitter.ts
Expand Up @@ -28,33 +28,35 @@ import { extname } from './path';
import { isPathFragment } from './relativeId';
import { makeUnique, renderNamePattern } from './renderNamePattern';

function generateAssetFileName(
function getSourceHash(source: string | Uint8Array, size?: number): string {
return createHash()
.update(source)
.digest('hex')
.slice(0, Math.max(0, size || defaultHashSize));
}

function pregenerateAssetFileName(
name: string | undefined,
source: string | Uint8Array,
outputOptions: NormalizedOutputOptions,
bundle: OutputBundleWithPlaceholders
): string {
outputOptions: NormalizedOutputOptions
): { fileName: string; sourceHash: string } {
let sourceHash: string | undefined;
const emittedName = outputOptions.sanitizeFileName(name || 'asset');
return makeUnique(
renderNamePattern(
typeof outputOptions.assetFileNames === 'function'
? outputOptions.assetFileNames({ name, source, type: 'asset' })
: outputOptions.assetFileNames,
'output.assetFileNames',
{
ext: () => extname(emittedName).slice(1),
extname: () => extname(emittedName),
hash: size =>
createHash()
.update(source)
.digest('hex')
.slice(0, Math.max(0, size || defaultHashSize)),
name: () =>
emittedName.slice(0, Math.max(0, emittedName.length - extname(emittedName).length))
}
),
bundle
const fileName = renderNamePattern(
typeof outputOptions.assetFileNames === 'function'
? outputOptions.assetFileNames({ name, source, type: 'asset' })
: outputOptions.assetFileNames,
'output.assetFileNames',
{
ext: () => extname(emittedName).slice(1),
extname: () => extname(emittedName),
hash: size => (sourceHash = getSourceHash(source, size)),
name: () =>
emittedName.slice(0, Math.max(0, emittedName.length - extname(emittedName).length))
}
);
sourceHash ??= getSourceHash(source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought: Why not just const sourceHash = getSourceHash(source) at the top? But then I noticed that we allow specifying a hash size. Now this raises a problem, though an extremely edge case one: If for some reason with a dynamic asset file name function two assets with the same source are given file names with different hash length, they are not recognized as identical.
Going even deeper, why would we even be calling the assetFileNames function twice in such a situation? What we actually want is that we first calculate the hash, then compare the hash, and only if it does not match we call the function for the name. For this to work reliably, the first hash needs to have the maximum hash length (e.g. by passing Infinity as the size, which means we do not slice). Then during the actual file name generation we can slice the hash to the required size. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the contortions in the code were because of trying to reuse the dynamic hash size.

Your idea is great, implemented it in refactor: precompute hash then slice

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot easier to directly check the final changelog, as the code is now a lot closer to the original https://github.com/rollup/rollup/pull/4712/files

return { fileName, sourceHash };
}

function reserveFileNameInBundle(
Expand Down Expand Up @@ -246,9 +248,6 @@ export class FileEmitter {
for (const emittedFile of this.filesByReferenceId.values()) {
if (emittedFile.fileName) {
reserveFileNameInBundle(emittedFile.fileName, output, this.options.onwarn);
if (emittedFile.type === 'asset' && typeof emittedFile.source === 'string') {
fileNamesBySource.set(emittedFile.source, emittedFile.fileName);
}
}
}
for (const [referenceId, consumedFile] of this.filesByReferenceId) {
Expand Down Expand Up @@ -332,17 +331,22 @@ export class FileEmitter {
referenceId: string,
{ bundle, fileNamesBySource, outputOptions }: FileEmitterOutput
): void {
const fileName =
consumedFile.fileName ||
(typeof source === 'string' && fileNamesBySource.get(source)) ||
generateAssetFileName(consumedFile.name, source, outputOptions, bundle);
let fileName = consumedFile.fileName;

// Deduplicate assets if an explicit fileName is not provided
if (!fileName) {
const pregenerated = pregenerateAssetFileName(consumedFile.name, source, outputOptions);
fileName = fileNamesBySource.get(pregenerated.sourceHash);
if (!fileName) {
fileName = makeUnique(pregenerated.fileName, bundle);
fileNamesBySource.set(pregenerated.sourceHash, fileName);
}
}

// We must not modify the original assets to avoid interaction between outputs
const assetWithFileName = { ...consumedFile, fileName, source };
this.filesByReferenceId.set(referenceId, assetWithFileName);
if (typeof source === 'string') {
fileNamesBySource.set(source, fileName);
}
patak-dev marked this conversation as resolved.
Show resolved Hide resolved

bundle[fileName] = {
fileName,
name: consumedFile.name,
Expand Down
35 changes: 35 additions & 0 deletions test/chunking-form/samples/emit-file/deduplicate-assets/_config.js
Expand Up @@ -5,10 +5,45 @@ module.exports = {
plugins: {
buildStart() {
this.emitFile({ type: 'asset', name: 'string.txt', source: 'string' });
this.emitFile({ type: 'asset', name: 'stringSameSource.txt', source: 'string' });
this.emitFile({
type: 'asset',
name: 'sameStringAsBuffer.txt',
source: Buffer.from('string') // Test cross Buffer/string deduplication
});
// Different string source
this.emitFile({ type: 'asset', name: 'otherString.txt', source: 'otherString' });

const bufferSource = () => Buffer.from([0x62, 0x75, 0x66, 0x66, 0x65, 0x72]);
this.emitFile({
type: 'asset',
name: 'buffer.txt',
source: bufferSource()
});
this.emitFile({
type: 'asset',
name: 'bufferSameSource.txt',
source: bufferSource()
});
this.emitFile({
type: 'asset',
name: 'sameBufferAsString.txt',
source: bufferSource().toString() // Test cross Buffer/string deduplication
});
// Different buffer source
this.emitFile({
type: 'asset',
name: 'otherBuffer.txt',
source: Buffer.from('otherBuffer')
});

// specific file names will not be deduplicated
this.emitFile({ type: 'asset', fileName: 'named/string.txt', source: 'named' });
this.emitFile({
type: 'asset',
fileName: 'named/buffer.txt',
source: bufferSource()
});
return null;
}
}
Expand Down
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
otherBuffer
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
otherBuffer
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
otherBuffer
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
buffer
@@ -0,0 +1 @@
otherBuffer
@@ -0,0 +1 @@
buffer
1 change: 1 addition & 0 deletions test/cli/samples/watch/bundle-error/main.js
@@ -0,0 +1 @@
export default 42;