Extend file type (updated) #603

Merged
36 commits
bda3f46
Allow specification of custom detectors + readme update
Jul 14, 2023
6007eff
Simplify logic in runCustomDetectors
Jul 14, 2023
c3dba6e
add custom detectors to fileTypeFromStream
Jul 14, 2023
fab97ae
fix linting issue
Jul 14, 2023
c7c3190
Execute custom detectors before default ones
Jul 17, 2023
4bcddff
add tests
Jul 17, 2023
733bfac
fix docs
Jul 17, 2023
37e1e57
compatibility with Node.js 14 and 16
Jul 17, 2023
bfd18b1
Remove blank space
FredrikSchaefer Jul 25, 2023
ee4cb2c
Wrap custom detectors into file type options
Jul 25, 2023
ad6d44f
Merge branch 'extend-file-type-updated' of github.com:FredrikSchaefer…
Jul 25, 2023
29930bf
Adjust fileTypeFromFile(...) to recent changes
FredrikSchaefer Jul 25, 2023
7ea6efd
Moved custom detectors from function to constructor argument
Oct 17, 2023
748ffee
fix fileTypeStream (add back fileTypeOptions)
Oct 17, 2023
2adec69
Update documentation
Oct 17, 2023
0d1464c
add check for illegal tokenizer position change
Oct 18, 2023
6b6188c
Update core.d.ts
FredrikSchaefer Oct 23, 2023
6806753
Update core.d.ts
FredrikSchaefer Oct 23, 2023
61e052e
Update readme.md (move custom detectors section as suggested by revie…
Oct 23, 2023
eed198d
Remove fileType prefix from class member functions
Oct 24, 2023
ff84f3e
Make runCustomDetectors private
Oct 24, 2023
326ccd1
Add class based approach to fileTypeStream
Oct 24, 2023
011fa53
Change error handling for read operations of custom detectors
Oct 25, 2023
b346f7c
Remove obsolete @throws from documentation
Oct 25, 2023
9e24ed9
Make usage of FileTypeParser class consistent
Oct 25, 2023
a926bf2
Rename stream(...) to toDetectingStream(...)
Oct 25, 2023
5e2a0fd
Fix error handling
Oct 25, 2023
f38565d
Suggested changes to simplify code
Borewit Oct 25, 2023
e25c294
Merge pull request #2 from sindresorhus/extend-file-type-updated-sugg…
FredrikSchaefer Oct 25, 2023
080ac75
Fix TypeScript declaration
Oct 25, 2023
de706c5
Remove comments from unit tests and redundant empty line
Borewit Nov 6, 2023
331502d
Make code examples executable.
Borewit Nov 6, 2023
9d85f05
Remove empty comment lines
Borewit Nov 6, 2023
ede94d9
Remove unused `fileTypeOptions` parameter from typings
Borewit Nov 6, 2023
ca6e449
Adjust number code and comment style suggestions
Borewit Nov 10, 2023
a50e37a
Update core.d.ts
sindresorhus Nov 10, 2023
90 changes: 89 additions & 1 deletion core.d.ts
Expand Up @@ -419,7 +419,10 @@ if (stream2.fileType?.mime === 'image/jpeg') {
export function fileTypeStream(readableStream: ReadableStream, options?: StreamOptions): Promise<ReadableStreamWithFileType>;

/**
Detect the file type of a [`Blob`](https://nodejs.org/api/buffer.html#class-blob).
Detect the file type of a [`Blob`](https://nodejs.org/api/buffer.html#class-blob) or [`File`](https://developer.mozilla.org/en-US/docs/Web/API/File).

@param blob [`Blob`](https://nodejs.org/api/buffer.html#class-blob) used for file detection
@returns The detected file type and MIME type, or `undefined` when there is no match.

@example
```
Expand All @@ -435,3 +438,88 @@ console.log(await fileTypeFromBlob(blob));
```
*/
export declare function fileTypeFromBlob(blob: Blob): Promise<FileTypeResult | undefined>;

/**
Function that allows specifying custom detection mechanisms.

An iterable of detectors can be provided via the `fileTypeOptions` argument for the {@link FileTypeParser.constructor}.

The detectors are called before the default detections in the provided order.

Custom detectors can be used to add new FileTypeResults or to modify return behaviour of existing FileTypeResult detections.

If the detector returns `undefined`, there are 2 possible scenarios:

1. The detector has not read from the tokenizer: detection proceeds with the next available detector.
2. The detector has read from the tokenizer (`tokenizer.position` has been increased).
In that case no further detectors will be executed and file-type returns `undefined`.
Note that this is an exceptional scenario, as the detector takes away the opportunity for any other detector to determine the file type.

Example detector array which can be extended and provided via the fileTypeOptions argument:

```
import {FileTypeParser} from 'file-type';

const customDetectors = [
async tokenizer => {
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // "UNICORN" as decimal byte values
const buffer = Buffer.alloc(7);
await tokenizer.peekBuffer(buffer, {length: unicornHeader.length, mayBeLess: true});
if (unicornHeader.every((value, index) => value === buffer[index])) {
return {ext: 'unicorn', mime: 'application/unicorn'};
}

return undefined;
},
];

const buffer = Buffer.from("UNICORN");
const parser = new FileTypeParser({customDetectors});
const fileType = await parser.fromBuffer(buffer);
console.log(fileType);
```

@param tokenizer - [Tokenizer](https://github.com/Borewit/strtok3#tokenizer), used to read the file content from.
@param fileType - FileTypeResult detected by the standard detections or a previous custom detection. Undefined if no matching fileTypeResult could be found.
@returns The detected file extension and MIME type as a `FileTypeResult`-like object, or `undefined` when there is no match.
*/
export type Detector = (tokenizer: ITokenizer, fileType?: FileTypeResult) => Promise<FileTypeResult | undefined>;

export type FileTypeOptions = {
customDetectors?: Iterable<Detector>;
};

export declare class TokenizerPositionError extends Error {
constructor(message?: string);
}

export declare class FileTypeParser {
detectors: Iterable<Detector>;

constructor(options?: {customDetectors?: Iterable<Detector>});

/**
Works the same way as {@link fileTypeFromBuffer}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromBuffer(buffer: Uint8Array | ArrayBuffer): Promise<FileTypeResult | undefined>;

/**
Works the same way as {@link fileTypeFromStream}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromStream(stream: ReadableStream): Promise<FileTypeResult | undefined>;

/**
Works the same way as {@link fileTypeFromTokenizer}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromTokenizer(tokenizer: ITokenizer): Promise<FileTypeResult | undefined>;

/**
Works the same way as {@link fileTypeFromBlob}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromBlob(blob: Blob): Promise<FileTypeResult | undefined>;

/**
Works the same way as {@link fileTypeStream}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
toDetectionStream(readableStream: ReadableStream, options?: StreamOptions): Promise<ReadableStreamWithFileType>;
}
155 changes: 95 additions & 60 deletions core.js
Expand Up @@ -11,31 +11,15 @@ import {extensions, mimeTypes} from './supported.js';
const minimumBytes = 4100; // A fair amount of file-types are detectable within this range.

export async function fileTypeFromStream(stream) {
const tokenizer = await strtok3.fromStream(stream);
try {
return await fileTypeFromTokenizer(tokenizer);
} finally {
await tokenizer.close();
}
return new FileTypeParser().fromStream(stream);
}

export async function fileTypeFromBuffer(input) {
if (!(input instanceof Uint8Array || input instanceof ArrayBuffer)) {
throw new TypeError(`Expected the \`input\` argument to be of type \`Uint8Array\` or \`Buffer\` or \`ArrayBuffer\`, got \`${typeof input}\``);
}

const buffer = input instanceof Uint8Array ? input : new Uint8Array(input);

if (!(buffer?.length > 1)) {
return;
}

return fileTypeFromTokenizer(strtok3.fromBuffer(buffer));
return new FileTypeParser().fromBuffer(input);
}

export async function fileTypeFromBlob(blob) {
const buffer = await blob.arrayBuffer();
return fileTypeFromBuffer(new Uint8Array(buffer));
return new FileTypeParser().fromBlob(blob);
}

function _check(buffer, headers, options) {
Expand All @@ -60,16 +44,98 @@ function _check(buffer, headers, options) {
}

export async function fileTypeFromTokenizer(tokenizer) {
try {
return new FileTypeParser().parse(tokenizer);
} catch (error) {
if (!(error instanceof strtok3.EndOfStreamError)) {
throw error;
return new FileTypeParser().fromTokenizer(tokenizer);
}

export class FileTypeParser {
constructor(options) {
this.detectors = options?.customDetectors;

this.fromTokenizer = this.fromTokenizer.bind(this);
this.fromBuffer = this.fromBuffer.bind(this);
this.parse = this.parse.bind(this);
}

async fromTokenizer(tokenizer) {
const initialPosition = tokenizer.position;

for (const detector of this.detectors || []) {
const fileType = await detector(tokenizer);
if (fileType) {
return fileType;
}

if (initialPosition !== tokenizer.position) {
return undefined; // Cannot proceed scanning if the tokenizer is at an arbitrary position
}
}

return this.parse(tokenizer);
}

async fromBuffer(input) {
if (!(input instanceof Uint8Array || input instanceof ArrayBuffer)) {
throw new TypeError(`Expected the \`input\` argument to be of type \`Uint8Array\` or \`Buffer\` or \`ArrayBuffer\`, got \`${typeof input}\``);
}

const buffer = input instanceof Uint8Array ? input : new Uint8Array(input);

if (!(buffer?.length > 1)) {
return;
}

return this.fromTokenizer(strtok3.fromBuffer(buffer));
}

async fromBlob(blob) {
const buffer = await blob.arrayBuffer();
return this.fromBuffer(new Uint8Array(buffer));
}

async fromStream(stream) {
const tokenizer = await strtok3.fromStream(stream);
try {
return await this.fromTokenizer(tokenizer);
} finally {
await tokenizer.close();
}
}

async toDetectionStream(readableStream, options = {}) {
const {default: stream} = await import('node:stream');
const {sampleSize = minimumBytes} = options;

return new Promise((resolve, reject) => {
readableStream.on('error', reject);

readableStream.once('readable', () => {
(async () => {
try {
// Set up output stream
const pass = new stream.PassThrough();
const outputStream = stream.pipeline ? stream.pipeline(readableStream, pass, () => {}) : readableStream.pipe(pass);

// Read the input stream and detect the filetype
const chunk = readableStream.read(sampleSize) ?? readableStream.read() ?? Buffer.alloc(0);
try {
pass.fileType = await this.fromBuffer(chunk);
} catch (error) {
if (error instanceof strtok3.EndOfStreamError) {
pass.fileType = undefined;
} else {
reject(error);
}
}

resolve(outputStream);
} catch (error) {
reject(error);
}
})();
});
});
}
}

class FileTypeParser {
check(header, options) {
return _check(this.buffer, header, options);
}
Expand Down Expand Up @@ -211,7 +277,7 @@ class FileTypeParser {
}

await tokenizer.ignore(id3HeaderLength);
return fileTypeFromTokenizer(tokenizer); // Skip ID3 header, recursion
return this.fromTokenizer(tokenizer); // Skip ID3 header, recursion
}

// Musepack, SV7
Expand Down Expand Up @@ -1602,39 +1668,8 @@ class FileTypeParser {
}
}

export async function fileTypeStream(readableStream, {sampleSize = minimumBytes} = {}) {
const {default: stream} = await import('node:stream');

return new Promise((resolve, reject) => {
readableStream.on('error', reject);

readableStream.once('readable', () => {
(async () => {
try {
// Set up output stream
const pass = new stream.PassThrough();
const outputStream = stream.pipeline ? stream.pipeline(readableStream, pass, () => {}) : readableStream.pipe(pass);

// Read the input stream and detect the filetype
const chunk = readableStream.read(sampleSize) ?? readableStream.read() ?? Buffer.alloc(0);
try {
const fileType = await fileTypeFromBuffer(chunk);
pass.fileType = fileType;
} catch (error) {
if (error instanceof strtok3.EndOfStreamError) {
pass.fileType = undefined;
} else {
reject(error);
}
}

resolve(outputStream);
} catch (error) {
reject(error);
}
})();
});
});
export async function fileTypeStream(readableStream, options = {}) {
return new FileTypeParser().toDetectionStream(readableStream, options);
}

export const supportedExtensions = new Set(extensions);
1 change: 1 addition & 0 deletions fixture/fixture.unicorn
@@ -0,0 +1 @@
UNICORN FILE CONTENT
7 changes: 4 additions & 3 deletions index.js
@@ -1,10 +1,11 @@
import * as strtok3 from 'strtok3';
import {fileTypeFromTokenizer} from './core.js';
import {FileTypeParser} from './core.js';

export async function fileTypeFromFile(path) {
export async function fileTypeFromFile(path, fileTypeOptions) {
const tokenizer = await strtok3.fromFile(path);
try {
return await fileTypeFromTokenizer(tokenizer);
const parser = new FileTypeParser(fileTypeOptions);
return await parser.fromTokenizer(tokenizer);
} finally {
await tokenizer.close();
}
60 changes: 60 additions & 0 deletions readme.md
Expand Up @@ -189,6 +189,10 @@ console.log(await fileTypeFromBlob(blob));
//=> {ext: 'txt', mime: 'plain/text'}
```

#### blob

Type: [`Blob`](https://developer.mozilla.org/en-US/docs/Web/API/Blob)

### fileTypeFromTokenizer(tokenizer)

Detect the file type from an `ITokenizer` source.
Expand Down Expand Up @@ -305,6 +309,48 @@ Returns a `Set<string>` of supported file extensions.

Returns a `Set<string>` of supported MIME types.

## Custom detectors

A custom detector is a function that allows specifying custom detection mechanisms.

An iterable of detectors can be provided via the `fileTypeOptions` argument for the `FileTypeParser.constructor`.

The detectors are called before the default detections in the provided order.

Custom detectors can be used to add new `FileTypeResults` or to modify return behaviour of existing FileTypeResult detections.

If the detector returns `undefined`, there are 2 possible scenarios:

1. The detector has not read from the tokenizer: detection proceeds with the next available detector.
2. The detector has read from the tokenizer (`tokenizer.position` has been increased).
In that case no further detectors will be executed and file-type returns `undefined`.
Note that this is an exceptional scenario, as the detector takes away the opportunity for any other detector to determine the file type.
Comment on lines +326 to +327
Owner


Suggested change
In that case no further detectors will be executed and the final conclusion is that file-type returns undefined.
Note that this an exceptional scenario, as the detector takes the opportunity from any other detector to determine the file type.
In that case no further detectors will be executed and the final conclusion is that file-type returns undefined.
Note that this an exceptional scenario, as the detector takes the opportunity from any other detector to determine the file type.



Example detector array which can be extended and provided to the `FileTypeParser` constructor via the `fileTypeOptions` argument:
```js
import {FileTypeParser} from 'file-type';

const customDetectors = [
async tokenizer => {
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // "UNICORN" as decimal byte values
const buffer = Buffer.alloc(7);
await tokenizer.peekBuffer(buffer, {length: unicornHeader.length, mayBeLess: true});

if (unicornHeader.every((value, index) => value === buffer[index])) {
return {ext: 'unicorn', mime: 'application/unicorn'};
}

return undefined;
},
];

const buffer = Buffer.from("UNICORN");
const parser = new FileTypeParser({customDetectors});
const fileType = await parser.fromBuffer(buffer);
console.log(fileType);
```
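
The two `undefined` scenarios described above can be sketched without installing the package. The snippet below is only an illustration under stated assumptions: `FakeTokenizer`, `runDetectors`, `peekingDetector`, `readingDetector` and `unicornDetector` are hypothetical names, the fake tokenizer mimics just the `position`, `peekBuffer` and `readBuffer` members of a strtok3 tokenizer, and `runDetectors` mirrors the loop in `FileTypeParser.fromTokenizer` from this change rather than calling it.

```js
// Hypothetical in-memory stand-in for a strtok3 tokenizer (illustration only).
class FakeTokenizer {
	constructor(bytes) {
		this.bytes = bytes;
		this.position = 0;
	}

	// Copies bytes into `target` WITHOUT advancing `position`.
	async peekBuffer(target, {length = target.length} = {}) {
		const slice = this.bytes.subarray(this.position, this.position + length);
		target.set(slice);
		return slice.length;
	}

	// Copies bytes into `target` and advances `position`.
	async readBuffer(target, options) {
		const bytesRead = await this.peekBuffer(target, options);
		this.position += bytesRead;
		return bytesRead;
	}
}

// Mirrors the detector loop from this change; `fallback` stands in for the default detections.
async function runDetectors(detectors, tokenizer, fallback) {
	const initialPosition = tokenizer.position;

	for (const detector of detectors) {
		const fileType = await detector(tokenizer);
		if (fileType) {
			return fileType;
		}

		if (tokenizer.position !== initialPosition) {
			return undefined; // Scenario 2: the detector consumed data, so detection stops
		}
	}

	return fallback(tokenizer); // Scenario 1 for every detector: continue with the defaults
}

// Peeks only, so the next detector still gets a chance.
const peekingDetector = async tokenizer => {
	await tokenizer.peekBuffer(new Uint8Array(3));
	return undefined;
};

// Reads (advances the position) and gives up, which ends detection.
const readingDetector = async tokenizer => {
	await tokenizer.readBuffer(new Uint8Array(3));
	return undefined;
};

const unicornDetector = async tokenizer => {
	const expected = new TextEncoder().encode('UNICORN');
	const header = new Uint8Array(expected.length);
	await tokenizer.peekBuffer(header, {length: header.length, mayBeLess: true});
	return expected.every((value, index) => value === header[index])
		? {ext: 'unicorn', mime: 'application/unicorn'}
		: undefined;
};

async function main() {
	const input = new TextEncoder().encode('UNICORN FILE CONTENT');
	const noDefaultMatch = async () => undefined;

	console.log(await runDetectors([peekingDetector, unicornDetector], new FakeTokenizer(input), noDefaultMatch));
	//=> {ext: 'unicorn', mime: 'application/unicorn'}

	console.log(await runDetectors([readingDetector, unicornDetector], new FakeTokenizer(input), noDefaultMatch));
	//=> undefined
}

main();
```

In a real detector, preferring `peekBuffer` over `readBuffer` keeps the tokenizer position unchanged, so later detectors and the default detections can still run when there is no match.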

## Supported file types

- [`3g2`](https://en.wikipedia.org/wiki/3GP_and_3G2#3G2) - Multimedia container format defined by the 3GPP2 for 3G CDMA2000 multimedia services
Expand Down Expand Up @@ -469,6 +515,20 @@ The following file types will not be accepted:
- `.csv` - [Reason.](https://github.com/sindresorhus/file-type/issues/264#issuecomment-568439196)
- `.svg` - Detecting it requires a full-blown parser. Check out [`is-svg`](https://github.com/sindresorhus/is-svg) for something that mostly works.

#### tokenizer

Type: [`ITokenizer`](https://github.com/Borewit/strtok3#tokenizer)

Usable as source of the examined file.

#### fileType

Type: FileTypeResult

Object with `ext` (extension) and `mime` (MIME type) properties.

Detected by the standard detections or a previous custom detection. `undefined` if no matching `FileTypeResult` could be found.

## Related

- [file-type-cli](https://github.com/sindresorhus/file-type-cli) - CLI for this module