Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend file type (updated) #603

Merged
Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
bda3f46
Allow specification of custom detectors + readme update
Jul 14, 2023
6007eff
Simplify logic in runCustomDetectors
Jul 14, 2023
c3dba6e
add custom detectors to fileTypeFromStream
Jul 14, 2023
fab97ae
fix linting issue
Jul 14, 2023
c7c3190
Execute custom detectors before default ones
Jul 17, 2023
4bcddff
add tests
Jul 17, 2023
733bfac
fix docs
Jul 17, 2023
37e1e57
compatibility with Node.js 14 and 16
Jul 17, 2023
bfd18b1
Remove blank space
FredrikSchaefer Jul 25, 2023
ee4cb2c
Wrap custom detectors into file type options
Jul 25, 2023
ad6d44f
Merge branch 'extend-file-type-updated' of github.com:FredrikSchaefer…
Jul 25, 2023
29930bf
Adjust fileTypeFromFile(...) to recent changes
FredrikSchaefer Jul 25, 2023
7ea6efd
Moved custom detectors from function to constructor argument
Oct 17, 2023
748ffee
fix fileTypeStream (add back fileTypeOptions)
Oct 17, 2023
2adec69
Update documentation
Oct 17, 2023
0d1464c
add check for illegal tokenizer position change
Oct 18, 2023
6b6188c
Update core.d.ts
FredrikSchaefer Oct 23, 2023
6806753
Update core.d.ts
FredrikSchaefer Oct 23, 2023
61e052e
Update readme.md (move custom detectors section as suggested by revie…
Oct 23, 2023
eed198d
Remove fileType prefix from class member functions
Oct 24, 2023
ff84f3e
Make runCustomDetectors private
Oct 24, 2023
326ccd1
Add class based approach to fileTypeStream
Oct 24, 2023
011fa53
Change error handling for read operations of custom detectors
Oct 25, 2023
b346f7c
Remove obsolete @throws from documentation
Oct 25, 2023
9e24ed9
Make usage of FileTypeParser class consistent
Oct 25, 2023
a926bf2
Rename stream(...) to toDetectingStream(...)
Oct 25, 2023
5e2a0fd
Fix error handling
Oct 25, 2023
f38565d
Suggested changes to simplify code
Borewit Oct 25, 2023
e25c294
Merge pull request #2 from sindresorhus/extend-file-type-updated-sugg…
FredrikSchaefer Oct 25, 2023
080ac75
Fix TypeScript declaration
Oct 25, 2023
de706c5
Remove comments from unit tests and redundant empty line
Borewit Nov 6, 2023
331502d
Make code examples executable.
Borewit Nov 6, 2023
9d85f05
Remove empty comment lines
Borewit Nov 6, 2023
ede94d9
Remove unused `fileTypeOptions` parameter from typings
Borewit Nov 6, 2023
ca6e449
Adjust number code and comment style suggestions
Borewit Nov 10, 2023
a50e37a
Update core.d.ts
sindresorhus Nov 10, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
93 changes: 93 additions & 0 deletions core.d.ts
Original file line number Diff line number Diff line change
Expand Up @@ -421,6 +421,9 @@ export function fileTypeStream(readableStream: ReadableStream, options?: StreamO
/**
Detect the file type of a [`Blob`](https://nodejs.org/api/buffer.html#class-blob).

@param blob
Borewit marked this conversation as resolved.
Show resolved Hide resolved
@returns The detected file type and MIME type, or `undefined` when there is no match.
sindresorhus marked this conversation as resolved.
Show resolved Hide resolved

@example
```
import {fileTypeFromBlob} from 'file-type';
Expand All @@ -435,3 +438,93 @@ console.log(await fileTypeFromBlob(blob));
```
*/
export declare function fileTypeFromBlob(blob: Blob): Promise<FileTypeResult | undefined>;

/**
Function that allows specifying custom detection mechanisms.

An iterable of detectors can be provided via the fileTypeOptions argument for the {@link FileTypeParser.constructor}.
Borewit marked this conversation as resolved.
Show resolved Hide resolved

The detectors are called before the default detections in the provided order.

Custom detectors can be used to add new FileTypeResults or to modify return behaviour of existing FileTypeResult detections.
Borewit marked this conversation as resolved.
Show resolved Hide resolved

If the detector returns `undefined`, there are 2 possible scenarios:

1. The detector has not read from the tokenizer, it will be proceeded with the next available detector.
2. The detector has read from the tokenizer (`tokenizer.position` has been increased).
In that case no further detectors will be executed and the final conclusion is that file-type returns undefined.
Note that this an exceptional scenario, as the detector takes the opportunity from any other detector to determine the file type.
Borewit marked this conversation as resolved.
Show resolved Hide resolved

Example detector array which can be extended and provided via the fileTypeOptions argument:

```js
Borewit marked this conversation as resolved.
Show resolved Hide resolved
const customDetectors = [
async tokenizer => {
const unicornHeader = [85, 78, 73, 67, 79, 82, 78]; // "UNICORN" as decimal string
const buffer = Buffer.alloc(7);
await tokenizer.peekBuffer(buffer, {length: unicornHeader.length, mayBeLess: true});
if (unicornHeader.every((value, index) => value === buffer[index])) {
return {ext: 'unicorn', mime: 'application/unicorn'};
}
return undefined;
}
]

Example usage:

```js
Borewit marked this conversation as resolved.
Show resolved Hide resolved
const buffer = ...
const parser = new FileTypeParser({customDetectors});
const fileType = await parser.fromBuffer(buffer);

fromStream(...), fromTokenizer(...), fromBlob(...) and toDetectingStream(...) are available in the same manner.

@param tokenizer - An [`ITokenizer`](https://github.com/Borewit/strtok3#tokenizer) usable as source of the examined file.
@param fileType - FileTypeResult detected by the standard detections or a previous custom detection. Undefined if no matching fileTypeResult could be found.
@returns supposedly detected file extension and MIME type as a FileTypeResult-like object, or `undefined` when there is no match.
Borewit marked this conversation as resolved.
Show resolved Hide resolved
*/
export type Detector = (tokenizer: ITokenizer, fileType?: FileTypeResult) => Promise<FileTypeResult | undefined>;

export type FileTypeOptions = {
customDetectors?: Iterable<Detector>;
};

export declare class TokenizerPositionError extends Error {
constructor(message?: string);
}

export declare class FileTypeParser {
detectors: Iterable<Detector>;

constructor(options?: {customDetectors?: Iterable<Detector>});

/**
*
* Works the same way as {@link fileTypeFromBuffer}, additionally taking into account custom detectors (if any were provided to the constructor).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Works the same way as {@link fileTypeFromBuffer}, additionally taking into account custom detectors (if any were provided to the constructor).
Works the same way as {@link fileTypeFromBuffer}, additionally taking into account custom detectors (if any were provided to the constructor).

*/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*/
*/

fromBuffer(buffer: Uint8Array | ArrayBuffer): Promise<FileTypeResult | undefined>;

/**
*
Borewit marked this conversation as resolved.
Show resolved Hide resolved
* Works the same way as {@link fileTypeFromStream}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromStream(stream: ReadableStream): Promise<FileTypeResult | undefined>;

/**
*
* Works the same way as {@link fileTypeFromTokenizer}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromTokenizer(tokenizer: ITokenizer): Promise<FileTypeResult | undefined>;

/**
*
* Works the same way as {@link fileTypeFromBlob}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
fromBlob(blob: Blob): Promise<FileTypeResult | undefined>;

/**
*
* Works the same way as {@link fileTypeStream}, additionally taking into account custom detectors (if any were provided to the constructor).
*/
toDetectingStream(readableStream: ReadableStream, options?: StreamOptions): Promise<FileTypeResult | undefined>;
}
155 changes: 95 additions & 60 deletions core.js
Original file line number Diff line number Diff line change
Expand Up @@ -11,31 +11,15 @@ import {extensions, mimeTypes} from './supported.js';
const minimumBytes = 4100; // A fair amount of file-types are detectable within this range.

export async function fileTypeFromStream(stream) {
const tokenizer = await strtok3.fromStream(stream);
try {
return await fileTypeFromTokenizer(tokenizer);
} finally {
await tokenizer.close();
}
return new FileTypeParser().fromStream(stream);
}

export async function fileTypeFromBuffer(input) {
if (!(input instanceof Uint8Array || input instanceof ArrayBuffer)) {
throw new TypeError(`Expected the \`input\` argument to be of type \`Uint8Array\` or \`Buffer\` or \`ArrayBuffer\`, got \`${typeof input}\``);
}

const buffer = input instanceof Uint8Array ? input : new Uint8Array(input);

if (!(buffer?.length > 1)) {
return;
}

return fileTypeFromTokenizer(strtok3.fromBuffer(buffer));
return new FileTypeParser().fromBuffer(input);
}

export async function fileTypeFromBlob(blob) {
const buffer = await blob.arrayBuffer();
return fileTypeFromBuffer(new Uint8Array(buffer));
return new FileTypeParser().fromBlob(blob);
}

function _check(buffer, headers, options) {
Expand All @@ -60,16 +44,98 @@ function _check(buffer, headers, options) {
}

export async function fileTypeFromTokenizer(tokenizer) {
try {
return new FileTypeParser().parse(tokenizer);
} catch (error) {
if (!(error instanceof strtok3.EndOfStreamError)) {
throw error;
return new FileTypeParser().fromTokenizer(tokenizer);
}

export class FileTypeParser {
constructor(options) {
this.detectors = options?.customDetectors;

this.fromTokenizer = this.fromTokenizer.bind(this);
this.fromBuffer = this.fromBuffer.bind(this);
this.parse = this.parse.bind(this);
}

async fromTokenizer(tokenizer) {
const initialPosition = tokenizer.position;

for (const detector of this.detectors || []) {
const fileType = await detector(tokenizer);
if (fileType) {
return fileType;
}

if (initialPosition !== tokenizer.position) {
return undefined; // Cannot proceed scanning of the tokenizer is at an arbitrary position
}
}

return this.parse(tokenizer);
}

async fromBuffer(input) {
if (!(input instanceof Uint8Array || input instanceof ArrayBuffer)) {
throw new TypeError(`Expected the \`input\` argument to be of type \`Uint8Array\` or \`Buffer\` or \`ArrayBuffer\`, got \`${typeof input}\``);
}

const buffer = input instanceof Uint8Array ? input : new Uint8Array(input);

if (!(buffer?.length > 1)) {
return;
}

return this.fromTokenizer(strtok3.fromBuffer(buffer));
}

async fromBlob(blob) {
const buffer = await blob.arrayBuffer();
return this.fromBuffer(new Uint8Array(buffer));
}

async fromStream(stream) {
const tokenizer = await strtok3.fromStream(stream);
try {
return await this.fromTokenizer(tokenizer);
} finally {
await tokenizer.close();
}
}

async toDetectingStream(readableStream, options = {}) {
Borewit marked this conversation as resolved.
Show resolved Hide resolved
const {default: stream} = await import('node:stream');
const {sampleSize = minimumBytes} = options;

return new Promise((resolve, reject) => {
readableStream.on('error', reject);

readableStream.once('readable', () => {
(async () => {
try {
// Set up output stream
const pass = new stream.PassThrough();
const outputStream = stream.pipeline ? stream.pipeline(readableStream, pass, () => {}) : readableStream.pipe(pass);

// Read the input stream and detect the filetype
const chunk = readableStream.read(sampleSize) ?? readableStream.read() ?? Buffer.alloc(0);
try {
pass.fileType = await this.fromBuffer(chunk);
} catch (error) {
if (error instanceof strtok3.EndOfStreamError) {
pass.fileType = undefined;
} else {
reject(error);
}
}

resolve(outputStream);
} catch (error) {
reject(error);
}
})();
});
});
}
}

class FileTypeParser {
check(header, options) {
return _check(this.buffer, header, options);
}
Expand Down Expand Up @@ -211,7 +277,7 @@ class FileTypeParser {
}

await tokenizer.ignore(id3HeaderLength);
return fileTypeFromTokenizer(tokenizer); // Skip ID3 header, recursion
return this.fromTokenizer(tokenizer); // Skip ID3 header, recursion
}

// Musepack, SV7
Expand Down Expand Up @@ -1602,39 +1668,8 @@ class FileTypeParser {
}
}

export async function fileTypeStream(readableStream, {sampleSize = minimumBytes} = {}) {
const {default: stream} = await import('node:stream');

return new Promise((resolve, reject) => {
readableStream.on('error', reject);

readableStream.once('readable', () => {
(async () => {
try {
// Set up output stream
const pass = new stream.PassThrough();
const outputStream = stream.pipeline ? stream.pipeline(readableStream, pass, () => {}) : readableStream.pipe(pass);

// Read the input stream and detect the filetype
const chunk = readableStream.read(sampleSize) ?? readableStream.read() ?? Buffer.alloc(0);
try {
const fileType = await fileTypeFromBuffer(chunk);
pass.fileType = fileType;
} catch (error) {
if (error instanceof strtok3.EndOfStreamError) {
pass.fileType = undefined;
} else {
reject(error);
}
}

resolve(outputStream);
} catch (error) {
reject(error);
}
})();
});
});
export async function fileTypeStream(readableStream, options = {}) {
return new FileTypeParser().toDetectingStream(readableStream, options);
}

export const supportedExtensions = new Set(extensions);
Expand Down
1 change: 1 addition & 0 deletions fixture/fixture.unicorn
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
UNICORN FILE CONTENT
5 changes: 3 additions & 2 deletions index.d.ts
Original file line number Diff line number Diff line change
@@ -1,13 +1,14 @@
import type {FileTypeResult} from './core.js';
import type {FileTypeResult, FileTypeOptions} from './core.js';

/**
Detect the file type of a file path.

The file type is detected by checking the [magic number](https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files) of the buffer.

@param path - The file path to parse.
@param fileTypeOptions - Optional: An options object including the `customDetectors` property as an Iterable of Detector functions. Those are called in the order provided.
@returns The detected file type and MIME type or `undefined` when there is no match.
*/
export function fileTypeFromFile(path: string): Promise<FileTypeResult | undefined>;
export function fileTypeFromFile(path: string, fileTypeOptions?: FileTypeOptions): Promise<FileTypeResult | undefined>;
Borewit marked this conversation as resolved.
Show resolved Hide resolved

export * from './core.js';
7 changes: 4 additions & 3 deletions index.js
Original file line number Diff line number Diff line change
@@ -1,10 +1,11 @@
import * as strtok3 from 'strtok3';
import {fileTypeFromTokenizer} from './core.js';
import {FileTypeParser} from './core.js';

export async function fileTypeFromFile(path) {
export async function fileTypeFromFile(path, fileTypeOptions) {
const tokenizer = await strtok3.fromFile(path);
try {
return await fileTypeFromTokenizer(tokenizer);
const parser = new FileTypeParser(fileTypeOptions);
return await parser.fromTokenizer(tokenizer);
} finally {
await tokenizer.close();
}
Expand Down