Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend file type (updated) #603

Merged

Conversation

FredrikSchaefer
Copy link
Contributor

Hey There :)

I came across this issue #340 and found that it's just what we need. As the initial PR is way outdated, I decided to start over from scratch.

Resolves #340.

@FredrikSchaefer
Copy link
Contributor Author

Gentle nudge towards @sindresorhus . We are really looking forward to using this feature in the next official version.

@sindresorhus
Copy link
Owner

I'm on vacation, so I won't be able to review this for another couple of weeks.

core.d.ts Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
@FredrikSchaefer
Copy link
Contributor Author

I'm on vacation, so I won't be able to review this for another couple of weeks.

Thanks a lot for squeezing in time to send change requests, even on vacation! We will then see that we don't rely on quick action from your side. Enjoy your time off!

@Borewit
Copy link
Collaborator

Borewit commented Jul 25, 2023

I am on holiday as well, this time without a laptop. Back in 2 weeks.

@FredrikSchaefer
Copy link
Contributor Author

Another gentle nudge towards @sindresorhus and @Borewit , so this PR does not get lost in the the vortex of time.

@FredrikSchaefer
Copy link
Contributor Author

... And yet another gentle nudge to @sindresorhus . I'd really appreciate if you found the time for a review :-)

@FredrikSchaefer
Copy link
Contributor Author

Is there anything I can do to support the review of this PR?

@Borewit
Copy link
Collaborator

Borewit commented Sep 25, 2023

I will try find some time as well to review this PR @FredrikSchaefer. Please give me 2 weeks.

Copy link
Collaborator

@Borewit Borewit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your patience @FredrikSchaefer.

I think you have done a fantastic job. I just have some minor comments.

I will be off the grid for a bit more then a week. So sorry.

core.d.ts Outdated

Custom detectors can be used to add new FileTypeResults or to modify return behaviour of existing FileTypeResult detections.

If the detector returns `undefined`, the `tokenizer.position` should be 0 (unless it's a stream). That allows other detectors to parse the file.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I agree with the "unless it's a stream". Essentially you can iterate to other detectors if you took a bite of the apple. Only peek is allowed, if read you you have consumed the tokenizer, which is very similar to a stream.

I fear this an area where we can expect a lot questions from users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for the comment.

Yeah, guess you're right about the lot of questions.

I just mindlessly took this information from this previous discussion.

Let me suggest a more detailed explanation here:

If the detector returns undefined, the tokenizer.position should typically be 0. This allows easy parsing by other detectors, unless subsequent custom detectors specify otherwise. Additionally, the detector shouldn't consume the tokenizer; while peeking is non-consuming, reading is.

What do you think of this?

I'm really open to anything here!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also my other comment: https://github.com/sindresorhus/file-type/pull/603/files#r1356979704

I suggest something like this:

If the detector returns no_match it is not allowed to read from the tokenizer (the tokeinzer.position must remain 0) otherwise following scanners will read from the wrong file offset.

If the detector return undefined the scanner is certain the file type cannot be determined, neither by other scanners.

no_match represents option 1 explained in here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your point that custom detectors should be able to interrupt detection. However, I see two small downsides of the suggested approach:

  1. The wording no_match does not really make clear to me whether it means no match for this detector, or no match at all.

  2. The standard FileTypeParser returns undefined when no file type could be recognized. Therefore requiring the custom detectors to return something else is a bit counter intuitive.

I therefore suggest to do it the other way around:

If the detector returns undefined, it is not allowed to read from the tokenizer (the tokenizer.position must remain 0) otherwise following scanners will read from the wrong file offset.

If the detector returns file_type_undetectable, the detector is certain the file type cannot be determined, even by other scanners. The FileTypeParser interrupts the parsing and immediately returns undefined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, one could argue that file_type_undetectable also does not clearly say whether it means file type undetectable for this detector or for all detectors, but it still makes it a bit clearer in my opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your point that custom detectors should be able to interrupt detection. However, I see two small downsides of the suggested approach:

  1. The wording no_match does not really make clear to me whether it means no match for this detector, or no match at all.

  2. The standard FileTypeParser returns undefined when no file type could be recognized. Therefore requiring the custom detectors to return something else is a bit counter intuitive.

I therefore suggest to do it the other way around:

If the detector returns undefined, it is not allowed to read from the tokenizer (the tokenizer.position must remain 0) otherwise following scanners will read from the wrong file offset.

If the detector returns file_type_undetectable, the detector is certain the file type cannot be determined, even by other scanners. The FileTypeParser interrupts the parsing and immediately returns undefined.

Sounds good to me. The second case can be also be something like, detector started reading but for some reason failed to determine the file-type. Not ideally, but it can happen. If the detector starts reading, there is no way back.

We could also check the position after each custom scanner. It may not be 0 actually, there is also an iterated use case with ID3 header. The position should be remain unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Just pushed a commit taking care of that check.

core.d.ts Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
Fredrik added 3 commits October 17, 2023 14:38
Added detectionImpossible to allow interruption by custom detectors
Updated doc of fileTypeFromFile
@FredrikSchaefer
Copy link
Contributor Author

Okay then. I adjusted the logic. The docstring you suggested is also included, I just polished it a bit.

Copy link

@midmarch midmarch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

core.js Outdated Show resolved Hide resolved
core.js Outdated Show resolved Hide resolved
core.js Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
@Borewit
Copy link
Collaborator

Borewit commented Oct 25, 2023

Forgive me, wrong account, same person.

Copy link
Collaborator

@Borewit Borewit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks a lot for your effort @FredrikSchaefer

For your final approval @sindresorhus.

midmarch

This comment was marked as resolved.

@Borewit
Copy link
Collaborator

Borewit commented Nov 1, 2023

@sindresorhus, kind reminder, please proceed with merging if you are happy with this one

core.js Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
core.d.ts Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
test.js Outdated Show resolved Hide resolved
test.js Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
readme.md Outdated Show resolved Hide resolved
readme.md Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
core.d.ts Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
readme.md Show resolved Hide resolved
readme.md Outdated Show resolved Hide resolved
core.d.ts Outdated Show resolved Hide resolved
core.d.ts Show resolved Hide resolved
test.js Outdated Show resolved Hide resolved
@Borewit Borewit force-pushed the extend-file-type-updated branch 3 times, most recently from e62ca5d to 4164767 Compare November 10, 2023 17:56
@sindresorhus sindresorhus merged commit f5b232c into sindresorhus:main Nov 10, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extending file-type
4 participants