Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tokenizer #262

Merged
merged 10 commits into from
Dec 20, 2019
Merged

Use tokenizer #262

merged 10 commits into from
Dec 20, 2019

Conversation

Borewit
Copy link
Collaborator

@Borewit Borewit commented Nov 28, 2019

Implementation of Change proposal: break the fixed sample limit #248:

Migrating most file detection to tokenizer

Improved parsing using tokenizer of:

  1. PNG parser (detecting APNG) demonstrates shows how much more elegant and deeper parser is possible if required
  2. matroska sub-type selection
  3. zip sub-type selection
  4. ASF
  5. MPEG / ID3

As this involves significant effort, it would be good to hear your are happy with approach you would like me to take @sindresorhus.

Copy link
Owner

@sindresorhus sindresorhus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than nitpick, these changes are looking good.

index.d.ts Outdated Show resolved Hide resolved
index.d.ts Outdated
/**
Detect the file type of a `Buffer`/`Uint8Array`/`ArrayBuffer`. The file type is detected by checking the [magic number](https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files) of the buffer.
Detect the file type of a `Buffer`/`Uint8Array`/`ArrayBuffer`.
The file type is detected by checking the [magic number](https://en.wikipedia.org/wiki/Magic_number_(programming)#Magic_numbers_in_files) of the buffer.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't do unrelated changes.

package.json Outdated Show resolved Hide resolved
@@ -64,7 +64,6 @@ const names = {
],
mp3: [
'fixture',
'fixture-offset1-id3',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this fixture removed?

Copy link
Collaborator Author

@Borewit Borewit Dec 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this fixture because I don't think it is a realistic representation of an MP3 file:

  1. It is not normal an ID3 header starts at offset 1 or any other arbitrary offset in combination with an MPEG stream.
  2. Common tools like Mp3Tag or Foobar2000, and a low level MPEG frame decoder I use, cannot make any sense of this file (neither from the MPEG frames).

I am not sure if it was this file, or another file from the same source, even the extension appeared to be wrong. This kind of fixtures are just causing confusion.

util.js Outdated Show resolved Hide resolved
get: (buf, off) => {
return (buf[off + 3] & 0x7F) | ((buf[off + 2]) << 7) | ((buf[off + 1]) << 14) | ((buf[off]) << 21);
},
len: 4
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
len: 4
length: 4

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this part if the token interface.
Cannot do that, will break the token.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. No worries. Although, I would recommend taking a look at https://github.com/sindresorhus/eslint-plugin-unicorn/blob/master/docs/rules/prevent-abbreviations.md It really makes code more readable to not use abbreviations.

Copy link
Collaborator Author

@Borewit Borewit Dec 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, despite the fact I cheat sometimes for good or bad reasons. I am firm believer in readable code. If that takes a bit more text that is fine. There are other ways to optimize, if that is the reason to deviate from writing verbose code. Easily being able to understand what is going on is one of them.

I did not design the token interface btw, I inherited it via strtok2 (which I own and maintained and extended, but I did not write it initially). I wrote strtok3 replacing the events with promises which allowed the tokens to be typed. The tokens remained intact.

@sindresorhus sindresorhus merged commit 96671a5 into sindresorhus:stream-tokenizer Dec 20, 2019
*/
readonly mimeTypes: fileType.MimeType;
*/
const mimeTypes: fileType.MimeType;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: eff47af

@Borewit
Copy link
Collaborator Author

Borewit commented Dec 21, 2019

Thanks a lot for merging @sindresorhus, allows to move on.

@Borewit Borewit deleted the use-tokenizer branch December 21, 2019 10:57
This was referenced Jan 1, 2020
@Borewit Borewit mentioned this pull request Feb 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants