Change proposal: break the fixed sample limit #248
Comments
This sounds reasonable. I think it should still be possible to pass a partial file as a buffer for the detections that don't need the whole file.
Fair enough.
@sindresorhus can you please open a 'development' branch, e.g. stream-tokenizer, allowing this effort to be divided over multiple PRs against that branch?
@sindresorhus, the implementation is ready for review.
Limitation
Some file types cannot be determined reliably within the current 4 kB limit (or any reasonable single-buffer limit). For example, an audio file with a 600 kB ID3v2 header is not necessarily an MP3 file; other audio formats may also be prefixed with an ID3v2 header.
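To illustrate why a fixed sample cannot help here: the ID3v2 header declares its own size as four synchsafe bytes (7 significant bits each), so a detector must decode that size and then look past the tag to find the real audio signature. A minimal sketch in Node.js (function name is illustrative, not the file-type API):

```javascript
// Decode the total size of an ID3v2 tag from its 10-byte header:
// "ID3", version (2 bytes), flags (1 byte), synchsafe size (4 bytes).
function id3v2TagSize(header) {
  if (header.length < 10 || header.toString('latin1', 0, 3) !== 'ID3') {
    return 0; // no ID3v2 tag present
  }
  // Synchsafe integer: 4 bytes, 7 bits each, most significant first.
  const size =
    (header[6] << 21) | (header[7] << 14) | (header[8] << 7) | header[9];
  return 10 + size; // 10-byte header + tag body
}

// A tag of roughly 600 kB: any fixed 4 kB sample misses the audio
// data that follows it entirely.
const header = Buffer.from([
  0x49, 0x44, 0x33, 0x04, 0x00, 0x00, // "ID3", v2.4, no flags
  0x00, 0x24, 0x4c, 0x00,             // synchsafe size
]);
console.log(id3v2TagSize(header)); // → 599562
```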
Proposed solution
With a stream-based solution, we could read only the information, starting from the beginning of the file, that is required to determine the file type. This is the approach I use in music-metadata, via the strtok3 tokenizer, which supports peeking. It picks the most efficient way to jump to the next required offset, depending on whether the underlying data is randomly accessible (a file, blob, or buffer) or must be read ahead sequentially (a stream). This access abstraction can be reused from readable-web-to-node-stream.
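The tokenizer abstraction this relies on can be sketched roughly as follows. This is a simplified in-memory stand-in, not the real strtok3 library; the method names only approximate its API:

```javascript
// A simplified, buffer-backed stand-in for a strtok3-style tokenizer.
// peekBuffer() reads without advancing; readBuffer() advances;
// ignore() skips ahead. For a random-access source, ignore() is a
// cheap pointer move; for a pure stream it would read and discard.
class BufferTokenizer {
  constructor(buffer) {
    this.buffer = buffer;
    this.position = 0;
  }
  peekBuffer(length, offset = this.position) {
    return this.buffer.subarray(offset, offset + length);
  }
  readBuffer(length) {
    const chunk = this.peekBuffer(length);
    this.position += chunk.length;
    return chunk;
  }
  ignore(length) {
    this.position += length; // random-access: just move the pointer
  }
}

const tokenizer = new BufferTokenizer(Buffer.from('ID3rest'));
console.log(tokenizer.peekBuffer(3).toString()); // → ID3
console.log(tokenizer.position);                 // → 0 (peek did not advance)
```

The key property for detection is that peeking never consumes data, so multiple signature checks can examine the same bytes.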
Initially, only the first 4 bytes will be read, no more.
The detector then keeps requesting data at calculated offsets, in sequential order, until it can determine the file type, but reads no more than required.
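The detection loop described above could look roughly like this. It operates on a plain Buffer for illustration; the actual change would issue these reads through the tokenizer so a stream source never reads further than needed, and real detection covers many more signatures:

```javascript
// Sketch of incremental detection: start with 4 bytes, then request
// more data at calculated offsets only when needed.
function detect(buf) {
  const head = buf.subarray(0, 4); // step 1: only the first 4 bytes
  if (head.toString('latin1', 0, 3) === 'ID3') {
    // ID3v2 tag found: decode the synchsafe tag size (4 x 7 bits),
    // then inspect what follows the tag body.
    const end =
      10 + ((buf[6] << 21) | (buf[7] << 14) | (buf[8] << 7) | buf[9]);
    const audio = buf.subarray(end, end + 2);
    // MPEG audio frames begin with an 11-bit frame sync (0xFFE0 mask).
    if (audio[0] === 0xff && (audio[1] & 0xe0) === 0xe0) {
      return 'audio/mpeg';
    }
    return undefined; // an ID3v2 tag prefixing some other format
  }
  return undefined; // other signatures omitted in this sketch
}
```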
As this is a major change, it will introduce a number of specialized functions.
A dedicated module for browser usage (effectively an adapter) would offer browser-specific entry points.
This browser adapter is comparable to how I wrapped music-metadata with music-metadata-browser.
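A hypothetical sketch of what such a browser entry point could build on: wrapping a Blob so the detector can read byte ranges on demand via `Blob.slice()` instead of buffering the whole file. The names here are illustrative, not the actual file-type or strtok3 API:

```javascript
// Hypothetical adapter: a Blob is randomly accessible, so ranged
// reads are cheap and nothing needs to be buffered up front.
class BlobSource {
  constructor(blob) {
    this.blob = blob;
  }
  // Read `length` bytes starting at `offset` without consuming anything.
  async peek(offset, length) {
    const slice = this.blob.slice(offset, offset + length);
    return new Uint8Array(await slice.arrayBuffer());
  }
}
```

Since Node.js 18 the same Blob API is available server-side, which is convenient for testing the adapter outside a browser.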
I'd love to hear your feedback, to assess whether this proposal is worth implementing.
Related issues: #210
Implementation status
Development branch: stream-tokenizer
Progress / ToDo
/browser sub-module