infer::get for R: Read #51

Open
StephanvanSchaik opened this issue Nov 20, 2021 · 0 comments

Currently infer only seems to offer infer::get for a buffer of bytes and infer::get_from_path for a path. However, the crates I am working with generally accept an R: Read or R: Read + Seek type, so that you can pass in a std::io::Cursor, a std::fs::File, or really anything that implements Read and perhaps Seek.

Another problem is that infer::get assumes the user either has the full byte buffer available or somehow knows the worst-case number of bytes required. The first option is not always feasible: in my particular use case I am working with archives that can easily be several hundred megabytes or even gigabytes in size.
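To illustrate the second option, here is a minimal sketch of reading a bounded prefix from any reader using only the standard library. read_prefix is a hypothetical helper, not part of infer, and the caller still has to pick the limit up front, which is exactly the guesswork described above.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads at most `limit` bytes from the start of `reader`.
/// Returns fewer bytes if the input is shorter than `limit`.
fn read_prefix<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `take` caps how many bytes `read_to_end` may pull from the reader.
    reader.take(limit).read_to_end(&mut buf)?;
    Ok(buf)
}

// The resulting buffer could then be passed to the existing API, e.g.
// `infer::get(&buf)`, without loading the whole archive into memory.
```

Passing &mut file instead of file keeps the reader usable afterwards, since &mut R also implements Read.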

I think it would be nice to have a function that accepts an R: Read, so that it can take as many bytes as it needs to figure out what the file type is. To work efficiently, it would probably be best to construct some sort of finite state machine from the known byte patterns that consumes bytes until it knows the actual file type, so that for a ZIP archive it would only need to read 4 bytes (the PK\x03\x04 signature) and for a tarball 262 bytes (the ustar magic at offset 257). Alternatively, you can sort the patterns by length and try each of them in ascending order to achieve the same.
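The sorted-patterns alternative could look roughly like the sketch below. Everything here is an assumption for illustration: Signature, SIGNATURES and detect are hypothetical names, the table holds only two entries, and none of it reflects how infer stores or matches its patterns internally.

```rust
use std::io::{self, Read};

/// Hypothetical signature: `magic` must appear at byte `offset` of the input.
struct Signature {
    name: &'static str,
    offset: usize,
    magic: &'static [u8],
}

impl Signature {
    /// Number of leading bytes needed before this signature can be checked.
    fn required_len(&self) -> usize {
        self.offset + self.magic.len()
    }
}

// Illustrative table, kept sorted by `required_len` in ascending order.
const SIGNATURES: &[Signature] = &[
    Signature { name: "zip", offset: 0, magic: b"PK\x03\x04" },
    Signature { name: "tar", offset: 257, magic: b"ustar" },
];

/// Reads only as many bytes as the next candidate signature requires.
fn detect<R: Read>(mut reader: R) -> io::Result<Option<&'static str>> {
    let mut buf = Vec::new();
    for sig in SIGNATURES {
        let need = sig.required_len();
        if buf.len() < need {
            // Top up the buffer to exactly `need` bytes, no further.
            let missing = (need - buf.len()) as u64;
            reader.by_ref().take(missing).read_to_end(&mut buf)?;
        }
        if buf.len() < need {
            // Input ended early; longer signatures cannot match either.
            break;
        }
        if &buf[sig.offset..need] == sig.magic {
            return Ok(Some(sig.name));
        }
    }
    Ok(None)
}
```

With this layout a ZIP input is identified after the first 4 bytes are consumed, while a tarball requires reading up to the end of the ustar field. A real state machine would avoid re-checking bytes, but the ascending order already bounds how much is read.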
