Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent read operation?' #356

Closed
iwaduarte opened this issue Aug 5, 2021 · 12 comments
Closed

Concurrent read operation?' #356

iwaduarte opened this issue Aug 5, 2021 · 12 comments

Comments

@iwaduarte
Copy link

iwaduarte commented Aug 5, 2021

I am trying to read audio/video metadata with two distinct packages that uses your implementation.

       const fileType = require("file-type");
       const musicMetadata = require("music-metadata");
       const strtok3 = require('strtok3');
       
       const someHttpRequestStream = axios.get(url, { responseType: stream})
       
       const tokenizer = await strtok3.fromStream(someHttpRequestStream );
       
        const f =  fileType.fromTokenizer(tokenizer);
        const mm=  musicMetadata.parseFromTokenizer(tokenizer);
        const t = await Promise.all([ff, f]);

I got an error that says:

Error: Concurrent read operation?

if(this.request) throw new Error('Concurrent read operation?');

Is there anyway that I can achieve that ? I thought that maybe having a interface that allowed me to extract mimeType and extension from the library music-metadata (since it uses internally file-type if fileInfo is not provided) would be the best .

But since I could not find a way of getting duration, mime and extension in one go. I am looking for alternatives to use these packages in parallel by having an intermediate passthrough stream.

Could you help me here ?
@Borewit

@Borewit
Copy link
Owner

Borewit commented Aug 5, 2021

You can do things in parallel, there should not be any limitation there.
The error is within one of the stream handlings. Something got in an odd sequence, and that is why the error is thrown. This error should basically never occur and this is the first time I heard it is triggered.

I will be of the grid for at least 1 week. If you can prepare reliable reproduction, that would be very helpful.

@iwaduarte
Copy link
Author

Sorry I am not used to developers mainteners to answer that fast haha. Thanks!!
I have created a reliable reproduction it is pretty much what I have stated above.
You could check here:

https://runkit.com/iwaduarte/nodejs-strtok3-error

@Borewit
Copy link
Owner

Borewit commented Aug 13, 2021

@iwaduarte, you cannot read from a stream twice, neither from a tokenizer.

You can use fileType.stream() to generate a second artificial stream. Not that this file-type detection will fail if to much data is required to determine the file type. This is something which will easily happen with MP3 files, with a preceding ID3v2 tag header.

const fileType = require("file-type");
const musicMetadata = require("music-metadata");
const strtok3 = require('strtok3')
const axios = require('axios');
const url = 'https://test-audio.netlify.app/' + encodeURI('Various Artists - 2008 - netBloc Vol 13 - Color in a World of Monochrome [AAC-40]') + '/' + encodeURI('1.01. Sweet Man Like Me.m4a');

(async () => {
  const someHttpRequestStream = await axios.get(url, { responseType: "stream"})
  const stream = await fileType.stream(someHttpRequestStream.data);
  if (stream.fileType && stream.fileType.mime.startsWith('audio/')) {
    console.log('Found audio file' + stream.fileType.mime);
    const tokenizer = await strtok3.fromStream(stream);
    const mm = await musicMetadata.parseFromTokenizer(tokenizer);
    console.log(mm);
  }
})();

on runkit.com

Under normal conditions you would rely on the HTTP mime type:

const musicMetadata = require("music-metadata");
const strtok3 = require('strtok3')
const axios = require('axios');
const url = 'https://test-audio.netlify.app/' + encodeURI('Various Artists - 2008 - netBloc Vol 13 - Color in a World of Monochrome [AAC-40]') + '/' + encodeURI('1.01. Sweet Man Like Me.m4a');

(async () => {
  const someHttpRequestStream = await axios.get(url, { responseType: "stream"})
  if (someHttpRequestStream.headers['content-type'] && someHttpRequestStream.headers['content-type'].startsWith('audio/')) {
    const mm = await musicMetadata.parseStream(someHttpRequestStream.data);
    console.log(mm);
  }
})();

If you trust the HTTP MIME-type you can pass that to music-metadata, that will skip the internal type detection.

const musicMetadata = require("music-metadata");
const strtok3 = require('strtok3')
const axios = require('axios');
const url = 'https://test-audio.netlify.app/' + encodeURI('Various Artists - 2008 - netBloc Vol 13 - Color in a World of Monochrome [AAC-40]') + '/' + encodeURI('1.01. Sweet Man Like Me.m4a');

(async () => {
  const someHttpRequestStream = await axios.get(url, { responseType: "stream"});
  const contentType = someHttpRequestStream.headers['content-type'];
  if (contentType && contentType.startsWith('audio/')) {
    const mm = await musicMetadata.parseStream(someHttpRequestStream.data, {mimeType: contentType});
    console.log(mm);
  }
})();

Yet another option is to read from the URL twice. Once for the type, the second type for the data.

@iwaduarte
Copy link
Author

You can use fileType.stream() to generate a second artificial stream. Not that this file-type detection will fail if to much data is required to determine the file type. This is something which will easily happen with MP3 files, with a preceding ID3v2 tag header.

I have created 2 [stream.PassThrough] https://nodejs.org/api/stream.html#stream_class_stream_passthrough )(transform streams) and then I would send them directly to music-metadata and file-type stream methods. It does not work, it fails silently. However, if I use music-metadata with writing to the disk they work flawlessly.

const concurrentFunction = (stream, filesize, filePath) => {
    const saveLocally = stream.pipe(new stream.PassThrough());
    const musicMetadataStream = stream.pipe(new stream.PassThrough());
    saveLocally.pipe(fs.createWriteStream(filePath))
    const savePromise = new Promise((resolve, reject) => {
        saveLocally.on('end', () => {
            resolve({path: filePath, message: `File ${filePath} saved Locally`});
        })
        saveLocally.on('error', (err) => {
            reject(err);
        })
    });
    return Promise.all(
        [
            musicMetadata.parseStream(musicMetadataStream, {size: filesize}, {duration: true}),
            savePromise
        ]);
}

As you can see above I am reading concurrently the same stream. So I do not know why would that not work for those two packages. Am I missing something ?

Under normal conditions you would rely on the HTTP mime type:
If you trust the HTTP MIME-type you can pass that to music-metadata, that will skip the internal type detection.

Unfortunately I can not trust mime. I need the library to be able to determine if I have a valid file or not.

Yet another option is to read from the URL twice. Once for the type, the second type for the data.

That is ineffective and not a solution for my use case.

I would ask you also. Why could I not force use file-type inside the music-metadata as an option in the constructor configuration? Let's say I do not want to provide anything but let music-metadata to figure out that for me using file-type. Would be hard to implement that feature ? I know it is another repo. But if you think that could be done I could create a PR or issue there and continue the discussion

@Borewit
Copy link
Owner

Borewit commented Aug 18, 2021

As you can see above I am reading concurrently the same stream. So I do not know why would that not work for those two packages. Am I missing something ?

Apparently you can create kind of 2 duplicate streams with stream.PassThrough(). I pretty sure that these 2 streams (maybe I should say 3) do not behave entirely independent. By writing to a file, you at least have one consumer (stream reader) which reads the entire stream. That may not the case combining music-metadata and file-type. These will stop reading from the stream when they got they found all the metadata, respectively determined the file-type.

@iwaduarte
Copy link
Author

As you can see above I am reading concurrently the same stream. So I do not know why would that not work for those two packages. Am I missing something ?

Apparently you can create kind of 2 duplicate streams with stream.PassThrough(). I pretty sure that these 2 streams (maybe I should say 3) do not behave entirely independent. By writing to a file, you at least have one consumer (stream reader) which reads the entire stream. That may not the case combining music-metadata and file-type. These will stop reading from the stream when they got they found all the metadata, respectively determined the file-type.

They are not independent. They rely on the main source and its backpressure. So if one of passthrough stream is not reading it will stop the main stream. Therefore, I believe the fix would be to close the stream after stop reading. I am considering this error is on file-type I will try some tests at my side to confirm.

You have not answered my final question though.

@Borewit
Copy link
Owner

Borewit commented Aug 19, 2021

Why could I not force use file-type inside the music-metadata as an option in the constructor configuration?

That is actually the way it's done @iwaduarte. I case the file type is not provided by MIME-type or extension, it will fallback on file-type auto detection.

Yet this is not recommanded, as file-type is guessed on no more then an inital portion of the file. So if you have MP3, with 500 kb ID3v2 tag header (which is realistic if a large cover is embedded), it is likely not able to figure out is an MP3 file.

@iwaduarte
Copy link
Author

iwaduarte commented Aug 20, 2021

Why could I not force use file-type inside the music-metadata as an option in the constructor configuration?

That is actually the way it's done @iwaduarte. I case the file type is not provided by MIME-type or extension, it will fallback on file-type auto detection.

Yet this is not recommanded, as file-type is guessed on no more then an inital portion of the file. So if you have MP3, with 500 kb ID3v2 tag header (which is realistic if a large cover is embedded), it is likely not able to figure out is an MP3 file.

I know that it is the way you have it (I have checked the codebase). But what I meant is to have the output from file-type. Because it is more user friendly.

For instance { ext: 'mp3', mime: 'audio/mpeg' } is better than 'MPEG 1 Layer 3' or {ext: 'wav', mime: 'audio/vnd.wave'} is better than 'WAVE' and so on. I had to do a map sometimes on codec and sometimes on extension using your package. and that is very fragile.

I know that file-types operates using magic numbers. Of course would be better to provide the mime-type to your library but that would require use of file-type first (in my use case). Do you get my point here ?

And I haven't also seeing any referencing in their npm repo to the problem you are pointing out. (I am guessing you could take longer to identify these ID3V2 tag header ? Or this is a very specific use case ?).

In summary what I would like is to have the mime and ext offered in the final output. Specifically when the music-metadata had to rely on file-type. Does that make sense ?

@Borewit
Copy link
Owner

Borewit commented Aug 20, 2021

And I haven't also seeing any referencing in their npm repo to the problem you are pointing out. (I am guessing you could take longer to identify these ID3V2 tag header ? Or this is a very specific use case ?).

I am actually the one who raised the issue of the 4100 byte sample and I have refactored file-type overcode that issue. Only the fileTypeStream(readableStream, options?) has that limitation, since very recent now on a user defined sample size.

In summary what I would like is to have the mime and ext offered in the final output. Specifically when the music-metadata had to rely on file-type. Does that make sense ?

I wanted to show you the limitation with the following example:

const fs = require("fs");
const FileType = require('file-type');

const path = 'fixture/02 - Poxfil - Solid Ground.mp3';

(async () => {

  // File type detection with full file access
  const fileType = await FileType.fromFile(path);
  console.log(fileType); // expected to detect the MP3

  // File type detection use preceeding sample of 4100 bytes
  const stream1 = fs.createReadStream(path);
  const stream2 = await FileType.stream(stream1);
  console.log(stream2.fileType); // expected to not detect MP3
})();

The MP3 file:
02 - Poxfil - Solid Ground.zip

Actually it returned the right file type.

The output was:

{ ext: 'mp3', mime: 'audio/mpeg' }
{ ext: 'mp3', mime: 'audio/mpeg' }

I forgot that if just the ID3 header is found it will guess it's an MP3. I can promise you, I can make it guess wrong, as there are a few audio formats who also use ID3v2 headers.

Point is, for the best type determination, it may require parsing a very large portion of the file. That get you in trouble, if you read from a stream, want the super duper detection, and then decide what to do with the stream.

For that very same reason I cannot rely on the super duper file detection of file-type in music-metadata.
But in general I am only interested in the container type. I don't care is it an .mka, .mkv or webm, these are all using the Matroska container.

Can the audio file type be specified even better potentially in music-metadata? Yes that would be possible. Should music-metadata be kind of plugin then of file-type. We thought about specialized plugins indeed.

Can we return the most suitable mime-type based on the content in music-metadata? Yeah, that can be made so,

@iwaduarte
Copy link
Author

iwaduarte commented Aug 21, 2021

And I haven't also seeing any referencing in their npm repo to the problem you are pointing out. (I am guessing you could take longer to identify these ID3V2 tag header ? Or this is a very specific use case ?).

Actually it returned the right file type.

The output was:

{ ext: 'mp3', mime: 'audio/mpeg' }
{ ext: 'mp3', mime: 'audio/mpeg' }

I forgot that if just the ID3 header is found it will guess it's an MP3. I can promise you, I can make it guess wrong, as there are a few audio formats who also use ID3v2 headers.

Awesome example but I guess I could also make music-metadata guess wrong if I provide the wrong mime. Couldn't I ? I think I have tested that before.

Point is, for the best type determination, it may require parsing a very large portion of the file. That get you in trouble, if you read from a stream, want the super duper detection, and then decide what to do with the stream.

For that very same reason I cannot rely on the super duper file detection of file-type in music-metadata.

I do not understand here. Are you saying that music-metadata goes further in detecting after file-type step ? I thought it was 2 ordered steps:

  1. Use mime
  2. Go for file-type
  3. Do we have any other step here ?

But in general I am only interested in the container type. I don't care is it an .mka, .mkv or webm, these are all using the Matroska container.

Can the audio file type be specified even better potentially in music-metadata? Yes that would be possible. Should music-metadata be kind of plugin then of file-type. We thought about specialized plugins indeed.

I think that it is not the way to go is it ? I guess metadata surpass file type extension. How would you put let's say duration property inside file-type ?

Can we return the most suitable mime-type based on the content in music-metadata? Yeah, that can be made so

Awesome. Should I create an issue for that (or maybe we should just refer this discussion) ?

@Borewit
Copy link
Owner

Borewit commented Aug 22, 2021

Awesome example but I guess I could also make music-metadata guess wrong if I provide the wrong mime. Couldn't I ? I think I have tested that before.

I am sorry, no, you cannot. If the MIME-type is provided there is nothing to guess. You probably mean, that you can provide non complaint data and you get some error. Sure.
The way data should be interpreted is specified via the MIME-type or maybe derived from file extension. To allow maybe some use cases where data is unknown, there is a bit of magic in their to determine the type based on the content, yet this is not the recommended way, as determine the file-type via the content is not a reliable mechanism by itself and in unconventional for good reasons.

I do not understand here. Are you saying that music-metadata goes further in detecting after file-type step ? I thought it was 2 ordered steps:

  1. Use mime
  2. Go for file-type
  3. Do we have any other step here ?

The get music-metadata start parsing, the right container (parser) must be selected. As explained above, preferably the MIME-type or file extension is used for that. Only when those are not available, file-type is used for content based detection.

Some extensions or MIME-type are depending on the codec being used.

For example, somewhere, in the ASF parser we may say, this should be an audio/x-ms-wma or based on the WMA codec or audio/x-ms-wmv because there is video stream present as well. That is typical where the distinction of file-type ends or start to be flaky. I would have preferred audio/asf should video/asf would have been standardized for any ASF based format. But Microsoft choose to introduce a bunch of new ones just to distinct their propitiatory codec type.

Awesome. Should I create an issue for that (or maybe we should just refer this discussion) ?

Maybe it is good if you create a new issue and summary what the plan is. As this subject is my opinion tricky and easily causes misunderstandings, I think it is import to clear upfront on what the change should be.

@crossan007
Copy link

crossan007 commented Feb 24, 2022

I'm seeing something similar here; where I have two PassThrough streams; into which I've piped a main stream; the async call to file-type's fromStream does not complete until the PassThrough stream I've given it is "ended".

The blocking async call seems to be the async read in peek-readable's (a dependency of strtok3) StreamReader.js

A simple, hacky workaround I implemented was to "end" the PassThrough stream given to file-type as soon as it becomes readable after making the call to fromStream:

const fileTypeP = FileType.fromStream(stream);
stream.on("readable",()=>{
    stream.end();
 });
const fileType = await fileTypeP

I created a more detailed report here: sindresorhus/file-type#532 (comment)

@Borewit Borewit closed this as completed Aug 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants