Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for aac format #208

Closed
wants to merge 4 commits into from
Closed

Add support for aac format #208

wants to merge 4 commits into from

Conversation

Borewit
Copy link
Collaborator

@Borewit Borewit commented Apr 28, 2019

Change
Map AAC stored in a MPEG-4 ADTS container to explicit .aac extension instead of .m4.

Rational
To me it looks like only AAC profiles can be specified in an ADTS profile, the MPEG-4 Audio Object:

  1. AAC Main
  2. AAC LC (Low Complexity)
  3. AAC SSR (Scalable Sample Rate)
  4. AAC LTP (Long Term Prediction)

Therefor I think it is safe to map MPEG-4/ADTS to .aac.

Relates issue: #206

Related documentation:

@sindresorhus
Copy link
Owner

// @stroncium

@sindresorhus sindresorhus changed the title Support aac Add support for aac format Apr 29, 2019
Copy link
Contributor

@stroncium stroncium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a glance the whole idea seems about right.

I didn't check if any other profiles are possible though(which might require additional checks to not falsely mark them aac).

index.js Outdated
@@ -443,7 +443,7 @@ const fileType = input => {
check([0xFF, 0xF0], {offset: start, mask: [0xFF, 0xFC]}) // MPEG 4 layer 0 using ADTS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AAAAAAAA AAAABCCD

Letter Length (bits) Description
A 12 syncword 0xFFF, all bits must be 1
B 1 MPEG Version: 0 for MPEG-4, 1 for MPEG-2
C 2 Layer: always 0
D 1 protection absent, Warning, set to 1 if there is no CRC and 0 if there is CRC

2nd byte check should be 0b11110000(0xF0) with mask 0b11111110(0xFE) for aac then(and remaining options caught separately).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, shouldn't mpeg2 be caught too?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There actually already is an ADTS/MPEG-2/ACC test file, originating from PR #136. Currently with .mp2 extension, it looks like to reverse engineered the extension based on the MPEG version.

Okay, lets assume this is ext. .acc mime audio/acc candidate as well.

index.js Outdated
@@ -443,7 +443,7 @@ const fileType = input => {
check([0xFF, 0xF0], {offset: start, mask: [0xFF, 0xFC]}) // MPEG 4 layer 0 using ADTS
) {
return {
ext: 'mp4',
ext: 'aac',
mime: 'audio/mpeg'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Media type should probably be audio/aac, it's standardized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Media type should probably be audio/aac, it's standardized.

I think you are right: (ref)

"audio/aac" for sequences of ADTS frames, as specified in ISO/IEC 14496-3:2009

@Borewit
Copy link
Collaborator Author

Borewit commented Apr 30, 2019

@stroncium thanks for your review, can you please check again?

@Borewit
Copy link
Collaborator Author

Borewit commented May 1, 2019

I think there is a reasonable risk to falsely mark files withing the MPEG / ADTS tree due to the MPEG sync-sequence scan. 0xFFE is not really a unique sync sequence,given the fact we apply this on any given file, and we allow it to be be anywhere in the sample.

Another vulnerability is the fall back scenario of ID3 to MP3 identification. This simply means any non-MP3 file, which is using ID3 headers with some cover art embedded (stretching the ID3 header easily beyond the sample size) will be falsely identified as MP3.

Searching the ID3 string within same sync detection loop is even worse, ID3 may also appear as a payload of a chunk / atom.

A simple solution improvement might be to decide to drop support of MPEG files, with the sync not at the beginning or right after the ID3 header where it should be. This will cause the fixture*offset*.mp3 to fail (already tried it).

But this issue was about improving the mapping of MPEG-2/4 ADTS / AAC files, let's get that done first.

@sindresorhus
Copy link
Owner

// @stroncium

if (check([0x10], {offset: start + 1, mask: [0x16]})) {
// Check for (ADTS) MPEG-2
if (check([0x08], {offset: start + 1, mask: [0x08]})) {
return {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this returns exactly the same as alternative

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean,

				// Check for (ADTS) MPEG-2
				if (check([0x08], {offset: start + 1, mask: [0x08]})) {
					return {
						ext: 'aac',
						mime: 'audio/aac'
					};
				}

				// Must be (ADTS) MPEG-4
				return {
					ext: 'aac',
					mime: 'audio/aac'
				};

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I created the distinction 'between MPEG-2 and MPEG-4.
I updated the return types of MPEG-4 pending a sample file for MPEG-2.

Because I wanted to have some evidence that this was done like that for files like that,

Then I found out such a fixture was already present. I updated the return type, and I thought, maybe good to the distinct to keep as is, just for clarity we covered both cases, and allow to pin point to that specific case if there is still an issue.

@stroncium
Copy link
Contributor

Another vulnerability is the fall back scenario of ID3 to MP3 identification. This simply means any non-MP3 file, which is using ID3 headers with some cover art embedded (stretching the ID3 header easily beyond the sample size) will be falsely identified as MP3.

I've never heard of ID3 being used anywhere except in mp3. However, here is my older critics:

Also, the whole mp2/3 part is kinda weird.

  • it is a stream, yes, and can be resumed, but it's weird to do that for the purpose of file type detection, as you can detect any file with embedded mp3 incorrectly. (libmagic doesn't do that for example)
  • even if there is no embedded mp3, the stream is well compressed and matching those simple checks at arbitrary offsets will bring false positives on most files around
  • checking just offsets 0 and 1 is completely counter-intuitive, it should be either all reachable offsets or just the start, otherwise it's just asking for hard to debug bugs
  • ID3 part doesn't belong there, ID3v2 is embedded either at beginning or end of file

And, if it is the way I think and ID3 it isn't used anywhere but in MP3, the skip part won't be needed(though won't really hurt too, false positive chance is nearly zero).

Regarding my original concern, what I was pointing to was possibility of other profiles existing on layer 0. However, after checking I see it doesn't seem to be this way.

Wouldn't it be better to do just buffer[position] & 0xFF === 0x00 style checks if we check single byte? (Definitely more readable for me, but that might be just me.)

@Borewit
Copy link
Collaborator Author

Borewit commented May 4, 2019

I've never heard of ID3 being used anywhere except in mp3

That is apparently not a very reliable indicator.

You don't know what you don't know.

Other formats which may contain ID3 v2 tag headers I know are:

  • ADTS/ACC
  • MP1 & MP2
  • FLAC (although non standard, it is done, applications allow to write these tag headers )
  • MusePack
  • DSF
  • DSDIFF
  • WAVE (non standard, but it is done, and supported by some applications)
  • AIFF
  • WavPack

The string "ID3" may appear at other places. Can be even the name of an artist like ID3 or SLIMKID3 or maybe the track title (ID3). Maybe a text file mentioning the ID3 algorithm.

And, if it is the way I think and ID3 it isn't used anywhere but in MP3, the skip part won't be needed(though won't really hurt too, false positive chance is nearly zero).

Which is based on a wrong assumption.

ID3 part doesn't belong there, ID3v2 is embedded either at beginning or end of file

If the ID3v2 does not appear at the beginning, but further on (and only further on), it is not an MP3 file.

ID3v2 typically appear at the beginning of the file. The only exception I am aware of is DSF (which I qualify as a design mistake). Maybe you are confused with APEv2.

I think we agree on the risk of false positives => created issue #210

This is about AAC, the rest we can cover in other issues.

@sindresorhus
Copy link
Owner

@Borewit What's left to finish this PR?

@sindresorhus
Copy link
Owner

Closing for now. Feel free to open a new PR when/if you have time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants